Statement: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/48023982/

Time: 2020-09-14 04:59:02  Source: igfitidea

Pandas finding local max and min

Tags: python, pandas, numpy, dataframe

Question by Mustard Tiger

I have a pandas DataFrame with two columns: one is temperature and the other is time.

I would like to add a third and a fourth column, called min and max. Each of these columns should be filled with NaN except where there is a local minimum or maximum, in which case it should hold the value of that extremum.

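To make the requested layout concrete, here is a minimal sketch on invented data (the column names time and temperature and all values are hypothetical). The boolean masks are written by hand and simply stand in for whatever detection method is used; Series.where keeps a value where its mask is True and inserts NaN everywhere else, which is exactly the min/max column shape described above.

```python
import pandas as pd

# Hypothetical toy data: 7 time steps of temperature readings
df = pd.DataFrame({
    "time": range(7),
    "temperature": [20.0, 22.5, 21.0, 19.5, 23.0, 22.0, 24.0],
})

# Hand-written masks standing in for any extremum-detection method
is_min = pd.Series([False, False, False, True, False, True, False])
is_max = pd.Series([False, True, False, False, True, False, False])

# Series.where keeps values where the mask is True and puts NaN elsewhere
df["min"] = df["temperature"].where(is_min)
df["max"] = df["temperature"].where(is_max)
print(df)
```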

Here is a sample of what the data looks like. Essentially, I am trying to identify all the peaks and low points in the figure.

[Figure: sample of the temperature data over time, with several peaks and low points]

Are there any built-in tools in pandas that can accomplish this?

Answer by Foad

The solution offered by fuglede is great, but if your data is very noisy (like the data in the picture) you will end up with lots of misleading local extrema. I suggest that you use the scipy.signal.argrelextrema function. argrelextrema has its own limitations, but it has a handy feature: you can specify how many points on each side every candidate is compared against (the order parameter), which works a bit like a noise filter. For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema

# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

n = 5  # number of points to be checked before and after
# Find local minima and maxima
df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal, order=n)[0]]['data']
df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal, order=n)[0]]['data']

# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['data'])
plt.show()

[Figure: the AR(1) sample with the detected local minima (red) and maxima (green) marked]
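To illustrate how order filters out small wiggles, here is a minimal sketch on an invented array (not the data from the answer). It uses the strict comparator np.greater instead of the answer's np.greater_equal so that equal neighbouring values are not counted as extrema. With order=1 a point only has to exceed its immediate neighbours; with order=3 it must exceed every point up to three positions away on each side, so the small bumps at indices 1, 3 and 7 are dropped, while the bump at index 9 survives because nothing larger lies within three positions of it.

```python
import numpy as np
from scipy.signal import argrelextrema

# Invented series: one big peak (index 5) and several small bumps
x = np.array([0, 1, 0, 2, 0, 5, 0, 1, 0, 2, 0])

# argrelextrema returns a tuple of index arrays; [0] takes the only axis
peaks_by_order = {
    order: argrelextrema(x, np.greater, order=order)[0].tolist()
    for order in (1, 3)
}
print(peaks_by_order)  # {1: [1, 3, 5, 7, 9], 3: [5, 9]}
```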

Some points:

  • You might need to check the points afterwards to make sure there are no points very close to each other.
  • You can tune n to filter out the noisy points.
  • argrelextrema returns a tuple, and the [0] at the end extracts a numpy array.
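For the first point, one option is scipy.signal.find_peaks, a different SciPy function from the one used in this answer: its distance parameter removes the smaller of any two peaks that are closer together than the given number of samples. Minima can be found the same way by passing the negated series. The array below is invented for illustration.

```python
import numpy as np
from scipy.signal import find_peaks

# Invented series: two nearby peaks (indices 1 and 3) and one distant peak (index 8)
x = np.array([0, 3, 0, 2.5, 0, 0, 0, 0, 4, 0])

all_peaks, _ = find_peaks(x)           # every local maximum (endpoints excluded)
spaced, _ = find_peaks(x, distance=3)  # smaller of two peaks < 3 samples apart is dropped
print(all_peaks.tolist(), spaced.tolist())
```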

Answer by fuglede

Assuming that the column of interest is labelled 'data', one solution would be:

df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

# Find local minima and maxima
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
plt.show()

[Figure: the AR(1) sample with the detected local minima (red) and maxima (green) marked]
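A tiny hand-made example (values invented for illustration) shows two properties of this approach. Because shift leaves NaN at the ends and any comparison with NaN is False, the first and last rows are never marked. Because the inequalities are strict, a flat top such as the two consecutive 4.0 values is not reported as a maximum.

```python
import pandas as pd

# Invented values: endpoints and a flat top are included on purpose
df = pd.DataFrame({"data": [1.0, 3.0, 2.0, 4.0, 4.0, 1.0, 5.0]})

prev, nxt = df.data.shift(1), df.data.shift(-1)
# shift() puts NaN at the ends; comparisons with NaN are False,
# so the first and last rows can never be flagged
df["min"] = df.data[(prev > df.data) & (nxt > df.data)]
df["max"] = df.data[(prev < df.data) & (nxt < df.data)]
print(df)
```

Only row 1 ends up in max and only rows 2 and 5 in min; the plateau at rows 3 and 4 is missed.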

Answer by faizanur Rahman

Using NumPy

import numpy as np
ser = np.random.randint(-40, 40, 100)  # 100 points
d = np.diff(ser)
peak = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1  # rises into the point, falls after it

or

double_difference = np.diff(np.sign(np.diff(ser)))
peak = np.where(double_difference == -2)[0] + 1  # +1: element i is centred on ser[i + 1]
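Working the double difference through on a short invented array shows why -2 marks a peak. np.sign(np.diff(ser)) is +1 for an upward step, -1 for a downward step and 0 for a flat step, so the second difference is -2 exactly where an upward step is followed by a downward one. Element i of double_difference is centred on ser[i + 1], which is why 1 has to be added to get positions in ser.

```python
import numpy as np

ser = np.array([1, 3, 2, 2, 5, 4])          # invented example values
step_sign = np.sign(np.diff(ser))           # [1, -1, 0, 1, -1]: up, down, flat, up, down
double_difference = np.diff(step_sign)      # [-2, 1, 1, -2]
peak = np.where(double_difference == -2)[0] + 1  # map back to positions in ser
print(peak.tolist())  # [1, 4]
```

A flat-topped peak such as [1, 3, 3, 1] gives signs [1, 0, -1] and a double difference of [-1, -1], so it is not reported by this test.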


Using pandas

import numpy as np
import pandas as pd

ser = pd.Series(np.random.randint(2, 5, 100))
# keep only values strictly greater than both neighbours
peaks = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
peak = peaks.index
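A NumPy formulation based on np.diff (an upward step followed by a downward step) and the pandas shift formulation encode the same rule for interior points (strictly greater than both neighbours), and neither flags the first or last element, so they should return identical indices. The sketch below checks this on random data; it uses NumPy's default_rng with a fixed seed for reproducibility, which is a choice made here rather than in the answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = rng.integers(2, 5, 100)  # small integer range, so ties are common

# NumPy: an upward step followed by a downward step
d = np.diff(values)
np_peaks = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1

# pandas: strictly greater than both neighbours
ser = pd.Series(values)
pd_peaks = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)].index.to_numpy()

print(np.array_equal(np_peaks, pd_peaks))
```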