Original URL: http://stackoverflow.com/questions/48023982/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Pandas finding local max and min
Asked by Mustard Tiger
I have a pandas data frame with two columns: one is temperature, the other is time.
I would like to make third and fourth columns called min and max. Each of these columns would be filled with NaNs except where there is a local min or max, where it would hold the value of that extremum.
Here is a sample of what the data looks like. Essentially, I am trying to identify all the peaks and low points in the figure.
Are there any built-in tools in pandas that can accomplish this?
Answered by Foad
The solution offered by fuglede is great, but if your data is very noisy (like the one in the picture) you will end up with lots of misleading local extrema. I suggest that you use the scipy.signal.argrelextrema function. argrelextrema has its own limitations, but it has a cool feature where you can specify the number of points to be compared on each side, kind of like a noise filtering algorithm. For example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema
# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])
n = 5  # number of points to be checked before and after
# Find local minima and maxima
df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal, order=n)[0]]['data']
df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal, order=n)[0]]['data']
# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['data'])
plt.show()
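To illustrate how the order argument acts as a filter, here is a minimal sketch that reuses the same AR(1) sample and counts the maxima reported for a few window sizes. The values 1, 5 and 20 are arbitrary choices for illustration, not recommendations from the answer.

```python
import numpy as np
from scipy.signal import argrelextrema

# Same noisy AR(1) sample as in the answer
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1] * 0.9 + r)
data = np.array(xs)

# Count the local maxima reported for several window sizes
# (1, 5 and 20 are arbitrary illustrative values)
counts = {}
for n in (1, 5, 20):
    idx = argrelextrema(data, np.greater_equal, order=n)[0]
    counts[n] = len(idx)
    print(n, len(idx))
```

A point that passes the test for a large window also passes it for any smaller one, so the count can only stay the same or shrink as n grows.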
Some points:
- You might need to check the points afterwards to be sure there are no points very close to each other.
- You can play with n to filter the noisy points.
- argrelextrema returns a tuple, and the [0] at the end extracts a numpy array.
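The points above can be seen on a tiny example. This is only a sketch: the values array is made up for illustration, with a flat top so that np.greater_equal reports two adjacent positions.

```python
import numpy as np
from scipy.signal import argrelextrema

# Toy data (made up for illustration) with a flat top at positions 2 and 3
values = np.array([0.0, 1.0, 3.0, 3.0, 1.0, 0.0, 2.0, 0.0])

# argrelextrema returns a tuple with one index array per dimension
result = argrelextrema(values, np.greater_equal, order=1)
print(type(result))
peaks = result[0]
print(peaks)

# Gaps between consecutive reported peaks; a gap of 1 means adjacent points
gaps = np.diff(peaks)
print(gaps)

# With the strict comparator, neither point of the flat top is reported
strict_peaks = argrelextrema(values, np.greater, order=1)[0]
print(strict_peaks)
```

Which comparator fits depends on whether flat tops should count as peaks in your data. If they should, a follow-up pass that merges neighbours with a small gap is one way to deal with the duplicates.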
Answered by fuglede
Assuming that the column of interest is labelled data, one solution would be
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
For example:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])
# Find local peaks
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]
# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
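To see exactly which rows the shift comparison selects, here is a small sketch on made-up data (the eight values are an assumption chosen for illustration). It includes a flat top and a high last value to show two edge cases.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): a flat top at rows 2-3 and a high last value
df = pd.DataFrame({'data': [1.0, 0.0, 3.0, 3.0, 1.0, 2.0, 0.0, 5.0]})

prev, nxt = df.data.shift(1), df.data.shift(-1)
# Assigning the filtered Series aligns on the index, leaving NaN elsewhere
df['min'] = df.data[(prev > df.data) & (nxt > df.data)]
df['max'] = df.data[(prev < df.data) & (nxt < df.data)]
print(df)

min_rows = df['min'].dropna().index.tolist()
max_rows = df['max'].dropna().index.tolist()
print(min_rows, max_rows)
```

The first and last rows are never flagged because their shifted neighbour is NaN and comparisons with NaN are False. The flat top at rows 2 and 3 is skipped as well, since the comparison is strict.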
Answered by faizanur Rahman
Using NumPy
import numpy as np
ser = np.random.randint(-40, 40, 100) # 100 points
# Note: this flags every index whose next value is lower, not only true peaks
peak = np.where(np.diff(ser) < 0)[0]
or
double_difference = np.diff(np.sign(np.diff(ser)))
# -2 marks a rise followed by a fall; add 1 to map back to positions in ser
peak = np.where(double_difference == -2)[0] + 1
Using pandas
ser = pd.Series(np.random.randint(2, 5, 100))
peak_df = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
peak = peak_df.index
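As a sanity check, the double-difference and shift approaches can be compared on a small hand-made array (the six values are an assumption for illustration). One thing to watch: each np.diff shortens the array by one, so positions found in the double difference sit one place before the peak in ser and need + 1.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration) with strict peaks at positions 1 and 4
ser = np.array([0, 2, 1, 1, 3, 0])

# A value of -2 in the double difference marks a rise followed by a fall;
# adding 1 maps the position back onto `ser`
double_difference = np.diff(np.sign(np.diff(ser)))
numpy_peaks = np.where(double_difference == -2)[0] + 1

# The shift-based test on the same data
s = pd.Series(ser)
pandas_peaks = s[(s.shift(1) < s) & (s.shift(-1) < s)].index.to_numpy()

print(numpy_peaks, pandas_peaks)
```

On this input both approaches agree on the same two positions.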