pandas: finding local maxima and minima

Question

Asked by Mustard Tiger

I have a pandas DataFrame with two columns: one is temperature, the other is time.

I would like to make third and fourth columns called min and max. Each of these columns would be filled with NaN except where there is a local minimum or maximum, where it would hold the value of that extremum.

Here is a sample of what the data looks like; essentially, I am trying to identify all the peaks and low points in the figure.

Are there any built-in tools in pandas that can accomplish this?

Answer 1

Answered by Foad

The solution offered by fuglede is great, but if your data is very noisy (like the data in the picture) you will end up with lots of misleading local extrema. I suggest that you use the scipy.signal.argrelextrema function. argrelextrema has its own limitations, but it has a useful feature: you can specify how many points on each side are compared, which acts somewhat like a noise filter. For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema

# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

n = 5  # number of points to compare on each side of each candidate
# Find local minima and maxima (rows that are not extrema become NaN via index alignment)
df['min'] = df.iloc[argrelextrema(df.data.values, np.less_equal, order=n)[0]]['data']
df['max'] = df.iloc[argrelextrema(df.data.values, np.greater_equal, order=n)[0]]['data']

# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['data'])
plt.show()
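
To get a feel for how n acts as a noise filter, the sketch below regenerates the same AR(1) sample and counts the maxima argrelextrema reports for a few orders (the values 1, 5 and 10 are an arbitrary choice). Since a point has to be at least as large as every neighbour within n samples on each side, the maxima found for a larger n should always be a subset of those found for a smaller n.

```python
import numpy as np
from scipy.signal import argrelextrema

# Same noisy AR(1) sample as in the answer above
np.random.seed(0)
xs = [0.0]
for r in np.random.randn(200):
    xs.append(xs[-1] * 0.9 + r)
data = np.asarray(xs)

maxima = {}
for n in (1, 5, 10):
    # argrelextrema returns a tuple of index arrays (one per axis); [0] takes the 1-D one
    maxima[n] = argrelextrema(data, np.greater_equal, order=n)[0]
    print(n, len(maxima[n]))
```

The printed counts should shrink as n grows; how far to push n depends on the time scale of the features you care about.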

Some points:

You might need to check the points afterwards to make sure no two of them are very close to each other.
You can tune n to filter out noisy points.
argrelextrema returns a tuple, and the [0] at the end extracts a NumPy array.

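For the first point, one option not used in Foad's answer is scipy.signal.find_peaks: its distance argument removes smaller peaks until every remaining pair is at least that many samples apart, and minima can be found by running it on the negated data. The min_gap value of 10 below is an assumption to tune for your data; find_peaks also accepts a prominence argument that is often useful for noisy signals.

```python
import numpy as np
import pandas as pd
from scipy.signal import find_peaks

# Same noisy AR(1) sample as above
np.random.seed(0)
xs = [0.0]
for r in np.random.randn(200):
    xs.append(xs[-1] * 0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

min_gap = 10  # assumed minimum spacing between neighbouring extrema, in samples

# find_peaks returns (indices, properties); only the indices are needed here
max_idx, _ = find_peaks(df['data'].to_numpy(), distance=min_gap)
min_idx, _ = find_peaks(-df['data'].to_numpy(), distance=min_gap)

# NaN everywhere except at the retained extrema (assignment aligns on the index)
df['min'] = df['data'].iloc[min_idx]
df['max'] = df['data'].iloc[max_idx]
```

Unlike argrelextrema with np.greater_equal and its default clip mode, find_peaks never reports the first or last sample, because a peak needs a neighbour on both sides.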

Answer 2

Answered by fuglede

Assuming that the column of interest is labelled 'data', one solution would be:

df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

For example:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Generate a noisy AR(1) sample
np.random.seed(0)
rs = np.random.randn(200)
xs = [0]
for r in rs:
    xs.append(xs[-1]*0.9 + r)
df = pd.DataFrame(xs, columns=['data'])

# Find strict local minima and maxima (each point compared with its two neighbours)
df['min'] = df.data[(df.data.shift(1) > df.data) & (df.data.shift(-1) > df.data)]
df['max'] = df.data[(df.data.shift(1) < df.data) & (df.data.shift(-1) < df.data)]

# Plot results
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
df.data.plot()
plt.show()
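
A tiny hand-checkable series (values made up for illustration) shows exactly what the two masks pick out, including what happens at the ends of the series:

```python
import pandas as pd

s = pd.Series([1, 3, 2, 4, 1, 5], name='data')

# Strict minimum: both neighbours are larger; strict maximum: both neighbours are smaller.
# At the ends, shift() introduces NaN and any comparison with NaN is False,
# so the first and last points are never flagged.
is_min = (s.shift(1) > s) & (s.shift(-1) > s)
is_max = (s.shift(1) < s) & (s.shift(-1) < s)

print(s[is_min].index.tolist())  # [2, 4]
print(s[is_max].index.tolist())  # [1, 3]
```

Because the comparisons are strict, a flat-topped peak such as [1, 3, 3, 1] is not reported at all.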

Answer 3

Answered by faizanur Rahman

Using NumPy:

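import numpy as np
ser = np.random.randint(-40, 40, 100)  # 100 points
d = np.diff(ser)  # d[i] = ser[i+1] - ser[i]
peak = np.where((d[:-1] > 0) & (d[1:] < 0))[0] + 1  # rise then fall; +1 maps back onto ser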

or

double_difference = np.diff(np.sign(np.diff(ser)))
peak = np.where(double_difference == -2)[0] + 1  # +1 maps back onto indices of ser
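
Working the double-difference idea through a small made-up array shows where the -2 comes from and why the indices need a +1 shift: each np.diff shortens the array by one, so position i in double_difference refers to ser[i + 1]. A flat step has sign 0 and produces -1 or +1 instead, so plateaus are not reported by this test.

```python
import numpy as np

ser = np.array([1, 3, 2, 4, 1, 5])

step = np.diff(ser)                      # [ 2, -1,  2, -3,  4]
slope_sign = np.sign(step)               # [ 1, -1,  1, -1,  1]
double_difference = np.diff(slope_sign)  # [-2,  2, -2,  2]

# -2 marks a rise followed by a fall (peak), +2 a fall followed by a rise (valley)
peak = np.where(double_difference == -2)[0] + 1
valley = np.where(double_difference == 2)[0] + 1
print(peak, valley)  # [1 3] [2 4]
```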

Using pandas:

ser = pd.Series(np.random.randint(2, 5, 100))
peak_df = ser[(ser.shift(1) < ser) & (ser.shift(-1) < ser)]
peak = peak_df.index
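
Applied to the layout described in the question, the same shift-based masks can fill min and max columns that are NaN everywhere except at the extrema. The column names time and temperature and the sample values are assumptions made for illustration; the frame is sorted by time first, because "neighbour" only makes sense in time order.

```python
import pandas as pd

# Hypothetical data in the question's layout: one temperature reading per time stamp
df = pd.DataFrame({
    'time': pd.date_range('2024-01-01', periods=8, freq='D'),
    'temperature': [20.0, 21.5, 21.0, 19.5, 20.5, 22.0, 21.0, 21.5],
})
df = df.sort_values('time').reset_index(drop=True)

t = df['temperature']
is_min = (t.shift(1) > t) & (t.shift(-1) > t)
is_max = (t.shift(1) < t) & (t.shift(-1) < t)

# where() keeps the value where the mask is True and puts NaN elsewhere
df['min'] = t.where(is_min)
df['max'] = t.where(is_max)
print(df)
```

With these values the maxima land on rows 1 and 5 and the minima on rows 3 and 6; for noisy real data the argrelextrema or find_peaks approaches above are usually a better starting point.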
