pandas: rolling difference

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/48518338/

Date: 2020-09-14 05:06:25  Source: igfitidea

Rolling difference in Pandas

python pandas

Asked by WBM

Does anyone know an efficient function/method, such as pandas.rolling_mean, that would calculate the rolling difference of an array?

This is my closest solution:

roll_diff = pd.Series(values).diff(periods=1)

However, it only calculates a single-step rolling difference. Ideally the step size would be adjustable (i.e. the difference between the current time step and the last n steps).
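
For the plain n-step case, `Series.diff` already takes a `periods` argument, so the step size can be changed without a custom loop. The following is a minimal sketch; the sample values and the choice `n = 4` are made up for illustration and are not from the question. Note that it compares each value with the single value n steps back, not with the mean of the previous n values.

```python
import pandas as pd

values = [1, 3, 6, 1, -5, 6, 4, 1, 6]  # illustrative data, not from the question
n = 4                                   # step size, chosen for the example

s = pd.Series(values)
# diff(periods=n) computes s[i] - s[i - n]; the first n entries are NaN
roll_diff_n = s.diff(periods=n)
```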

I've also written this, but for larger arrays, it is quite slow:

def roll_diff(values,step):
    diff = []
    for i in np.arange(step, len(values)-1):
        pers_window = np.arange(i-1,i-step-1,-1)
        diff.append(np.abs(values[i] - np.mean(values[pers_window])))
    diff = np.pad(diff, (0, step+1), 'constant')
    return diff
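
The loop above takes the absolute difference between each value and the mean of the `step` values before it. That quantity could plausibly be computed without a Python loop by combining a rolling mean with a shift. This is only a sketch under assumptions: the name `roll_mean_diff` is hypothetical, results are aligned with the current position (the first `step` entries are NaN) instead of being zero-padded at the end, and the last element is included.

```python
import pandas as pd

def roll_mean_diff(values, step):
    """Absolute difference between each value and the mean of the `step` values before it."""
    s = pd.Series(values, dtype=float)
    # rolling(step).mean() at position i - 1 averages values[i - step:i];
    # shift(1) moves that average forward so it lines up with values[i]
    prev_mean = s.rolling(window=step).mean().shift(1)
    return (s - prev_mean).abs()

# sample data is illustrative only
result = roll_mean_diff([0, 1, 2, 3, 0, 1, 2, 500], step=2)
```

Because the rolling mean and the subtraction are vectorized, this should scale better on large arrays than the explicit loop, though that has not been benchmarked here.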

Answered by Pierluigi

What about:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])

In general you can replace the lambda function with your own function. Note that in this case the first item will be NaN.

Update

Defining the following:

n_steps = 2
def my_fun(x):
    return x.iloc[-1] - x.iloc[0]

x['x_1'].rolling(window=n_steps).apply(my_fun)

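With this definition you can compute the difference between the last and first values of each window of n_steps rows, i.e. between the current value and the one n_steps - 1 rows before it.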
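
As a hedged illustration of the window arithmetic, the sketch below passes `raw=True` so that each window reaches the function as a NumPy array (which avoids building a Series per window), and compares the result with `diff`. The value `n_steps = 3` is chosen only for the example.

```python
import pandas as pd

x = pd.DataFrame({'x_1': [0, 1, 2, 3, 0, 1, 2, 500]})
n_steps = 3  # illustrative window length

# raw=True hands each window to the function as a NumPy array,
# so plain positional indexing works
via_apply = x['x_1'].rolling(window=n_steps).apply(lambda w: w[-1] - w[0], raw=True)

# a window of n_steps rows spans n_steps - 1 steps
via_diff = x['x_1'].diff(periods=n_steps - 1)
```

Both series hold NaN in the first n_steps - 1 rows and the same differences afterwards.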

Answered by Dan

You can do the same thing as in https://stackoverflow.com/a/48345749/1011724 if you work directly on the underlying numpy array:

import numpy as np
diff_kernel = np.array([1,-1])
np.convolve(rs, diff_kernel, 'same')

where rs is your pandas Series.
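
The kernel above covers a single step. One way to extend the idea to n steps, sketched here under assumptions, is a kernel of length n + 1 with 1 at the front and -1 at the back. The sample data and `n = 4` are illustrative, and `mode='valid'` is used instead of `'same'` so that only positions with a full n-step history are returned.

```python
import numpy as np
import pandas as pd

rs = pd.Series([1, 3, 6, 1, -5, 6, 4, 1, 6])  # illustrative data
n = 4                                          # hypothetical step size

# kernel [1, 0, ..., 0, -1] of length n + 1; convolution flips the kernel,
# so each output element is rs[i + n] - rs[i]
diff_kernel = np.zeros(n + 1)
diff_kernel[0], diff_kernel[-1] = 1, -1

# 'valid' keeps only positions where the kernel fully overlaps the data,
# giving len(rs) - n values
n_step_diff = np.convolve(rs.to_numpy(), diff_kernel, mode='valid')
```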

Answered by Manualmsdos

If you get KeyError: 0, try iloc:

import pandas

x = pandas.DataFrame({
    'x_1': [0, 1, 2, 3, 0, 1, 2, 500, ],},
    index=[0, 1, 2, 3, 4, 5, 6, 7])

x['x_1'].rolling(window=2).apply(lambda x: x.iloc[1] - x.iloc[0])

Answered by jpp

This should work:

import numpy as np

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff(arr, N):
    return np.array([arr[i] - arr[i-N] for i in range(N, len(arr))])

running_diff(x, 4)  # array([-6,  3, -2,  0, 11])
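
The same result can also be obtained with array slicing instead of a list comprehension, which keeps the work inside NumPy. A minimal sketch, assuming N is at least 1; the name `running_diff_vectorized` is hypothetical:

```python
import numpy as np

x = np.array([1, 3, 6, 1, -5, 6, 4, 1, 6])

def running_diff_vectorized(arr, N):
    # arr[N:] holds arr[i] and arr[:-N] holds arr[i - N], for i = N .. len(arr) - 1
    return arr[N:] - arr[:-N]

result = running_diff_vectorized(x, 4)
```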

For a given pd.Series, you will have to define what you want for the first few items. The example below just keeps the initial series values.

s_roll_diff = np.hstack((s.values[:4], running_diff(s.values, 4)))

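This works because you can assign a np.array directly to a pd.DataFrame column, e.g. for a column s, df['s_roll_diff'] = np.hstack((df.s.values[:4], running_diff(df.s.values, 4))). Use bracket assignment here, since attribute assignment (df.s_roll_diff = ...) does not create a new column.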
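
A pandas-only variant of the same idea, sketched with illustrative data: `diff(N)` leaves NaN in the first N rows, and `fillna` with the original column puts the initial values back there. The DataFrame contents and `N = 4` are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({'s': [1, 3, 6, 1, -5, 6, 4, 1, 6]})  # illustrative data
N = 4

# diff(N) yields NaN for the first N rows; fillna aligns on the index and
# restores the original values there (the column becomes float because of the NaNs)
df['s_roll_diff'] = df['s'].diff(N).fillna(df['s'])
```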
