Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23142967/

Time: 2020-09-09 00:13:37  Source: igfitidea

Adding a column that's the result of the difference in consecutive rows in pandas

pandas, dataframe, series

Asked by AMM

Let's say I have a dataframe like this:

    A   B
0   a   b
1   c   d
2   e   f 
3   g   h

0, 1, 2, 3 are times; a, c, e, g is one time series and b, d, f, h is another. I need to add two columns to the original dataframe, obtained by computing the differences between consecutive rows for certain columns.

So I need something like this:

    A   B   dA
0   a   b  (a-c)
1   c   d  (c-e)
2   e   f  (e-g)
3   g   h   NaN

I saw a method called diff on DataFrame/Series, but it works slightly differently: by default it subtracts the previous row, so the first element becomes NaN rather than the last.

Answered by exp1orer

Use shift:

df['dA'] = df['A'] - df['A'].shift(-1)
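
As a minimal sketch of how the shift approach behaves, the snippet below applies it to a small numeric frame. The values are made up for illustration and are not from the question; shift(-1) moves each value up by one row, so every row is compared with the row that follows it and the last row has nothing to subtract.

```python
import pandas as pd

# Illustrative data only; the question uses symbolic values a, c, e, g.
df = pd.DataFrame({"A": [10, 7, 3, 2], "B": [12, 7, 5, 4]})

# shift(-1) aligns each row with the value from the next row,
# so this computes "current row minus next row".
df["dA"] = df["A"] - df["A"].shift(-1)

print(df)
```

The last entry of dA is NaN because there is no following row, which matches the layout requested in the question.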

Answered by DSM

You could use diff and pass -1 as the periods argument:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})
>>> df["dA"] = df["A"].diff(-1)
>>> df
   A   B  dA
0  9  12   5
1  4   7   2
2  2   5   1
3  1   4 NaN

[4 rows x 3 columns]
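
The question asks for difference columns on more than one series. One possible way to do that, sketched here under the assumption that the new columns should be named with a "d" prefix (dA, dB), is to call diff(-1) on a column subset and join the result back:

```python
import pandas as pd

df = pd.DataFrame({"A": [9, 4, 2, 1], "B": [12, 7, 5, 4]})

# diff(-1) on a DataFrame is applied to each selected column independently.
# add_prefix("d") renames the resulting columns to dA and dB (a naming assumption).
deltas = df[["A", "B"]].diff(-1).add_prefix("d")

# join aligns on the index, so the new columns line up with the original rows.
result = df.join(deltas)
print(result)
```

This leaves the original frame untouched and returns a new one; assigning the columns individually, as in the answer above, works just as well.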

Answered by Seth Okeyo

When the data comes from a CSV file, this works (note that diff(1) subtracts the previous row, so the first element is NaN):

import pandas as pd

df = pd.read_csv('sale_data.csv')  # read_csv already returns a DataFrame
df['New_column'] = df['target_column'].diff(1)  # current row minus previous row
print(df)  # for the console, but not necessary
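
To make the direction of the difference concrete, here is a small sketch that reads a CSV from an in-memory string instead of a file. The CSV contents and the column names prev_diff and next_diff are hypothetical, chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV contents standing in for a real file on disk.
csv_text = "target_column\n10\n7\n3\n2\n"
df = pd.read_csv(io.StringIO(csv_text))

# diff(1): current row minus previous row, so the first entry is NaN.
df["prev_diff"] = df["target_column"].diff(1)

# diff(-1): current row minus next row, so the last entry is NaN.
df["next_diff"] = df["target_column"].diff(-1)

print(df)
```

The question's desired layout corresponds to the diff(-1) column, where the missing value falls on the last row.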

Answered by Seth Okeyo

Rolling differences can also be calculated with an explicit loop:

import pandas as pd

df = pd.read_csv('sales_data.csv')
col = df['Target_column']
i = 0
while i < len(col) - 1:
    diff = col.iloc[i + 1] - col.iloc[i]  # the difference between two consecutive values in the column
    print(diff)
    i += 1  # move to the next value in the column
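
For comparison, the sketch below checks that an explicit loop of this kind gives the same numbers as the vectorized diff. It uses made-up values in an in-memory Series rather than a CSV file, and the names loop_diffs and vector_diffs are illustrative.

```python
import pandas as pd

# Made-up values standing in for the CSV column.
col = pd.Series([10, 7, 3, 2], name="Target_column")

# Explicit loop: next value minus current value.
loop_diffs = [col.iloc[i + 1] - col.iloc[i] for i in range(len(col) - 1)]

# Vectorized form: diff(1) is current minus previous, with a leading NaN
# that is dropped here so both lists have the same length.
vector_diffs = col.diff(1).dropna().tolist()

print(loop_diffs)
print(vector_diffs)
```

On large columns the vectorized diff is generally the more idiomatic choice, since it avoids a Python-level loop.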