Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/42748566/

Time: 2020-09-14 03:09:48  Source: igfitidea

pandas diff() giving 0 value for first difference

Tags: python, pandas, numpy, dataframe

Asked by warrenfitzhenry

I have df:

Hour  Energy Wh  
1        4          
2        6           
3        9
4        15

I would like to add a column that shows the per hour difference. I am using this:

df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(0)

df1:

Hour  Energy Wh  Energy Wh/h
1        4          0
2        6          2 
3        9          3
4        15         6

However, the Hour 1 value is showing up as 0 in the Energy Wh/h column, whereas I would like it to show up as 4, like below:

Hour  Energy Wh  Energy Wh/h
1        4          4
2        6          2 
3        9          3
4        15         6

I have tried using np.where:

df['Energy Wh/h']  = np.where(df['Hour'] == 1,df['Energy Wh'].diff().fillna(df['Energy Wh']),df['Energy Wh'].diff().fillna(0))

but I am still getting a 0 value in the hour 1 row (df1), with no errors. How do I get the value in 'Energy Wh' for Hour 1 to be filled, instead of 0?

Answered by AChampion

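You can just call fillna() with the original column, without using np.where. fillna() aligns on the index, so the NaN in the first row is filled with that row's own 'Energy Wh' value: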

>>> df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(df['Energy Wh'])
>>> df
      Energy Wh  Energy Wh/h
Hour
   1          4          4.0
   2          6          2.0
   3          9          3.0
   4         15          6.0
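
For reference, a self-contained version of the same idea that can be pasted into a script. It assumes, as in the output above, that Hour is the index; the intermediate name delta is only for illustration.

```python
import pandas as pd

# Rebuild the example frame with Hour as the index (an assumption based on the output above)
df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]}).set_index("Hour")

# diff() has nothing to subtract from in the first row, so that row is NaN
delta = df["Energy Wh"].diff()

# fillna() with a Series fills each NaN from the entry that has the same index label
df["Energy Wh/h"] = delta.fillna(df["Energy Wh"])

print(df["Energy Wh/h"].tolist())  # [4.0, 2.0, 3.0, 6.0]
```

The result is float because diff() introduces a NaN; if whole numbers are needed and no NaN remains, .astype(int) can be appended.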

Answered by jezrael

The first value of diff is always NaN, so it is faster to replace only this value with loc, without fillna. Finally, convert the float values to int with astype (if necessary, and only if there are no other NaNs and no other float values):

df['Energy W/h'] = df['Energy Wh'].diff()
df.loc[0, 'Energy W/h'] = df['Energy Wh'].iloc[0]
df['Energy W/h'] = df['Energy W/h'].astype(int)
print (df)
   Hour  Energy Wh  Energy W/h
0     1          4           4
1     2          6           2
2     3          9           3
3     4         15           6
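
Note that df.loc[0, ...] addresses the row by label, so the code above relies on the default index starting at 0. A small sketch of what happens otherwise, assuming a frame whose index is [5, 6, 7, 8]:

```python
import pandas as pd

df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]},
                  index=[5, 6, 7, 8])
df["Energy W/h"] = df["Energy Wh"].diff()

# Label 0 is not in the index, so .loc adds a new row labelled 0
# instead of overwriting the NaN in the first row.
df.loc[0, "Energy W/h"] = df["Energy Wh"].iloc[0]

print(len(df))                         # 5 rows instead of 4
print(df["Energy W/h"].isna().sum())   # 1: the NaN in the first row is still there
```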

A more general form, which works for any index, uses df.index[0] to select the first row:

df.index = [5,6,7,8]
print (df)
   Hour  Energy Wh
5     1          4
6     2          6
7     3          9
8     4         15

df['Energy W/h'] = df['Energy Wh'].diff()
df.loc[df.index[0], 'Energy W/h'] = df['Energy Wh'].iloc[0]
df['Energy W/h'] = df['Energy W/h'].astype(int)
print (df)
   Hour  Energy Wh  Energy W/h
5     1          4           4
6     2          6           2
7     3          9           3
8     4         15           6
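
If the index may contain duplicate labels, df.loc[df.index[0], ...] would write to every row carrying that label. A minimal sketch that assigns by position instead; diff_keep_first is a hypothetical helper name, not part of pandas, and it assumes the input has no NaN when its dtype is integer:

```python
import pandas as pd

def diff_keep_first(s: pd.Series) -> pd.Series:
    """Row-to-row difference in which the first row keeps its original value."""
    if s.empty:
        return s.copy()
    out = s.diff()
    out.iloc[0] = s.iloc[0]      # positional, independent of the index labels
    return out.astype(s.dtype)   # restore the input dtype (e.g. int64)

df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]},
                  index=[5, 6, 7, 8])
df["Energy Wh/h"] = diff_keep_first(df["Energy Wh"])
print(df["Energy Wh/h"].tolist())  # [4, 2, 3, 6]
```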