pandas diff() giving 0 value for first difference
Source: http://stackoverflow.com/questions/42748566/ (Stack Overflow, licensed under CC BY-SA 4.0; attribute to the original authors)
Asked by warrenfitzhenry
I have a DataFrame df:
Hour Energy Wh
1 4
2 6
3 9
4 15
I would like to add a column that shows the per-hour difference. I am using this:
df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(0)
df1:
Hour Energy Wh Energy Wh/h
1 4 0
2 6 2
3 9 3
4 15 6
However, the Hour 1 value is showing up as 0 in the Energy Wh/h column, whereas I would like it to show up as 4, like below:
Hour Energy Wh Energy Wh/h
1 4 4
2 6 2
3 9 3
4 15 6
I have tried using np.where:
df['Energy Wh/h'] = np.where(df['Hour'] == 1,df['Energy Wh'].diff().fillna(df['Energy Wh']),df['Energy Wh'].diff().fillna(0))
but I am still getting a 0 value in the Hour 1 row (df1), with no errors. How do I get the Hour 1 row filled with its 'Energy Wh' value instead of 0?
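For reference, here is a minimal sketch of where the 0 comes from, assuming the sample data above is rebuilt by hand. diff() has no previous row to subtract for the first element, so it returns NaN there, and fillna(0) then turns that NaN into 0:

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (an assumption about how df was created).
df = pd.DataFrame({"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]})

raw = df["Energy Wh"].diff()  # first element is NaN: there is no previous row
filled = raw.fillna(0)        # the NaN is replaced by 0, which is the unwanted value

print(raw.tolist())     # [nan, 2.0, 3.0, 6.0]
print(filled.tolist())  # [0.0, 2.0, 3.0, 6.0]
```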
Answered by AChampion
You can just use fillna() with the original column, without using np.where:
>>> df['Energy Wh/h'] = df['Energy Wh'].diff().fillna(df['Energy Wh'])
>>> df
Energy Wh Energy Wh/h
Hour
1 4 4.0
2 6 2.0
3 9 3.0
4 15 6.0
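A self-contained version of the same idea, as a sketch that assumes Hour is the index (as in the output above). fillna() aligns the two Series on the index, so only the NaN in the first row is taken from the original column; the result stays float because diff() produced a NaN:

```python
import pandas as pd

# Sample data with Hour as the index; this construction is an assumption for illustration.
df = pd.DataFrame(
    {"Energy Wh": [4, 6, 9, 15]},
    index=pd.Index([1, 2, 3, 4], name="Hour"),
)

# Fill the single NaN left by diff() with the value from the original column.
df["Energy Wh/h"] = df["Energy Wh"].diff().fillna(df["Energy Wh"])

result = df["Energy Wh/h"].tolist()
print(result)  # [4.0, 2.0, 3.0, 6.0]
```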
Answered by jezrael
The first value of diff is always NaN, so it is faster to replace only this value with loc instead of calling fillna. Finally, convert the float values to int with astype (if necessary, and only when there are no other NaNs and no other float values):
df['Energy W/h'] = df['Energy Wh'].diff()
df.loc[0, 'Energy W/h'] = df['Energy Wh'].iloc[0]
df['Energy W/h'] = df['Energy W/h'].astype(int)
print (df)
Hour Energy Wh Energy W/h
0 1 4 4
1 2 6 2
2 3 9 3
3 4 15 6
A more general approach, which works when the index does not start at 0, is to select the first label with df.index[0]:
df.index = [5,6,7,8]
print (df)
Hour Energy Wh
5 1 4
6 2 6
7 3 9
8 4 15
df['Energy W/h'] = df['Energy Wh'].diff()
df.loc[df.index[0], 'Energy W/h'] = df['Energy Wh'].iloc[0]
df['Energy W/h'] = df['Energy W/h'].astype(int)
print (df)
Hour Energy Wh Energy W/h
5 1 4 4
6 2 6 2
7 3 9 3
8 4 15 6
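The approach above can also be wrapped in a small helper, sketched here under the assumption that the column has an integer dtype and no missing values. The name diff_keep_first is hypothetical, not a pandas function. It sets the first element by position with iloc, so it does not depend on the index labels:

```python
import pandas as pd

def diff_keep_first(s: pd.Series) -> pd.Series:
    """Hypothetical helper: first difference, with the first element kept as-is."""
    out = s.diff()               # float Series; the first element is NaN
    if len(out) > 0:
        out.iloc[0] = s.iloc[0]  # positional assignment, independent of index labels
    return out.astype(s.dtype)   # back to the original dtype (int64 here)

df = pd.DataFrame(
    {"Hour": [1, 2, 3, 4], "Energy Wh": [4, 6, 9, 15]},
    index=[5, 6, 7, 8],
)
df["Energy W/h"] = diff_keep_first(df["Energy Wh"])
print(df["Energy W/h"].tolist())  # [4, 2, 3, 6]
```

Positional assignment also avoids a pitfall of label-based loc: if the index contained duplicate labels, df.loc[df.index[0], ...] would overwrite every row carrying that label.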