How to calculate differences between consecutive rows in pandas data frame?
Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also comply with CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34846146/
Question by scrollex
I've got a data frame, df, with three columns: count_a, count_b and date; the counts are floats, and the dates are consecutive days in 2015.
I'm trying to figure out the difference between each day's counts in both the count_a and count_b columns; that is, I'm trying to calculate the difference between each row and the preceding row for both of those columns. I've set the date as the index, but am having trouble figuring out how to do this. There were a couple of hints about using pd.Series and pd.DataFrame.diff, but I haven't had any luck finding an applicable answer or set of instructions.
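For reference, a minimal sketch of the setup described above, assuming the frame starts out with date as an ordinary column (the names raw and daily are hypothetical, and only the first four days of the data are used). Setting date as the index and calling diff on the two count columns gives each row minus the preceding row:

```python
import pandas as pd

# Hypothetical starting point: 'date' is still a regular column.
# Values are the first four days of the question's data (count_b is NaN there).
raw = pd.DataFrame({
    'date': pd.date_range('2015-01-01', periods=4, freq='D'),
    'count_a': [34175.0, 72640.0, 109354.0, 144491.0],
    'count_b': [float('nan')] * 4,
})

# Move the dates into the index so they label rows instead of being diffed.
df = raw.set_index('date')

# Each row minus the preceding row, for both count columns.
daily = df[['count_a', 'count_b']].diff()
print(daily)
```

The first row has no predecessor, so its differences are NaN.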
I'm a bit stuck, and would appreciate some guidance here.
Here's what my data frame looks like:
import pandas as pd
from numpy import nan
from pandas import Timestamp

df = pd.DataFrame({'count_a': {Timestamp('2015-01-01 00:00:00'): 34175.0,
Timestamp('2015-01-02 00:00:00'): 72640.0,
Timestamp('2015-01-03 00:00:00'): 109354.0,
Timestamp('2015-01-04 00:00:00'): 144491.0,
Timestamp('2015-01-05 00:00:00'): 180355.0,
Timestamp('2015-01-06 00:00:00'): 214615.0,
Timestamp('2015-01-07 00:00:00'): 250096.0,
Timestamp('2015-01-08 00:00:00'): 287880.0,
Timestamp('2015-01-09 00:00:00'): 332528.0,
Timestamp('2015-01-10 00:00:00'): 381460.0,
Timestamp('2015-01-11 00:00:00'): 422981.0,
Timestamp('2015-01-12 00:00:00'): 463539.0,
Timestamp('2015-01-13 00:00:00'): 505395.0,
Timestamp('2015-01-14 00:00:00'): 549027.0,
Timestamp('2015-01-15 00:00:00'): 595377.0,
Timestamp('2015-01-16 00:00:00'): 649043.0,
Timestamp('2015-01-17 00:00:00'): 707727.0,
Timestamp('2015-01-18 00:00:00'): 761287.0,
Timestamp('2015-01-19 00:00:00'): 814372.0,
Timestamp('2015-01-20 00:00:00'): 867096.0,
Timestamp('2015-01-21 00:00:00'): 920838.0,
Timestamp('2015-01-22 00:00:00'): 983405.0,
Timestamp('2015-01-23 00:00:00'): 1067243.0,
Timestamp('2015-01-24 00:00:00'): 1164421.0,
Timestamp('2015-01-25 00:00:00'): 1252178.0,
Timestamp('2015-01-26 00:00:00'): 1341484.0,
Timestamp('2015-01-27 00:00:00'): 1427600.0,
Timestamp('2015-01-28 00:00:00'): 1511549.0,
Timestamp('2015-01-29 00:00:00'): 1594846.0,
Timestamp('2015-01-30 00:00:00'): 1694226.0,
Timestamp('2015-01-31 00:00:00'): 1806727.0,
Timestamp('2015-02-01 00:00:00'): 1899880.0,
Timestamp('2015-02-02 00:00:00'): 1987978.0,
Timestamp('2015-02-03 00:00:00'): 2080338.0,
Timestamp('2015-02-04 00:00:00'): 2175775.0,
Timestamp('2015-02-05 00:00:00'): 2279525.0,
Timestamp('2015-02-06 00:00:00'): 2403306.0,
Timestamp('2015-02-07 00:00:00'): 2545696.0,
Timestamp('2015-02-08 00:00:00'): 2672464.0,
Timestamp('2015-02-09 00:00:00'): 2794788.0},
'count_b': {Timestamp('2015-01-01 00:00:00'): nan,
Timestamp('2015-01-02 00:00:00'): nan,
Timestamp('2015-01-03 00:00:00'): nan,
Timestamp('2015-01-04 00:00:00'): nan,
Timestamp('2015-01-05 00:00:00'): nan,
Timestamp('2015-01-06 00:00:00'): nan,
Timestamp('2015-01-07 00:00:00'): nan,
Timestamp('2015-01-08 00:00:00'): nan,
Timestamp('2015-01-09 00:00:00'): nan,
Timestamp('2015-01-10 00:00:00'): nan,
Timestamp('2015-01-11 00:00:00'): nan,
Timestamp('2015-01-12 00:00:00'): nan,
Timestamp('2015-01-13 00:00:00'): nan,
Timestamp('2015-01-14 00:00:00'): nan,
Timestamp('2015-01-15 00:00:00'): nan,
Timestamp('2015-01-16 00:00:00'): nan,
Timestamp('2015-01-17 00:00:00'): nan,
Timestamp('2015-01-18 00:00:00'): nan,
Timestamp('2015-01-19 00:00:00'): nan,
Timestamp('2015-01-20 00:00:00'): nan,
Timestamp('2015-01-21 00:00:00'): nan,
Timestamp('2015-01-22 00:00:00'): nan,
Timestamp('2015-01-23 00:00:00'): nan,
Timestamp('2015-01-24 00:00:00'): 71.0,
Timestamp('2015-01-25 00:00:00'): 150.0,
Timestamp('2015-01-26 00:00:00'): 236.0,
Timestamp('2015-01-27 00:00:00'): 345.0,
Timestamp('2015-01-28 00:00:00'): 1239.0,
Timestamp('2015-01-29 00:00:00'): 2228.0,
Timestamp('2015-01-30 00:00:00'): 7094.0,
Timestamp('2015-01-31 00:00:00'): 16593.0,
Timestamp('2015-02-01 00:00:00'): 27190.0,
Timestamp('2015-02-02 00:00:00'): 37519.0,
Timestamp('2015-02-03 00:00:00'): 49003.0,
Timestamp('2015-02-04 00:00:00'): 63323.0,
Timestamp('2015-02-05 00:00:00'): 79846.0,
Timestamp('2015-02-06 00:00:00'): 101568.0,
Timestamp('2015-02-07 00:00:00'): 127120.0,
Timestamp('2015-02-08 00:00:00'): 149955.0,
Timestamp('2015-02-09 00:00:00'): 171440.0}})
Accepted answer by Mike Müller
diff should give the desired result:
>>> df.diff()
              count_a  count_b
2015-01-01        NaN      NaN
2015-01-02      38465      NaN
2015-01-03      36714      NaN
2015-01-04      35137      NaN
2015-01-05      35864      NaN
....
2015-02-07     142390    25552
2015-02-08     126768    22835
2015-02-09     122324    21485
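diff can be read as "subtract the frame shifted down by one row". The sketch below, using four days from the question's data around where count_b begins, checks that equivalence with pandas' own testing helper and shows how NaN propagates: the first real count_b value (71.0 on 2015-01-24) has no non-NaN predecessor, so its difference is NaN as well.

```python
import numpy as np
import pandas as pd

# Four consecutive days from the question, around where count_b starts.
idx = pd.date_range('2015-01-23', periods=4, freq='D')
df = pd.DataFrame({
    'count_a': [1067243.0, 1164421.0, 1252178.0, 1341484.0],
    'count_b': [np.nan, 71.0, 150.0, 236.0],
}, index=idx)

by_diff = df.diff()            # row minus previous row
by_shift = df - df.shift(1)    # the same subtraction, spelled out
print(by_diff)

# Raises AssertionError if the two frames differ (NaNs must line up too).
pd.testing.assert_frame_equal(by_diff, by_shift)

# NaN on either side of the subtraction yields NaN.
print(by_diff.loc[pd.Timestamp('2015-01-24'), 'count_b'])
```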
Answer by David Wolever
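You can use a rolling window of two rows. In older pandas this was the top-level .rolling_apply(…) function; it was deprecated in pandas 0.18 and has since been removed, so current versions use the .rolling(…).apply(…) method instead:
# raw=True passes each window as a NumPy array [previous, current]
diffs_a = df['count_a'].rolling(2).apply(lambda x: x[1] - x[0], raw=True)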
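A quick sketch checking that a two-row rolling window, written with the method form Series.rolling(2).apply(...) and taking the later value minus the earlier one, matches diff() on the first five days of count_a (the names s, rolled and comparison are illustrative):

```python
import pandas as pd

# First five days of count_a from the question, indexed by date.
s = pd.Series([34175.0, 72640.0, 109354.0, 144491.0, 180355.0],
              index=pd.date_range('2015-01-01', periods=5, freq='D'),
              name='count_a')

# Each length-2 window is [previous, current]; raw=True hands the function
# a NumPy array, so x[0] and x[1] are positional.
rolled = s.rolling(2).apply(lambda x: x[1] - x[0], raw=True)

# The first window is incomplete, so both approaches start with NaN.
comparison = pd.DataFrame({'rolling': rolled, 'diff': s.diff()})
print(comparison)
```

Because apply calls the Python function once per window, this is typically much slower than diff() on large frames; it is mainly useful when the per-window computation is something diff cannot express.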
Alternatively, if it's easier, you can operate on the arrays directly:
count_a_vals = df['count_a'].values
# Each value minus the one before it; one element shorter than the column (no leading NaN)
diffs_a = count_a_vals[1:] - count_a_vals[:-1]
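NumPy's np.diff computes the same consecutive differences as the slicing expression vals[1:] - vals[:-1] (current value minus previous). Either way the result is one element shorter than the column and has lost the date index. A sketch, using the first four days of count_a, of restoring alignment by prepending a NaN:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2015-01-01', periods=4, freq='D')
s = pd.Series([34175.0, 72640.0, 109354.0, 144491.0], index=idx, name='count_a')

vals = s.to_numpy()
sliced = vals[1:] - vals[:-1]   # current minus previous, length n - 1
via_np = np.diff(vals)          # NumPy's built-in for the same computation

# Prepend NaN so the differences line up with the original dates again.
aligned = pd.Series(np.concatenate(([np.nan], via_np)), index=idx, name='count_a')
print(aligned)
```

Once re-aligned, the values match what s.diff() returns directly.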