Python: How to calculate differences between consecutive rows in a pandas data frame?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/34846146/

Time: 2020-08-19 15:36:20  Source: igfitidea

python pandas

Asked by scrollex

I've got a data frame, df, with three columns: count_a, count_b, and date; the counts are floats, and the dates are consecutive days in 2015.

I'm trying to figure out the difference between each day's counts in both the count_a and count_b columns; that is, I'm trying to calculate the difference between each row and the preceding row for both of those columns. I've set the date as the index, but am having trouble figuring out how to do this; there were a couple of hints about using pd.Series and pd.DataFrame.diff, but I haven't had any luck finding an applicable answer or set of instructions.

I'm a bit stuck, and would appreciate some guidance here.

Here's what my data frame looks like:

import pandas as pd
from numpy import nan
from pandas import Timestamp

df = pd.DataFrame({'count_a': {Timestamp('2015-01-01 00:00:00'): 34175.0,
  Timestamp('2015-01-02 00:00:00'): 72640.0,
  Timestamp('2015-01-03 00:00:00'): 109354.0,
  Timestamp('2015-01-04 00:00:00'): 144491.0,
  Timestamp('2015-01-05 00:00:00'): 180355.0,
  Timestamp('2015-01-06 00:00:00'): 214615.0,
  Timestamp('2015-01-07 00:00:00'): 250096.0,
  Timestamp('2015-01-08 00:00:00'): 287880.0,
  Timestamp('2015-01-09 00:00:00'): 332528.0,
  Timestamp('2015-01-10 00:00:00'): 381460.0,
  Timestamp('2015-01-11 00:00:00'): 422981.0,
  Timestamp('2015-01-12 00:00:00'): 463539.0,
  Timestamp('2015-01-13 00:00:00'): 505395.0,
  Timestamp('2015-01-14 00:00:00'): 549027.0,
  Timestamp('2015-01-15 00:00:00'): 595377.0,
  Timestamp('2015-01-16 00:00:00'): 649043.0,
  Timestamp('2015-01-17 00:00:00'): 707727.0,
  Timestamp('2015-01-18 00:00:00'): 761287.0,
  Timestamp('2015-01-19 00:00:00'): 814372.0,
  Timestamp('2015-01-20 00:00:00'): 867096.0,
  Timestamp('2015-01-21 00:00:00'): 920838.0,
  Timestamp('2015-01-22 00:00:00'): 983405.0,
  Timestamp('2015-01-23 00:00:00'): 1067243.0,
  Timestamp('2015-01-24 00:00:00'): 1164421.0,
  Timestamp('2015-01-25 00:00:00'): 1252178.0,
  Timestamp('2015-01-26 00:00:00'): 1341484.0,
  Timestamp('2015-01-27 00:00:00'): 1427600.0,
  Timestamp('2015-01-28 00:00:00'): 1511549.0,
  Timestamp('2015-01-29 00:00:00'): 1594846.0,
  Timestamp('2015-01-30 00:00:00'): 1694226.0,
  Timestamp('2015-01-31 00:00:00'): 1806727.0,
  Timestamp('2015-02-01 00:00:00'): 1899880.0,
  Timestamp('2015-02-02 00:00:00'): 1987978.0,
  Timestamp('2015-02-03 00:00:00'): 2080338.0,
  Timestamp('2015-02-04 00:00:00'): 2175775.0,
  Timestamp('2015-02-05 00:00:00'): 2279525.0,
  Timestamp('2015-02-06 00:00:00'): 2403306.0,
  Timestamp('2015-02-07 00:00:00'): 2545696.0,
  Timestamp('2015-02-08 00:00:00'): 2672464.0,
  Timestamp('2015-02-09 00:00:00'): 2794788.0},
 'count_b': {Timestamp('2015-01-01 00:00:00'): nan,
  Timestamp('2015-01-02 00:00:00'): nan,
  Timestamp('2015-01-03 00:00:00'): nan,
  Timestamp('2015-01-04 00:00:00'): nan,
  Timestamp('2015-01-05 00:00:00'): nan,
  Timestamp('2015-01-06 00:00:00'): nan,
  Timestamp('2015-01-07 00:00:00'): nan,
  Timestamp('2015-01-08 00:00:00'): nan,
  Timestamp('2015-01-09 00:00:00'): nan,
  Timestamp('2015-01-10 00:00:00'): nan,
  Timestamp('2015-01-11 00:00:00'): nan,
  Timestamp('2015-01-12 00:00:00'): nan,
  Timestamp('2015-01-13 00:00:00'): nan,
  Timestamp('2015-01-14 00:00:00'): nan,
  Timestamp('2015-01-15 00:00:00'): nan,
  Timestamp('2015-01-16 00:00:00'): nan,
  Timestamp('2015-01-17 00:00:00'): nan,
  Timestamp('2015-01-18 00:00:00'): nan,
  Timestamp('2015-01-19 00:00:00'): nan,
  Timestamp('2015-01-20 00:00:00'): nan,
  Timestamp('2015-01-21 00:00:00'): nan,
  Timestamp('2015-01-22 00:00:00'): nan,
  Timestamp('2015-01-23 00:00:00'): nan,
  Timestamp('2015-01-24 00:00:00'): 71.0,
  Timestamp('2015-01-25 00:00:00'): 150.0,
  Timestamp('2015-01-26 00:00:00'): 236.0,
  Timestamp('2015-01-27 00:00:00'): 345.0,
  Timestamp('2015-01-28 00:00:00'): 1239.0,
  Timestamp('2015-01-29 00:00:00'): 2228.0,
  Timestamp('2015-01-30 00:00:00'): 7094.0,
  Timestamp('2015-01-31 00:00:00'): 16593.0,
  Timestamp('2015-02-01 00:00:00'): 27190.0,
  Timestamp('2015-02-02 00:00:00'): 37519.0,
  Timestamp('2015-02-03 00:00:00'): 49003.0,
  Timestamp('2015-02-04 00:00:00'): 63323.0,
  Timestamp('2015-02-05 00:00:00'): 79846.0,
  Timestamp('2015-02-06 00:00:00'): 101568.0,
  Timestamp('2015-02-07 00:00:00'): 127120.0,
  Timestamp('2015-02-08 00:00:00'): 149955.0,
  Timestamp('2015-02-09 00:00:00'): 171440.0}})

Accepted answer by Mike Müller

diff should give the desired result:

>>> df.diff()
            count_a  count_b
2015-01-01      NaN      NaN
2015-01-02    38465      NaN
2015-01-03    36714      NaN
2015-01-04    35137      NaN
2015-01-05    35864      NaN
....
2015-02-07   142390    25552
2015-02-08   126768    22835
2015-02-09   122324    21485
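
As a minimal, self-contained sketch of the same call: the five-row frame below reuses the first five count_a values from the question, while its count_b column is shortened and purely illustrative (an assumption made only to keep the example small).

```python
import numpy as np
import pandas as pd

# First five count_a values from the question; count_b is illustrative only.
idx = pd.date_range("2015-01-01", periods=5, freq="D")
small = pd.DataFrame(
    {
        "count_a": [34175.0, 72640.0, 109354.0, 144491.0, 180355.0],
        "count_b": [np.nan, np.nan, 71.0, 150.0, 236.0],
    },
    index=idx,
)

# Each row minus the preceding row, computed column by column.
daily = small.diff()

# The first row has no predecessor, so it is NaN in every column;
# a difference that involves a NaN input is NaN as well.
print(daily)
```

Because diff works on the whole frame, both columns are handled in one call; a difference only appears once two consecutive rows both hold real values.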

Answer by David Wolever

You can use the .rolling_apply(…) method:

diffs_a = pd.rolling_apply(df['count_a'], 2, lambda x: x[1] - x[0])  # current row minus the preceding row
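
pd.rolling_apply is a top-level function from older pandas releases; later versions expose rolling windows as a method on the Series instead. A hedged sketch of what the equivalent might look like there, assuming a pandas version that provides Series.rolling and the raw argument of apply (the three-value series below is taken from the start of the question's count_a column):

```python
import pandas as pd

# First three count_a values from the question.
s = pd.Series(
    [34175.0, 72640.0, 109354.0],
    index=pd.date_range("2015-01-01", periods=3, freq="D"),
    name="count_a",
)

# Window of two rows; raw=True hands each window to the lambda as a
# NumPy array, so x[0] is the earlier value and x[1] the later one.
diffs_a = s.rolling(2).apply(lambda x: x[1] - x[0], raw=True)

# The first window is incomplete, so the first result is NaN.
print(diffs_a)
```

The lambda here subtracts the earlier value from the later one, so the result lines up with what diff returns for the same column.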

Alternatively, if it's easier, you can operate on the arrays directly:

count_a_vals = df['count_a'].values
# Later value minus the preceding one; the result is one element shorter than the input
diffs_a = count_a_vals[1:] - count_a_vals[:-1]
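
Working on the raw array drops the date index, and the result has one element fewer than the input. A minimal sketch of one way to handle that, using np.diff as a stand-in for the manual slicing (my substitution, not part of the original answer) and re-attaching the dates afterwards; the four values are from the question's count_a column:

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2015-01-01", periods=4, freq="D")
count_a = pd.Series([34175.0, 72640.0, 109354.0, 144491.0], index=idx)

# np.diff returns later-minus-earlier differences, one element shorter.
raw_diffs = np.diff(count_a.values)

# Re-attach dates: each difference is labelled with the later of its two days.
diffs_a = pd.Series(raw_diffs, index=count_a.index[1:])
print(diffs_a)
```

Labelling each difference with the later day is a design choice that mirrors diff, minus the leading NaN row.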