Warning: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license and attribute it to the original authors (not me), citing the original address and author information. Original StackOverflow address: http://stackoverflow.com/questions/35534152/

How to calculate the difference in days between successive pandas DataFrame rows, with a condition

Tags: python, pandas

Asked by Neil

I have a pandas DataFrame like the following:

item_id        date
  101     2016-01-05
  101     2016-01-21
  121     2016-01-08
  121     2016-01-22
  128     2016-01-19
  128     2016-02-17
  131     2016-01-11
  131     2016-01-23
  131     2016-01-24
  131     2016-02-06
  131     2016-02-07

I want to calculate the difference in days between values in the date column, but separately for each value of the item_id column. First, I want to sort the DataFrame by date within each item_id group. It should look like this:

item_id        date
  101     2016-01-05
  101     2016-01-21
  121     2016-01-08
  121     2016-01-22
  128     2016-01-19
  128     2016-02-17
  131     2016-01-11
  131     2016-01-23
  131     2016-01-24
  131     2016-02-06
  131     2016-02-07

Then I want to calculate the difference between consecutive dates, again grouped by item_id, so the output should look like the following:

 item_id        date      day_difference
  101     2016-01-05          0
  101     2016-01-21          16
  121     2016-01-08          0
  121     2016-01-22          14
  128     2016-01-19          0
  128     2016-02-17          29
  131     2016-01-11          0
  131     2016-01-23          12
  131     2016-01-24          1
  131     2016-02-06          13
  131     2016-02-07          1

For sorting, I tried something like this:

df.groupby('item_id').apply(lambda x: new_df.sort('date'))

But it didn't work. I am able to calculate the difference between consecutive rows with the following:

(df['date'] - df['date'].shift(1))

But this does not take the grouping by item_id into account.
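
To illustrate why the plain shift falls short, here is a minimal sketch on a four-row subset of the data above. It assumes the date column has already been converted with `pd.to_datetime`; the variable names `plain` and `grouped` are made up for this illustration. At a group boundary the plain shift subtracts a date belonging to a different item, whereas shifting within each `item_id` leaves the first row of every group without a predecessor.

```python
import pandas as pd

# Subset of the question's data: two items, two rows each.
df = pd.DataFrame({
    "item_id": [101, 101, 121, 121],
    "date": pd.to_datetime(
        ["2016-01-05", "2016-01-21", "2016-01-08", "2016-01-22"]
    ),
})

# Plain shift: the first row of item 121 is compared with the last row of item 101.
plain = (df["date"] - df["date"].shift(1)).dt.days

# Grouped shift: each item_id is shifted on its own, so the first row
# of every group has no previous date and yields NaN.
grouped = (df["date"] - df.groupby("item_id")["date"].shift(1)).dt.days

print(plain.tolist())
print(grouped.tolist())
```

In the plain version, the third row mixes the two items and produces a negative gap; in the grouped version that row is simply missing and can be filled with 0 if desired.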

Answered by jezrael

I think you can use:

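import numpy as np

# Sort by date within each item_id (assumes 'date' is a datetime column).
df = df.sort_values(['item_id', 'date'])

# Difference in days between consecutive rows of the same item_id.
df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D')
df['diff'] = df['diff'].fillna(0)  # first row of each group has no predecessor
print(df)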
    item_id       date  diff
0       101 2016-01-05     0
1       101 2016-01-21    16
2       121 2016-01-08     0
3       121 2016-01-22    14
4       128 2016-01-19     0
5       128 2016-02-17    29
6       131 2016-01-11     0
7       131 2016-01-23    12
8       131 2016-01-24     1
9       131 2016-02-06    13
10      131 2016-02-07     1
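
For readers on a recent pandas release, the same idea can be sketched end to end as below. This is a minimal sketch and not the answer's exact code: it assumes the date column starts out as strings, rebuilds the sample frame from the question, and uses `.dt.days` in place of dividing by `np.timedelta64(1, 'D')`. The column name `day_difference` is borrowed from the question's desired output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "item_id": [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    "date": [
        "2016-01-05", "2016-01-21", "2016-01-08", "2016-01-22",
        "2016-01-19", "2016-02-17", "2016-01-11", "2016-01-23",
        "2016-01-24", "2016-02-06", "2016-02-07",
    ],
})

# Assumption: dates arrive as strings, so convert them first.
df["date"] = pd.to_datetime(df["date"])

# Sort by date within each item_id.
df = df.sort_values(["item_id", "date"]).reset_index(drop=True)

# diff() within each group yields NaT for the first row of every item_id;
# .dt.days turns the timedeltas into day counts (NaN where NaT),
# which are then filled with 0 and cast to integers.
df["day_difference"] = (
    df.groupby("item_id")["date"].diff().dt.days.fillna(0).astype(int)
)

print(df)
```

Casting to `int` after `fillna(0)` keeps the column as whole days; if the first row of each group should stay missing, the `fillna` and `astype` steps can be dropped.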