Original URL: http://stackoverflow.com/questions/35534152/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
How to calculate day's difference between successive pandas dataframe rows with condition
Question by Neil
I have a pandas DataFrame like the following:
item_id date
101 2016-01-05
101 2016-01-21
121 2016-01-08
121 2016-01-22
128 2016-01-19
128 2016-02-17
131 2016-01-11
131 2016-01-23
131 2016-01-24
131 2016-02-06
131 2016-02-07
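To make the example reproducible, the sample frame above can be built as follows. This is a sketch that assumes the `date` strings should be parsed with `pd.to_datetime`, so that subtracting two dates yields a `Timedelta`:

```python
import pandas as pd

# Sample data from the question. Parsing 'date' as datetime64 lets
# date arithmetic (subtraction, diff, shift) produce Timedelta values.
df = pd.DataFrame({
    'item_id': [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    'date': pd.to_datetime([
        '2016-01-05', '2016-01-21', '2016-01-08', '2016-01-22',
        '2016-01-19', '2016-02-17', '2016-01-11', '2016-01-23',
        '2016-01-24', '2016-02-06', '2016-02-07',
    ]),
})
print(df.dtypes)
```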
I want to calculate the difference in days between dates in the date column, but separately for each item_id. First, I want to sort the dataframe by date within each item_id group. It should look like this:
item_id date
101 2016-01-05
101 2016-01-08
121 2016-01-21
121 2016-01-22
128 2016-01-17
128 2016-02-19
131 2016-01-11
131 2016-01-23
131 2016-01-24
131 2016-02-06
131 2016-02-07
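Sorting dates within each item_id does not strictly require `groupby`: a single multi-column `sort_values` orders rows by item_id first and by date inside each item_id. A minimal sketch, where the rows are a hypothetical shuffled subset of the sample data:

```python
import pandas as pd

# Hypothetical out-of-order rows taken from the sample data.
df = pd.DataFrame({
    'item_id': [131, 101, 131, 121, 101, 121],
    'date': pd.to_datetime(['2016-01-23', '2016-01-21', '2016-01-11',
                            '2016-01-22', '2016-01-05', '2016-01-08']),
})

# Primary key item_id, secondary key date: each group ends up in date order.
sorted_df = df.sort_values(['item_id', 'date']).reset_index(drop=True)
print(sorted_df)
```

Because the sort keys are applied in order, rows never move between item_id groups; only the order inside each group changes.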
Then I want to calculate the difference between consecutive dates, again grouped by item_id, so the output should look like the following:
item_id date day_difference
101 2016-01-05 0
101 2016-01-08 3
121 2016-01-21 0
121 2016-01-22 1
128 2016-01-17 0
128 2016-02-19 33
131 2016-01-11 0
131 2016-01-23 12
131 2016-01-24 1
131 2016-02-06 13
131 2016-02-07 1
For sorting, I used something like this:
df.groupby('item_id').apply(lambda x: new_df.sort('date'))
But it didn't work. I can calculate the difference between consecutive rows with the following:
(df['date'] - df['date'].shift(1))
But I can't get it to work per item_id group.
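The `shift(1)` approach above becomes group-aware when `shift` is called on a `groupby` object: each row then only looks back within its own item_id, and the first row of every group gets `NaT` instead of the previous group's last date. A minimal sketch on a subset of the sample data, reusing the question's `day_difference` column name:

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': [101, 101, 131, 131, 131],
    'date': pd.to_datetime(['2016-01-05', '2016-01-21',
                            '2016-01-11', '2016-01-23', '2016-01-24']),
}).sort_values(['item_id', 'date'])

# Previous date within the same item_id (NaT for the first row of a group).
prev_date = df.groupby('item_id')['date'].shift(1)

# Timedelta -> whole days; NaN (from NaT) becomes 0 for group starts.
df['day_difference'] = (df['date'] - prev_date).dt.days.fillna(0).astype(int)
print(df)
```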
Answer by jezrael
I think you can use:
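import numpy as np
import pandas as pd
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['item_id', 'date'])  # sort dates within each item_id
df['diff'] = df.groupby('item_id')['date'].diff() / np.timedelta64(1, 'D')
df['diff'] = df['diff'].fillna(0).astype(int)  # first row of each group -> 0
print(df)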
item_id date diff
0 101 2016-01-05 0
1 101 2016-01-21 16
2 121 2016-01-08 0
3 121 2016-01-22 14
4 128 2016-01-19 0
5 128 2016-02-17 29
6 131 2016-01-11 0
7 131 2016-01-23 12
8 131 2016-01-24 1
9 131 2016-02-06 13
10 131 2016-02-07 1
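If integer day counts are wanted directly, the `.dt.days` accessor is one alternative to dividing by `np.timedelta64(1, 'D')`. A sketch on the full sample data, assuming `date` holds datetime64 values with no time-of-day component:

```python
import pandas as pd

df = pd.DataFrame({
    'item_id': [101, 101, 121, 121, 128, 128, 131, 131, 131, 131, 131],
    'date': pd.to_datetime([
        '2016-01-05', '2016-01-21', '2016-01-08', '2016-01-22',
        '2016-01-19', '2016-02-17', '2016-01-11', '2016-01-23',
        '2016-01-24', '2016-02-06', '2016-02-07',
    ]),
})

df = df.sort_values(['item_id', 'date'])
# GroupBy.diff() computes date - previous date within each item_id.
# .dt.days keeps the whole-day component (NaN for each group's first row).
df['diff'] = df.groupby('item_id')['date'].diff().dt.days.fillna(0).astype(int)
print(df)
```

For date-only values both approaches agree; if timestamps carry hours, `.dt.days` drops the sub-day remainder while the division keeps it as a fraction.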