
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not the translator): StackOverflow. Original: http://stackoverflow.com/questions/46839032/

Date: 2020-09-14 04:40:04  Source: igfitidea

Grouping by date range with pandas

python, pandas, datetime, group-by, pandas-groupby

Asked by eljusticiero67

I am looking to group by two columns: user_id and date; however, if the dates are close enough, I want to consider the two entries part of the same group and group accordingly. Dates are in m-d-y format.


user_id     date       val
1           1-1-17     1
2           1-1-17     1
3           1-1-17     1
1           1-1-17     1
1           1-2-17     1
2           1-2-17     1
2           1-10-17    1
3           2-1-17     1

The grouping would be by user_id and by dates within +/- 3 days of each other, so grouping and summing val would look like:


user_id     date       sum(val)
1           1-2-17     3
2           1-2-17     2
2           1-10-17    1
3           1-1-17     1
3           2-1-17     1

Can anyone think of a way this could be done (somewhat) easily? I know there are some problematic aspects to this: for example, what to do if the dates chain together endlessly, each three days apart. But the exact data I'm using only has 2 values per person.


Thanks!


Answered by cs95

I'd convert this to a datetime column and then use pd.TimeGrouper:


dates = pd.to_datetime(df.date, format='%m-%d-%y')
print(dates)
0   2017-01-01
1   2017-01-01
2   2017-01-01
3   2017-01-01
4   2017-01-02
5   2017-01-02
6   2017-01-10
7   2017-02-01
Name: date, dtype: datetime64[ns]

df = (df.assign(date=dates).set_index('date')
        .groupby(['user_id', pd.TimeGrouper('3D')])
        .sum()
        .reset_index())    
print(df)
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1


Similar solution using pd.Grouper:


df = (df.assign(date=dates)
        .groupby(['user_id', pd.Grouper(key='date', freq='3D')])
        .sum()
        .reset_index())
print(df)
   user_id       date  val
0        1 2017-01-01    3
1        2 2017-01-01    2
2        2 2017-01-10    1
3        3 2017-01-01    1
4        3 2017-01-31    1

Update: TimeGrouper will be deprecated in future versions of pandas, so Grouper would be preferred in this scenario (thanks for the heads up, Vaishali!).

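On pandas ≥ 1.0, TimeGrouper has been removed entirely, so only the pd.Grouper form runs. A self-contained sketch of that variant (sample data reconstructed from the question):

```python
import pandas as pd

# Sample data from the question, with dates parsed up front
df = pd.DataFrame({
    'user_id': [1, 2, 3, 1, 1, 2, 2, 3],
    'date': pd.to_datetime(
        ['1-1-17', '1-1-17', '1-1-17', '1-1-17',
         '1-2-17', '1-2-17', '1-10-17', '2-1-17'],
        format='%m-%d-%y'),
    'val': [1] * 8,
})

# Fixed 3-day bins anchored at the earliest date, per user
out = (df.groupby(['user_id', pd.Grouper(key='date', freq='3D')])
         .sum()
         .reset_index())
```

Note that `freq='3D'` cuts fixed 3-day bins starting from the first timestamp, which is close to, but not exactly the same as, "dates within +/- 3 days of each other".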

Answered by YOBEN_S

I've come up with a very ugly solution, but it still works...


# assumes df['date'] is already datetime (see the conversion above)
df = df.sort_values(['user_id', 'date'])
# start a new group whenever the gap to the previous row (per user) is >= 3 days
df['Key'] = df.groupby('user_id')['date'].diff().dt.days.lt(3).ne(True).cumsum()
df.groupby(['user_id', 'Key'], as_index=False).agg({'val': 'sum', 'date': 'first'})

Out[586]: 
   user_id  Key  val       date
0        1    1    3 2017-01-01
1        2    2    2 2017-01-01
2        2    3    1 2017-01-10
3        3    4    1 2017-01-01
4        3    5    1 2017-02-01
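The one-liner above packs several steps together; unpacked on a tiny hypothetical series (not the question's data), the diff/cumsum trick works like this:

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2017-01-01', '2017-01-02', '2017-01-10']))
gap = s.diff().dt.days          # days since the previous row: NaN, 1.0, 8.0
new_group = gap.lt(3).ne(True)  # True where the gap is >= 3 days (or NaN, i.e. the first row)
key = new_group.cumsum()        # running count of group starts
```

Each True marks the start of a new group, and the cumulative sum turns those markers into a group key.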