原文地址: http://stackoverflow.com/questions/24082784/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
pandas dataframe groupby datetime month
Asked by atomh33ls
Consider a csv file:
string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
I can read this in, and reformat the date column into datetime format:
import pandas as pd

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?
What I am currently trying is re-indexing by the date:
b.index=b['date']
I can access the month like so:
b.index.month
However I can't seem to find a function to lump together by month.
Accepted answer by atomh33ls
Managed to do it:
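import pandas as pd

b = pd.read_csv('b.dat')
# use the parsed datetimes as the index so .month and .year are available
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
# group by calendar month within each year
b.groupby(by=[b.index.month, b.index.year])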
Or
b.groupby(pd.Grouper(freq='M')) # update for v0.21+
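As a rough end-to-end sketch of the first approach, the snippet below groups the question's sample rows by year and month and sums the number column. Reading the CSV from an in-memory string, the level names year and month, and the choice of sum as the aggregation are assumptions made for illustration; the original answer stops at the groupby call.

```python
import io

import pandas as pd

# Sample rows from the question, read from a string instead of 'b.dat'
# (assumption: same content as the file in the question).
csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""

b = pd.read_csv(io.StringIO(csv_text))
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# Name the two key levels so the resulting MultiIndex is easier to read.
years = b.index.year.rename('year')
months = b.index.month.rename('month')

# One row per (year, month) pair that actually occurs in the data.
monthly = b.groupby([years, months])['number'].sum()
print(monthly)
```

Grouping on both year and month keeps May 2011 and May 2014 apart; grouping on b.index.month alone would merge them.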
Answer by PandasRocks
(update: 2018)
Note that pd.TimeGrouper is deprecated and will be removed. Use instead:
df.groupby(pd.Grouper(freq='M'))
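A small sketch of pd.Grouper on a datetime column rather than the index may help here. The DataFrame contents and the column names date and number are made up for illustration. The frequency is passed as a pd.offsets.MonthEnd() object, which corresponds to the 'M' alias used above (newer pandas releases spell that alias 'ME'), so the snippet does not depend on which alias the installed version accepts.

```python
import pandas as pd

# Hypothetical data: two rows in April 2011, none in May, one in June.
df = pd.DataFrame({
    'date': pd.to_datetime(['2011-04-22', '2011-04-29', '2011-06-02']),
    'number': [3.0, 1.0, 2.0],
})

# key= points the grouper at a column, so the index can stay as it is.
month_end = pd.offsets.MonthEnd()
monthly = df.groupby(pd.Grouper(key='date', freq=month_end))['number'].sum()
print(monthly)
```

Unlike grouping on the year and month values, the time-based grouper also emits a bin for May, which has no rows and sums to 0.0, and each bin is labeled with the last day of its month.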
Answer by jpp
One solution which avoids a MultiIndex is to create a new datetime column with the day set to 1, then group by this column. Trivial example below.
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20']),
                   'Values': [5, 10]})

# normalize day to beginning of month
# caveat: a date that already falls on the 1st rolls back to the previous month
df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)

# two alternative methods (dates on the 1st are left unchanged)
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))

g = df.groupby('YearMonth')
res = g['Values'].sum()

# YearMonth
# 2017-10-01 15
# Name: Values, dtype: int64
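The three normalizations agree on the sample dates, but they can differ for a date that already falls on the first of the month. The sketch below compares them on such a date; the extra 2017-10-01 value and the variable names are added for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical dates, including one that is already the 1st of the month.
dates = pd.Series(pd.to_datetime(['2017-10-01', '2017-10-05', '2017-10-20']))

# Subtracting MonthBegin(1) moves a date on the 1st back a whole month.
by_offset = dates - pd.offsets.MonthBegin(1)

# These two only strip the day-of-month, so the 1st stays where it is.
by_timedelta = dates - pd.to_timedelta(dates.dt.day - 1, unit='D')
by_replace = dates.map(lambda dt: dt.replace(day=1))

print(pd.DataFrame({'date': dates,
                    'offset': by_offset,
                    'timedelta': by_timedelta,
                    'replace': by_replace}))
```

If the data can contain dates on the 1st, the timedelta or replace variant is probably the safer grouping key.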
The subtle benefit of this solution is that, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:
some_group = g.get_group('2017-10-01')
Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
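If month-end labels are wanted from the column-based approach, one possible way (an assumption here, not something the answer itself shows) is to add pd.offsets.MonthEnd(0), which rolls each date forward to the last day of its month and leaves dates already on the last day untouched. The MonthEnd column name and the sample rows are illustrative.

```python
import pandas as pd

# Hypothetical data, including a row that is already on the month end.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20', '2017-10-31']),
                   'Values': [5, 10, 1]})

# MonthEnd(0) rolls forward to the month end without skipping a month.
df['MonthEnd'] = df['Date'] + pd.offsets.MonthEnd(0)

res = df.groupby('MonthEnd')['Values'].sum()
print(res)
```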
Answer by tsando
A slight alternative to @jpp's solution, but outputting a YearMonth string:
df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))
res = df.groupby('YearMonth')['Values'].sum()
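The format call above does not zero-pad the month, so January 2017 becomes '2017-1'. Where padded, sortable keys are preferred, a variant using Series.dt.strftime could look like the sketch below. It is a suggested alternative rather than part of the original answer, and the sample rows are invented.

```python
import pandas as pd

# Hypothetical data spanning a single-digit and a double-digit month.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-01-05', '2017-01-20', '2017-10-03']),
                   'Values': [5, 10, 7]})

# '%Y-%m' yields zero-padded keys such as '2017-01', which sort chronologically.
df['YearMonth'] = df['Date'].dt.strftime('%Y-%m')

res = df.groupby('YearMonth')['Values'].sum()
print(res)
```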