Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/24082784/

pandas dataframe groupby datetime month

Tags: python, pandas, datetime, pandas-groupby

Asked by atomh33ls

Consider a csv file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

import pandas as pd

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index=b['date']

I can access the month like so:

b.index.month

However, I can't find a function that groups the rows by month.

Accepted answer by atomh33ls

Managed to do it:

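import pandas as pd

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
# group on month and year together so the same month in different years stays separate
b.groupby(by=[b.index.month, b.index.year])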

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+; pandas 2.2+ prefers the alias 'ME' for month end
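
For illustration, here is a minimal sketch of the first approach run end to end on the sample data from the question. It is not part of the original answer: the data is read from an in-memory string instead of b.dat, the names csv_text and monthly are hypothetical, the two keys are ordered year first and renamed so the result has readable level names, and summing the number column is just one possible aggregation.

```python
import io

import pandas as pd

# The sample rows from the question, held in memory instead of in b.dat.
csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""

b = pd.read_csv(io.StringIO(csv_text))
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# Rename the two keys (an addition for readability) so the levels are not both called 'date'.
year = b.index.year.rename('year')
month = b.index.month.rename('month')

# One row per (year, month) pair; May 2011 and May 2014 stay separate.
monthly = b.groupby([year, month])['number'].sum()
print(monthly)
```

Grouping on the month alone would merge May 2011 with May 2014, which is why the year is included as a second key.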

Answer by PandasRocks

(update: 2018)

Note that pd.TimeGrouper is deprecated and will be removed. Use this instead:

df.groupby(pd.Grouper(freq='M'))

Answer by jpp

One solution that avoids a MultiIndex is to create a new datetime column with the day set to 1, then group by this column. A trivial example is below.

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20']),
                   'Values': [5, 10]})

# normalize day to beginning of month
# caveat: a date that already falls on the 1st is moved back to the previous month
df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)

# two alternative methods (these leave a date on the 1st unchanged)
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day - 1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))

g = df.groupby('YearMonth')

res = g['Values'].sum()

# YearMonth
# 2017-10-01    15
# Name: Values, dtype: int64

The subtle benefit of this solution is that, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. pd.Grouper, as of v0.23, does support a convention parameter, but this is only applicable for a PeriodIndex grouper.
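
As a rough sketch of the month-start idea, the snippet below uses dt.to_period('M') followed by dt.to_timestamp(). This is a swapped-in alternative that does not appear in the answer above, and the three-row frame is made-up illustration data. It also shows what happens when MonthBegin(1) is subtracted from a date that is already the first of the month.

```python
import pandas as pd

# Made-up data: one date sits exactly on the 1st of the month.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-01', '2017-10-20', '2017-11-03']),
                   'Values': [5, 10, 7]})

# to_period('M') drops the day; to_timestamp() maps each period to its first day.
df['YearMonth'] = df['Date'].dt.to_period('M').dt.to_timestamp()

g = df.groupby('YearMonth')
res = g['Values'].sum()

# Groups are keyed by month-start timestamps, so a group is easy to look up.
october = g.get_group(pd.Timestamp('2017-10-01'))

# By contrast, subtracting MonthBegin(1) from a date on the 1st moves it back a month.
shifted = pd.Timestamp('2017-10-01') - pd.offsets.MonthBegin(1)
print(res)
print(shifted)
```

If your dates can fall on the first of the month, this variant or the two alternative methods in the answer are the safer choice.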

Answer by tsando

A slight variation on @jpp's solution that outputs a YearMonth string instead:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

res = df.groupby('YearMonth')['Values'].sum()
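
One thing to watch with the format call above is that it does not zero-pad the month, so May comes out as '2017-5', and such strings sort after '2017-10'. A possible variant, offered as a sketch rather than as part of the original answer, is dt.strftime('%Y-%m'); the sample frame here is made up for illustration.

```python
import pandas as pd

# Made-up data with a single-digit month to show the zero padding.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-05-05', '2017-10-05', '2017-10-20']),
                   'Values': [1, 5, 10]})

# '%Y-%m' yields zero-padded labels such as '2017-05', which sort chronologically as text.
df['YearMonth'] = df['Date'].dt.strftime('%Y-%m')

res = df.groupby('YearMonth')['Values'].sum()
print(res)
```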