Python pandas DataFrame: group by datetime month

Question

Asked by atomh33ls

Consider a csv file:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

I can read this in, and reformat the date column into datetime format:

import pandas as pd

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

I have been trying to group the data by month. It seems like there should be an obvious way of accessing the month and grouping by that. But I can't seem to do it. Does anyone know how?

What I am currently trying is re-indexing by the date:

b.index=b['date']

I can access the month like so:

b.index.month

However I can't seem to find a function to lump together by month.

Answer 1

Accepted answer by atomh33ls

Managed to do it:

b = pd.read_csv('b.dat')
b.index = pd.to_datetime(b['date'],format='%m/%d/%y %I:%M%p')
b.groupby(by=[b.index.month, b.index.year])

Or

b.groupby(pd.Grouper(freq='M'))  # update for v0.21+ (pandas 2.2+ prefers freq='ME')
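Both calls above return a GroupBy object, so an aggregation still has to be applied to get monthly values. The following is a minimal sketch using the rows from the question; `io.StringIO` stands in for the `b.dat` file, and the choice of `sum` and the `year`/`month` level names are assumptions made for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'b.dat'.
csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""
b = pd.read_csv(io.StringIO(csv_text))
b.index = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')

# Group on (year, month) so that May 2011 and May 2014 stay separate,
# then sum the 'number' column within each group.
monthly = b.groupby([b.index.year, b.index.month])['number'].sum()
monthly.index.names = ['year', 'month']  # illustrative level names
print(monthly)
```

Grouping on the month alone would merge May 2011 with May 2014, which is why the year is included as a second key.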

Answer 2

Answered by PandasRocks

(update: 2018)

Note that pd.TimeGrouper is deprecated and will be removed. Use instead:

 df.groupby(pd.Grouper(freq='M'))
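As a sketch of how this can be used without replacing the index, `pd.Grouper` also accepts a `key` argument naming a datetime column. The small frame below is made up for illustration, and an offset object is passed as `freq` on the assumption that the code should run on both older and newer pandas releases, where the string alias for month-end differs.

```python
import pandas as pd

# Illustrative data; the column names follow the question's CSV.
df = pd.DataFrame({
    'date': pd.to_datetime(['2011-04-22 12:07', '2011-04-29 11:59', '2011-05-02 13:41']),
    'number': [3.0, 1.0, 2.0],
})

# key= names the datetime column, so the index does not need to be replaced.
# An offset object is used for freq because the string alias for month-end
# differs between pandas versions ('M' in older releases, 'ME' in newer ones).
monthly = df.groupby(pd.Grouper(key='date', freq=pd.offsets.MonthEnd()))['number'].sum()
print(monthly)
```

The resulting index is labelled with the last day of each month.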

Answer 3

Answered by jpp

One solution which avoids a MultiIndex is to create a new datetime column with day = 1, then group by this column. Trivial example below.

df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-05', '2017-10-20']),
                   'Values': [5, 10]})

# normalize day to beginning of month
# (caveat: a date already on the 1st rolls back to the previous month's start)
df['YearMonth'] = df['Date'] - pd.offsets.MonthBegin(1)

# two alternative methods
df['YearMonth'] = df['Date'] - pd.to_timedelta(df['Date'].dt.day-1, unit='D')
df['YearMonth'] = df['Date'].map(lambda dt: dt.replace(day=1))

g = df.groupby('YearMonth')

res = g['Values'].sum()

# YearMonth
# 2017-10-01    15
# Name: Values, dtype: int64

The subtle benefit of this solution is that, unlike pd.Grouper, the grouper index is normalized to the beginning of each month rather than the end, so you can easily extract groups via get_group:

some_group = g.get_group('2017-10-01')

Calculating the last day of October is slightly more cumbersome. As of v0.23, pd.Grouper does support a convention parameter, but this is only applicable to a PeriodIndex grouper.
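A different technique from the one in this answer, shown here only as a sketch, is to group on a monthly period and convert the labels to month-start or month-end timestamps afterwards. The data is made up for illustration, and the names `month_start` and `month_end` are hypothetical.

```python
import pandas as pd

# Illustrative data; note that the first date already falls on the 1st.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-10-01', '2017-10-20', '2017-11-03']),
                   'Values': [5, 10, 7]})

# A monthly Period labels the whole month, so a date on the 1st
# stays in its own month.
df['YearMonth'] = df['Date'].dt.to_period('M')
res = df.groupby('YearMonth')['Values'].sum()

# The period labels can be converted to either end of the month.
month_start = res.index.to_timestamp()                     # first day of each month
month_end = res.index.to_timestamp(how='end').normalize()  # last day, time set to midnight
print(res)
```

With period labels, a single month can be selected with `res.loc[pd.Period('2017-10', 'M')]`.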

Answer 4

Answered by tsando

A slight alternative to @jpp's solution that outputs a YearMonth string instead:

df['YearMonth'] = pd.to_datetime(df['Date']).apply(lambda x: '{year}-{month}'.format(year=x.year, month=x.month))

res = df.groupby('YearMonth')['Values'].sum()
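One thing to watch with the format string above is that the month is not zero-padded, so February 2017 becomes '2017-2' and sorts after '2017-10' when compared as text. A possible variant, sketched below on made-up data, uses `dt.strftime('%Y-%m')` so that the string labels sort chronologically.

```python
import pandas as pd

# Illustrative data covering a single-digit and a double-digit month.
df = pd.DataFrame({'Date': pd.to_datetime(['2017-02-05', '2017-10-05', '2017-10-20']),
                   'Values': [1, 5, 10]})

# '%Y-%m' zero-pads the month, e.g. '2017-02', so the string labels
# sort in chronological order.
df['YearMonth'] = df['Date'].dt.strftime('%Y-%m')

res = df.groupby('YearMonth')['Values'].sum()
print(res)
```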
