How to group and count rows by month and year using Pandas?

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/38792122/

Date: 2020-08-19 21:26:14  Source: igfitidea

Tags: python, pandas

Asked by nsbm

I have a dataset with personal data such as name, height, weight and date of birth. I would like to build a graph of the number of people born in each month and year. I'm using Python pandas for this, and my strategy was to group by year and month and aggregate with count. The closest I got is the count of people by year or by month, but not by both.

df['birthdate'].groupby(df.birthdate.dt.year).agg('count')

Other questions on Stack Overflow point to a Grouper called TimeGrouper, but I found nothing when searching the pandas documentation. Any ideas?

Answered by EdChum

To group on multiple criteria, pass a list of the columns or criteria:

df['birthdate'].groupby([df.birthdate.dt.year, df.birthdate.dt.month]).agg('count')

Example:

In [165]:
import datetime as dt
import pandas as pd

# One row per day from 2015-12-20 to 2016-03-01
df = pd.DataFrame({'birthdate':pd.date_range(start=dt.datetime(2015,12,20),end=dt.datetime(2016,3,1))})
df.groupby([df['birthdate'].dt.year, df['birthdate'].dt.month]).agg(['count'])

Out[165]:
                    birthdate
                        count
birthdate birthdate          
2015      12               12
2016      1                31
          2                29
          3                 1

UPDATE

As of version 0.23.0, the above code no longer works because multi-index level names must be unique. You now need to rename the levels for this to work:

In [107]:
df.groupby([df['birthdate'].dt.year.rename('year'), df['birthdate'].dt.month.rename('month')]).agg(['count'])

Out[107]: 
           birthdate
               count
year month          
2015 12           12
2016 1            31
     2            29
     3             1
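If a flat table is easier to plot than a two-level index, the renamed groupers can be combined with size() and reset_index(). This is only a minimal sketch on the same sample date range as above; the variable names (year, month, counts, flat) and the "count" column label are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data (same range as the example above): one row per day.
df = pd.DataFrame({"birthdate": pd.date_range("2015-12-20", "2016-03-01")})

# Give each grouping key a unique name so the index levels do not clash.
year = df["birthdate"].dt.year.rename("year")
month = df["birthdate"].dt.month.rename("month")

# size() counts rows per (year, month) group and returns a Series.
counts = df.groupby([year, month]).size().rename("count")

# reset_index() turns the two index levels into ordinary columns.
flat = counts.reset_index()
print(flat)
```

The flat frame has one row per (year, month) pair, which is usually a convenient shape to hand to a plotting call.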

Answered by Andy Hayden

You can also use the "monthly" period via to_period with the dt accessor:

In [11]: df = pd.DataFrame({'birthdate': pd.date_range(start='2015-12-20', end='2016-03-01')})

In [12]: df['birthdate'].groupby(df.birthdate.dt.to_period("M")).agg('count')
Out[12]:
birthdate
2015-12    12
2016-01    31
2016-02    29
2016-03     1
Freq: M, Name: birthdate, dtype: int64
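Since the original goal was a graph, the period-based counts may need plain string labels for the axis. The sketch below is one possible way to get them, assuming the same sample date range; the names counts and labels are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data: one row per day from 2015-12-20 to 2016-03-01.
df = pd.DataFrame({"birthdate": pd.date_range("2015-12-20", "2016-03-01")})

# Group by the monthly period of each birthdate and count rows per group.
counts = df.groupby(df["birthdate"].dt.to_period("M")).size()

# Each index entry is a monthly Period; str() renders it as "YYYY-MM".
labels = [str(period) for period in counts.index]
print(list(zip(labels, counts.tolist())))
```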


It's worth noting that if the datetime is the index (rather than a column) you can use resample:

df.resample("M").count()

Answered by Alberto Garcia-Raboso

Another solution is to set birthdate as the index and resample:

import pandas as pd

df = pd.DataFrame({'birthdate': pd.date_range(start='2015-12-20', end='2016-03-01')})
df.set_index('birthdate').resample('MS').size()

Output:

birthdate
2015-12-01    12
2016-01-01    31
2016-02-01    29
2016-03-01     1
Freq: MS, dtype: int64
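One practical difference between resampling and grouping on year and month shows up when some month has no births at all. The sketch below uses a small made-up set of dates (an assumption for illustration, not data from the question) with nothing in February; the variable names are likewise illustrative.

```python
import pandas as pd

# Made-up birthdates with no entries in February 2016.
df = pd.DataFrame(
    {"birthdate": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-03-10"])}
)

# resample builds every monthly bin between the first and last date,
# so the empty month appears with a count of 0.
by_resample = df.set_index("birthdate").resample("MS").size()

# groupby on (year, month) only returns groups that actually occur,
# so the empty month is missing from the result.
by_groupby = df.groupby(
    [
        df["birthdate"].dt.year.rename("year"),
        df["birthdate"].dt.month.rename("month"),
    ]
).size()

print(by_resample)
print(by_groupby)
```

For a chart with a continuous time axis, the zero-filled resample result is often the more convenient of the two.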

Answered by saran3h

As of April 2019, this works with pandas 0.24.x:

df.groupby([df.dates.dt.year.rename('year'), df.dates.dt.month.rename('month')]).size()

Answered by user1775015

Replace the date and count fields with your own column names. This piece of code groups by month, sums the count column, and sorts the result by date. You can also change the frequency to 1M, 2M, and so on.

df[['date', 'count']].groupby(pd.Grouper(key='date', freq='1M')).sum().sort_values(by='date', ascending=True)['count']
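As a self-contained illustration of the pd.Grouper approach, here is a small sketch on made-up data. The column names date and count follow the answer, but the values are invented, and the month-start alias 'MS' is swapped in for the answer's month-end frequency; that changes only the date each group is labeled with, not the sums.

```python
import pandas as pd

# Made-up data: a date column and a numeric column to sum per month.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2016-01-05", "2016-01-20", "2016-02-03", "2016-03-10"]
        ),
        "count": [1, 2, 3, 4],
    }
)

# pd.Grouper bins the 'date' column by calendar month ('MS' labels each
# bin with the first day of the month); sum() adds up 'count' per bin.
monthly = df.groupby(pd.Grouper(key="date", freq="MS"))["count"].sum()
print(monthly)
```

The result is already ordered by date, so no extra sort is needed in this small case.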