Pandas groupby date
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/19436365/
Asked by ArtDijk
I have a DataFrame with events. One or more events can occur on a given date (so the date can't be used as a unique index). The date range spans several years. I want to group by year and month and get a count of the Category values. Thanks.
In [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1')
In [13]: df
Out[13]:
    EventRefNr        DateOccurence  Type Category
0        86596  2010-01-02 00:00:00     3    Small
1        86779  2010-01-09 00:00:00    13   Medium
2        86780  2010-02-10 00:00:00     6    Small
3        86781  2010-02-09 00:00:00    17    Small
4        86898  2010-02-10 00:00:00     6    Small
5        86898  2010-02-11 00:00:00     6    Small
6        86902  2010-02-17 00:00:00     9    Small
7        86908  2010-02-19 00:00:00     3   Medium
8        86908  2010-03-05 00:00:00     3   Medium
9        86909  2010-03-06 00:00:00     8    Small
10       86930  2010-03-12 00:00:00    29    Small
11       86934  2010-03-16 00:00:00     9    Small
12       86940  2010-04-08 00:00:00     9     High
13       86941  2010-04-09 00:00:00    17    Small
14       86946  2010-04-14 00:00:00    10    Small
15       86950  2011-01-19 00:00:00    12    Small
16       86956  2011-01-24 00:00:00    13    Small
17       86959  2011-01-27 00:00:00    17    Small
I tried:
df.groupby(df['DateOccurence'])
Accepted answer by Andy Hayden
You can apply value_counts to the SeriesGroupBy (for the column):
In [11]: g = df.groupby('DateOccurence')
In [12]: g.Category.apply(pd.value_counts)
Out[12]:
DateOccurence
2010-01-02     Small     1
2010-01-09     Medium    1
2010-02-09     Small     1
2010-02-10     Small     2
2010-02-11     Small     1
2010-02-17     Small     1
2010-02-19     Medium    1
2010-03-05     Medium    1
2010-03-06     Small     1
2010-03-12     Small     1
2010-03-16     Small     1
2010-04-08     High      1
2010-04-09     Small     1
2010-04-14     Small     1
2011-01-19     Small     1
2011-01-24     Small     1
2011-01-27     Small     1
dtype: int64
I actually hoped this would return the following DataFrame, but you need to unstack it:
In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0)
Out[13]:
               High  Medium  Small
DateOccurence
2010-01-02        0       0      1
2010-01-09        0       1      0
2010-02-09        0       0      1
2010-02-10        0       0      2
2010-02-11        0       0      1
2010-02-17        0       0      1
2010-02-19        0       1      0
2010-03-05        0       1      0
2010-03-06        0       0      1
2010-03-12        0       0      1
2010-03-16        0       0      1
2010-04-08        1       0      0
2010-04-09        0       0      1
2010-04-14        0       0      1
2011-01-19        0       0      1
2011-01-24        0       0      1
2011-01-27        0       0      1
If there were multiple different Categories with the same Date they would be on the same row...
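For reference, here is a minimal sketch of the same per-date count using the built-in SeriesGroupBy.value_counts method rather than apply(pd.value_counts); as far as I know, the top-level pd.value_counts is deprecated in recent pandas releases. The five sample rows are copied from the question, and fill_value=0 is my substitution for fillna(0) so that the counts stay integers.

```python
import pandas as pd

# A few (date, Category) rows copied from the question's data.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-01-02", "2010-01-09", "2010-02-10", "2010-02-10", "2010-04-08"]
    ),
    "Category": ["Small", "Medium", "Small", "Small", "High"],
})

# Count each Category within each date.
counts = df.groupby("DateOccurence")["Category"].value_counts()

# Move Category into the columns; missing combinations become integer 0.
table = counts.unstack(fill_value=0)
print(table)
```

Two events share 2010-02-10 in this sample, so they end up on one row with a Small count of 2.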
Answer by cwharland
To break out the month and year, I often add extra columns to the data frame that split the date into its parts:
df['year'] = [t.year for t in df.DateOccurence]
df['month'] = [t.month for t in df.DateOccurence]
df['day'] = [t.day for t in df.DateOccurence]
This adds space complexity (extra columns on the df) but is less time complex (less processing at groupby time) than a datetime index. It's really up to you; a datetime index is the more pandas way to do things.
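The list comprehensions above can also be written with the vectorized .dt accessor. This is only a sketch, assuming DateOccurence already has a datetime64 dtype (if read_excel returned strings, pd.to_datetime would be needed first); the three-row frame is a cut-down stand-in for the question's data.

```python
import pandas as pd

# Cut-down stand-in for the question's frame (three of its dates).
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-01-02", "2010-02-10", "2011-01-19"]),
    "Category": ["Small", "Small", "Small"],
})

# .dt extracts each date component for the whole column at once.
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month
df["day"] = df["DateOccurence"].dt.day
print(df)
```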
After breaking out by year, month, day you can do any groupby you need.
df.groupby(['year', 'month']).Category.apply(pd.value_counts)
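If a year/month by Category table is the goal, pd.crosstab is one alternative way to get it in a single call. This is a sketch rather than the answer's own method; the rows are an abridged copy of the question's data, and the year and month columns are built as described above.

```python
import pandas as pd

# Abridged (date, Category) rows from the question's table.
rows = [
    ("2010-01-02", "Small"), ("2010-01-09", "Medium"),
    ("2010-02-10", "Small"), ("2010-02-17", "Small"),
    ("2010-02-19", "Medium"), ("2010-04-08", "High"),
    ("2011-01-19", "Small"), ("2011-01-24", "Small"),
]
df = pd.DataFrame(rows, columns=["DateOccurence", "Category"])
df["DateOccurence"] = pd.to_datetime(df["DateOccurence"])
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month

# Rows are (year, month) pairs, columns are Category values, cells are counts.
table = pd.crosstab([df["year"], df["month"]], df["Category"])
print(table)
```

Combinations that never occur (for example High in January 2011) show up as 0 without a separate fill step.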
To get months across multiple years:
df.groupby('month').Category.apply(pd.value_counts)
Or with Andy Hayden's datetime index:
df.groupby(di.month).Category.apply(pd.value_counts)
You can simply pick which method fits your needs better.
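The datetime-index variant uses a name, di, that is not defined anywhere in the text. The sketch below assumes it is a DatetimeIndex built from the DateOccurence column; that construction, the rename calls, and the use of size() are my assumptions, not something the answers show. The rows are an abridged copy of the question's data.

```python
import pandas as pd

# Abridged (date, Category) rows from the question's table.
rows = [
    ("2010-01-02", "Small"), ("2010-01-09", "Medium"),
    ("2010-02-10", "Small"), ("2010-02-17", "Small"),
    ("2010-02-19", "Medium"), ("2010-04-08", "High"),
    ("2011-01-19", "Small"), ("2011-01-24", "Small"),
]
df = pd.DataFrame(rows, columns=["DateOccurence", "Category"])

# Assumption: `di` is a DatetimeIndex built from the date column.
di = pd.DatetimeIndex(pd.to_datetime(df["DateOccurence"]))

# Group on arrays derived from the index; no helper columns are added to df.
by_year_month = (
    df.groupby([di.year.rename("year"), di.month.rename("month"), "Category"])
      .size()
      .unstack(fill_value=0)
)

# Months pooled across all years.
by_month = df.groupby([di.month.rename("month"), "Category"]).size()

print(by_year_month)
print(by_month)
```

Grouping on di.year and di.month leaves the frame unchanged, which is the trade-off against the extra-columns approach.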

