pandas: grouping by date

Question

Question by ArtDijk

I have a DataFrame with events. One or more events can occur on the same date (so the date can't be the index). The date range spans several years. I want to group by year and month and get a count of the Category values. Thanks.

In [12]: df = pd.read_excel('Pandas_Test.xls', 'sheet1')
In [13]: df
Out[13]:
    EventRefNr     DateOccurence      Type Category
0        86596    2010-01-02 00:00:00     3    Small
1        86779    2010-01-09 00:00:00    13   Medium
2        86780    2010-02-10 00:00:00     6    Small
3        86781    2010-02-09 00:00:00    17    Small
4        86898    2010-02-10 00:00:00     6    Small
5        86898    2010-02-11 00:00:00     6    Small
6        86902    2010-02-17 00:00:00     9    Small
7        86908    2010-02-19 00:00:00     3   Medium
8        86908    2010-03-05 00:00:00     3   Medium
9        86909    2010-03-06 00:00:00     8    Small
10       86930    2010-03-12 00:00:00    29    Small
11       86934    2010-03-16 00:00:00     9    Small
12       86940    2010-04-08 00:00:00     9     High
13       86941    2010-04-09 00:00:00    17    Small
14       86946    2010-04-14 00:00:00    10    Small
15       86950    2011-01-19 00:00:00    12    Small
16       86956    2011-01-24 00:00:00    13    Small
17       86959    2011-01-27 00:00:00    17    Small

I tried:

df.groupby(df['DateOccurence'])
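
The attempt above groups on the full timestamp, so every distinct day becomes its own group. A minimal sketch of grouping by year and month instead, assuming DateOccurence is already a datetime64 column (as read_excel normally produces for date cells). The rows below are the question's dates and categories re-typed by hand rather than read from Pandas_Test.xls. Because the year and month keys are derived Series, nothing needs to be set as the index, so repeated dates are not a problem:

```python
import pandas as pd

# The question's rows, re-typed by hand (only the columns needed here).
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime([
        "2010-01-02", "2010-01-09", "2010-02-10", "2010-02-09", "2010-02-10",
        "2010-02-11", "2010-02-17", "2010-02-19", "2010-03-05", "2010-03-06",
        "2010-03-12", "2010-03-16", "2010-04-08", "2010-04-09", "2010-04-14",
        "2011-01-19", "2011-01-24", "2011-01-27",
    ]),
    "Category": [
        "Small", "Medium", "Small", "Small", "Small",
        "Small", "Small", "Medium", "Medium", "Small",
        "Small", "Small", "High", "Small", "Small",
        "Small", "Small", "Small",
    ],
})

# Group on year and month derived from the date column (no index needed).
dates = df["DateOccurence"]
counts = (
    df.groupby([dates.dt.year.rename("year"), dates.dt.month.rename("month")])["Category"]
    .value_counts()
    .unstack(fill_value=0)  # one column per Category, 0 where a category is absent
)
print(counts)
```

Using `unstack(fill_value=0)` rather than `unstack().fillna(0)` keeps the counts as integers instead of turning them into floats.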

Answer 1

Accepted answer by Andy Hayden

You can apply value_counts to the SeriesGroupBy (for the column):

In [11]: g = df.groupby('DateOccurence')

In [12]: g.Category.apply(pd.value_counts)
Out[12]: 
DateOccurence        
2010-01-02     Small     1
2010-01-09     Medium    1
2010-02-09     Small     1
2010-02-10     Small     2
2010-02-11     Small     1
2010-02-17     Small     1
2010-02-19     Medium    1
2010-03-05     Medium    1
2010-03-06     Small     1
2010-03-12     Small     1
2010-03-16     Small     1
2010-04-08     High      1
2010-04-09     Small     1
2010-04-14     Small     1
2011-01-19     Small     1
2011-01-24     Small     1
2011-01-27     Small     1
dtype: int64
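
Newer pandas releases (2.1 and later) deprecate the top-level pd.value_counts function. A sketch of the same per-date count using the SeriesGroupBy.value_counts method instead, on a few rows taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-02-10", "2010-02-10", "2010-02-19", "2010-04-08"]),
    "Category": ["Small", "Small", "Medium", "High"],
})

g = df.groupby("DateOccurence")
# Same (date, category) -> count Series as g.Category.apply(pd.value_counts),
# but via the groupby method rather than the deprecated top-level function.
counts = g["Category"].value_counts()
print(counts)
```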

I had actually hoped this would return the following DataFrame, but you need to unstack it:

In [13]: g.Category.apply(pd.value_counts).unstack(-1).fillna(0)
Out[13]: 
               High  Medium  Small
DateOccurence                     
2010-01-02        0       0      1
2010-01-09        0       1      0
2010-02-09        0       0      1
2010-02-10        0       0      2
2010-02-11        0       0      1
2010-02-17        0       0      1
2010-02-19        0       1      0
2010-03-05        0       1      0
2010-03-06        0       0      1
2010-03-12        0       0      1
2010-03-16        0       0      1
2010-04-08        1       0      0
2010-04-09        0       0      1
2010-04-14        0       0      1
2011-01-19        0       0      1
2011-01-24        0       0      1
2011-01-27        0       0      1

If there were several different Categories on the same date, they would appear in the same row.
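
pd.crosstab is a swapped-in alternative (not what this answer uses) that builds the unstacked table in one call, filling missing date/category pairs with 0 and keeping integer counts. In this sketch the two categories on 2010-02-10 are hypothetical data, chosen to show them landing in the same row:

```python
import pandas as pd

# Hypothetical data: a Small and a Medium event on the same date.
df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-02-10", "2010-02-10", "2010-02-19"]),
    "Category": ["Small", "Medium", "Medium"],
})

# One row per date, one column per category, counts of each pair (0 if none).
table = pd.crosstab(df["DateOccurence"], df["Category"])
print(table)
```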

Answer 2

Answer by cwharland

To break out the month and year, I often add extra columns to the DataFrame that split each date into its parts:

# vectorized .dt accessor (requires DateOccurence to be a datetime64 column)
df['year'] = df.DateOccurence.dt.year
df['month'] = df.DateOccurence.dt.month
df['day'] = df.DateOccurence.dt.day

This costs extra space (new columns in the DataFrame) but needs less processing at groupby time than a datetime index does; which trade-off you prefer is really up to you. A datetime index is the more idiomatic pandas way to do things.
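
For comparison, here is one sketch of grouping on the dates themselves without storing extra columns. It is a different technique from the one in this answer: each timestamp is converted to a monthly Period on the fly with Series.dt.to_period, and the rows are a few taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(["2010-01-02", "2010-01-09", "2010-02-10", "2011-01-19"]),
    "Category": ["Small", "Medium", "Small", "Small"],
})

# Monthly Period keys such as 2010-01 keep year and month together,
# so January 2010 and January 2011 stay separate groups.
month = df["DateOccurence"].dt.to_period("M")
counts = df.groupby(month)["Category"].value_counts()
print(counts)
```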

Once the year, month, and day are broken out, you can do any groupby you need:

df.groupby(['year', 'month']).Category.value_counts()

To group by month across all years (e.g. every January counted together):

df.groupby('month').Category.value_counts()
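
A small check of the difference between the two groupings, on a few rows from the question with year/month columns built via the .dt accessor: grouping on month alone pools January 2010 and January 2011 together.

```python
import pandas as pd

df = pd.DataFrame({
    "DateOccurence": pd.to_datetime(
        ["2010-01-02", "2010-01-09", "2011-01-19", "2011-01-24", "2010-02-10"]
    ),
    "Category": ["Small", "Medium", "Small", "Small", "Small"],
})
df["year"] = df["DateOccurence"].dt.year
df["month"] = df["DateOccurence"].dt.month

# Separate counts per (year, month, category).
by_year_month = df.groupby(["year", "month"])["Category"].value_counts()
# Counts per (month, category), summed over all years.
by_month = df.groupby("month")["Category"].value_counts()
print(by_year_month)
print(by_month)
```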

Or, using Andy Hayden's datetime-index approach:

di = pd.DatetimeIndex(df.DateOccurence)
df.groupby(di.month).Category.value_counts()

Simply pick whichever method better fits your needs.
