Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/26646191/

Date: 2020-08-19 00:48:41  Source: igfitidea

Pandas groupby month and year

python pandas

Asked by darkpool

I have the following dataframe:

Date        abc    xyz
01-Jun-13   100    200
03-Jun-13   -20    50
15-Aug-13   40     -5
20-Jan-14   25     15
21-Feb-14   60     80

I need to group the data by year and month, i.e. group by Jan 2013, Feb 2013, Mar 2013, etc. I will use the newly grouped data to create a plot showing abc vs xyz per year/month.

I've tried various combinations of groupby and sum but just can't seem to get anything to work.

Thank you for any assistance.

Accepted answer by Andy Hayden

You can use either resample or Grouper (which resamples under the hood).

First make sure that the datetime column actually contains datetimes (hit it with pd.to_datetime). It's easier if it's a DatetimeIndex:

In [11]: df1
Out[11]:
            abc  xyz
Date
2013-06-01  100  200
2013-06-03  -20   50
2013-08-15   40   -5
2014-01-20   25   15
2014-02-21   60   80

In [12]: g = df1.groupby(pd.Grouper(freq="M"))  # DataFrameGroupBy (grouped by Month)

In [13]: g.sum()
Out[13]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

In [14]: df1.resample("M").sum()  # the same
Out[14]:
            abc  xyz
Date
2013-06-30   80  250
2013-07-31  NaN  NaN
2013-08-31   40   -5
2013-09-30  NaN  NaN
2013-10-31  NaN  NaN
2013-11-30  NaN  NaN
2013-12-31  NaN  NaN
2014-01-31   25   15
2014-02-28   60   80

Note: Previously pd.Grouper(freq="M") was written as pd.TimeGrouper("M"). The latter has been deprecated since pandas 0.21.
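
For readers on a current pandas, here is a minimal sketch of the same two approaches, starting from the raw strings in the question. It is not part of the original answer. Two choices are my own assumptions: the "MS" (month start) frequency, used because recent pandas releases (2.2 and later, as far as I know) prefer "ME" over "M" for month end and "MS" is accepted by old and new versions alike, and min_count=1, which keeps the empty months as NaN instead of 0.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Parse the strings into real datetimes and use them as the index.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%y")
df1 = df.set_index("Date")

# Assumption: "MS" labels each bin by the first day of the month
# (the answer above uses month end). min_count=1 leaves empty months as NaN.
by_grouper = df1.groupby(pd.Grouper(freq="MS")).sum(min_count=1)
by_resample = df1.resample("MS").sum(min_count=1)

print(by_grouper)
```

Both results cover every month from June 2013 to February 2014, including the months with no rows.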



I had thought the following would work, but it doesn't (because as_index isn't being respected? I'm not sure). I'm including this for interest's sake.

If it's a column (it has to be a datetime64 column! as I say, hit it with to_datetime), you can use the PeriodIndex:

In [21]: df
Out[21]:
        Date  abc  xyz
0 2013-06-01  100  200
1 2013-06-03  -20   50
2 2013-08-15   40   -5
3 2014-01-20   25   15
4 2014-02-21   60   80

In [22]: pd.DatetimeIndex(df.Date).to_period("M")  # old way
Out[22]:
<class 'pandas.tseries.period.PeriodIndex'>
[2013-06, ..., 2014-02]
Length: 5, Freq: M

In [23]: per = df.Date.dt.to_period("M")  # new way to get the same

In [24]: g = df.groupby(per)

In [25]: g.sum()  # dang not quite what we want (doesn't fill in the gaps)
Out[25]:
         abc  xyz
2013-06   80  250
2013-08   40   -5
2014-01   25   15
2014-02   60   80

To get the desired result we have to reindex...
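
The answer stops before showing the reindex step. One possible way to do it, as a sketch rather than the author's own code, is to build the full range of months with pd.period_range and reindex the grouped sums against it. The names monthly, full_range and filled are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Group by monthly period, as in In [23]-In [25] above.
per = df["Date"].dt.to_period("M")
monthly = df.groupby(per)[["abc", "xyz"]].sum()

# Build every month between the first and last period, then reindex so
# that the missing months show up as NaN rows.
full_range = pd.period_range(monthly.index.min(), monthly.index.max(), freq="M")
filled = monthly.reindex(full_range)

print(filled)
```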

Answered by darkpool

There are different ways to do that.

  • I created the data frame to showcase different techniques for filtering your data.

import pandas as pd

df = pd.DataFrame({'Date': ['01-Jun-13', '03-Jun-13', '15-Aug-13', '20-Jan-14', '21-Feb-14'],
                   'abc': [100, -20, 40, 25, 60],
                   'xyz': [200, 50, -5, 15, 80]})

  • I separated out the day, month and year, and also built a combined month-year key, as you described.

def getMonth(s):
    # '01-Jun-13' -> 'Jun'
    return s.split("-")[1]

def getDay(s):
    # '01-Jun-13' -> '01'
    return s.split("-")[0]

def getYear(s):
    # '01-Jun-13' -> '13'
    return s.split("-")[2]

def getYearMonth(s):
    # '01-Jun-13' -> 'Jun-13'
    return s.split("-")[1] + "-" + s.split("-")[2]

  • I created new columns: year, month, day and YearMonth. In your case, you only need one of the two options: you can group either by the two columns 'year' and 'month', or by the single column 'YearMonth'.

df['year'] = df['Date'].apply(getYear)
df['month'] = df['Date'].apply(getMonth)
df['day'] = df['Date'].apply(getDay)
df['YearMonth'] = df['Date'].apply(getYearMonth)

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
3  20-Jan-14   25   15   14   Jan  20    Jan-14
4  21-Feb-14   60   80   14   Feb  21    Feb-14
  • You can iterate over the groups that groupby(..) produces.

In this case, we are grouping by two columns:

for key, g in df.groupby(['year', 'month']):
    print(key, g)

Output:

('13', 'Jun')         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
('13', 'Aug')         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
('14', 'Jan')         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
('14', 'Feb')         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14

In this case, we are grouping by one column:

for key, g in df.groupby('YearMonth'):
    print(key, g)

Output:

Jun-13         Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
Aug-13         Date  abc  xyz year month day YearMonth
2  15-Aug-13   40   -5   13   Aug  15    Aug-13
Jan-14         Date  abc  xyz year month day YearMonth
3  20-Jan-14   25   15   14   Jan  20    Jan-14
Feb-14         Date  abc  xyz year month day YearMonth
4  21-Feb-14   60   80   14   Feb  21    Feb-14
  • If you want to access a specific group, you can use get_group:

print(df.groupby('YearMonth').get_group('Jun-13'))

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13
  • Similar to get_group, this trick filters the rows with a boolean mask and gives the same result:

print(df[df['YearMonth'] == 'Jun-13'])

Output:

        Date  abc  xyz year month day YearMonth
0  01-Jun-13  100  200   13   Jun  01    Jun-13
1  03-Jun-13  -20   50   13   Jun  03    Jun-13

You can select the list of abc or xyz values for Jun-13:

print(df[df['YearMonth'] == 'Jun-13'].abc.values)
print(df[df['YearMonth'] == 'Jun-13'].xyz.values)

Output:

[100 -20]  #abc values
[200  50]  #xyz values

You can use this to go through the dates that you have classified as "year-month" and apply criteria to them to get the related data.

for x in set(df.YearMonth):
    print(df[df['YearMonth'] == x].abc.values)
    print(df[df['YearMonth'] == x].xyz.values)
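
As an alternative to filtering once per key, the same per-month value lists can probably be collected in a single groupby pass. This is a sketch, not part of the original answer. The name values_per_month is hypothetical, and sort=False is my own choice so that groups appear in order of first appearance rather than alphabetically.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["01-Jun-13", "03-Jun-13", "15-Aug-13", "20-Jan-14", "21-Feb-14"],
    "abc": [100, -20, 40, 25, 60],
    "xyz": [200, 50, -5, 15, 80],
})

# Same key as getYearMonth: everything after the day, e.g. 'Jun-13'.
df["YearMonth"] = df["Date"].apply(lambda s: s.split("-", 1)[1])

# Collect the abc and xyz values of each group into Python lists.
# sort=False keeps the groups in order of first appearance.
values_per_month = df.groupby("YearMonth", sort=False)[["abc", "xyz"]].agg(list)

print(values_per_month)
```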

I also recommend checking this answer.

Answered by Q-man

Why not keep it simple?!

GB=DF.groupby([(DF.index.year),(DF.index.month)]).sum()

giving you,

print(GB)
        abc  xyz
2013 6   80  250
     8   40   -5
2014 1   25   15
     2   60   80

and then you can plot it as asked using:

GB.plot('abc','xyz',kind='scatter')
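
For completeness, a self-contained sketch of this approach, assuming DF is the question's data with a DatetimeIndex. Naming the index levels "year" and "month" is my own addition, and the plot call is left as a comment so the snippet runs without matplotlib.

```python
import pandas as pd

DF = pd.DataFrame(
    {"abc": [100, -20, 40, 25, 60], "xyz": [200, 50, -5, 15, 80]},
    index=pd.to_datetime(
        ["2013-06-01", "2013-06-03", "2013-08-15", "2014-01-20", "2014-02-21"]
    ),
)

# Group by the integer year and month of the DatetimeIndex.
GB = DF.groupby([DF.index.year, DF.index.month]).sum()

# Assumption: label the two index levels for readability.
GB.index.names = ["year", "month"]

print(GB)
# GB.plot('abc', 'xyz', kind='scatter')  # needs matplotlib
```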

Answered by tsando

You can also do it by creating a string column with the year and month as follows:

df['date'] = df.index
df['year-month'] = df['date'].apply(lambda x: str(x.year) + ' ' + str(x.month))
grouped = df.groupby('year-month')

However, this doesn't preserve chronological order when you loop over the groups, because the string keys sort lexicographically, e.g.

for name, group in grouped:
    print(name)

Will give:

2007 11
2007 12
2008 1
2008 10
2008 11
2008 12
2008 2
2008 3
2008 4
2008 5
2008 6
2008 7
2008 8
2008 9
2009 1
2009 10

So then, if you want to preserve the order, you must do as suggested by @Q-man above:

grouped = df.groupby([df.index.year, df.index.month])

This will preserve the order in the above loop:

(2007, 11)
(2007, 12)
(2008, 1)
(2008, 2)
(2008, 3)
(2008, 4)
(2008, 5)
(2008, 6)
(2008, 7)
(2008, 8)
(2008, 9)
(2008, 10)
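
If a single string key is still preferred, one workaround (my own suggestion, not from the answer) is to zero-pad the month with strftime("%Y-%m"), so that lexicographic order matches chronological order. The sketch below uses made-up monthly data and hypothetical column names to compare the two kinds of key.

```python
import pandas as pd

# Made-up data: one row per month from Nov 2007 to Oct 2008.
idx = pd.date_range("2007-11-01", "2008-10-01", freq="MS")
df = pd.DataFrame({"value": range(len(idx))}, index=idx)

# Unpadded key as in the answer above, e.g. '2008 1', '2008 10'.
df["year-month"] = [f"{ts.year} {ts.month}" for ts in df.index]
# Zero-padded key, e.g. '2008-01', '2008-10'.
df["year-month-padded"] = df.index.strftime("%Y-%m")

unpadded_order = [name for name, _ in df.groupby("year-month")]
padded_order = [name for name, _ in df.groupby("year-month-padded")]

print(unpadded_order)  # '2008 10' sorts before '2008 2'
print(padded_order)    # chronological
```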