Pandas dataframe group by multiple columns
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/46408971/
Asked by DougKruger
Given a dataframe with two datetime columns A and B and a numeric column C, how can I group by the month of both A and B and compute sum(C)? For example:
In [1]: df
Out[1]:
A B C
0 2013-01-01 2013-01-01 0.282863
1 2013-01-02 2013-01-01 0.173215
2 2013-02-03 2013-02-04 2.104569
3 2013-02-09 2013-04-15 0.706771
4 2013-03-05 2013-08-01 0.567020
5 2013-03-06 2013-04-01 0.113648
Answered by YOBEN_S
Use groupby on the month of each column:
df.groupby([df.A.dt.month,df.B.dt.month]).C.sum()
Out[954]:
A  B
1  1    0.456078
2  2    2.104569
   4    0.706771
3  4    0.113648
   8    0.567020
Name: C, dtype: float64
Note: this requires A and B to be datetime columns. If they are not, convert them before calling groupby:
df.A=pd.to_datetime(df.A)
df.B=pd.to_datetime(df.B)
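For readers who want to reproduce this answer end to end, here is a minimal sketch that rebuilds the sample dataframe from the question and runs the same conversion and groupby. The variable name `monthly_sum` is only illustrative, and the dates are assumed to arrive as strings.

```python
import pandas as pd

# Sample data copied from the question; the date columns start out as strings.
df = pd.DataFrame({
    "A": ["2013-01-01", "2013-01-02", "2013-02-03",
          "2013-02-09", "2013-03-05", "2013-03-06"],
    "B": ["2013-01-01", "2013-01-01", "2013-02-04",
          "2013-04-15", "2013-08-01", "2013-04-01"],
    "C": [0.282863, 0.173215, 2.104569, 0.706771, 0.567020, 0.113648],
})

# Convert to datetime so that the .dt accessor is available.
df["A"] = pd.to_datetime(df["A"])
df["B"] = pd.to_datetime(df["B"])

# Group by the month number of A and of B, then sum C within each pair.
monthly_sum = df.groupby([df["A"].dt.month, df["B"].dt.month])["C"].sum()
print(monthly_sum)
```

The result is a Series with a two-level index (month of A, month of B), so a single group can be looked up with `monthly_sum.loc[(1, 1)]`.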
Answered by Yale Newman
I recently read about pd.Grouper, which makes grouping by dates very easy.
df.A=pd.to_datetime(df.A)
df.B=pd.to_datetime(df.B)
df.groupby([pd.Grouper(key='A', freq='M'), pd.Grouper(key='B', freq='M')])['C'].sum()
This opens up enough options that it is worth looking into:
Source: http://pbpython.com/pandas-grouper-agg.html
Different date aliases: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
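One difference from the `.dt.month` approach is worth keeping in mind: a monthly frequency bins by calendar month (year and month together), whereas `.dt.month` yields only the month number, so the two diverge once the data spans more than one year. The sketch below illustrates this with made-up data, using `.dt.to_period("M")` as a stand-in for calendar-month grouping; it is an illustration of the distinction, not a replacement for the answer's code. Recent pandas releases also appear to prefer the alias `'ME'` over `'M'` for month-end frequencies.

```python
import pandas as pd

# Hypothetical data spanning two years (not from the original question).
df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-05", "2014-01-10", "2014-02-01"]),
    "C": [1.0, 2.0, 4.0],
})

# Month of year: January 2013 and January 2014 land in the same group.
by_month_number = df.groupby(df["A"].dt.month)["C"].sum()

# Calendar month: each (year, month) pair is its own group.
by_calendar_month = df.groupby(df["A"].dt.to_period("M"))["C"].sum()

print(by_month_number)
print(by_calendar_month)
```

Which of the two is wanted depends on whether "January" should mean every January or one specific January.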
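Answered by A.Kot
# Store the month numbers as helper columns, then group on them.
df['month_A'] = pd.to_datetime(df.A).dt.month
df['month_B'] = pd.to_datetime(df.B).dt.month
# Select C explicitly so the datetime columns A and B are not summed.
df.groupby(['month_A', 'month_B'])['C'].sum()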
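If the helper columns are not needed afterwards, a variant is to name the grouping keys on the fly and flatten the result with `reset_index()`. This is a sketch on made-up data; the names `month_a`, `month_b`, and `flat` are illustrative only.

```python
import pandas as pd

# Small hypothetical sample (not the data from the question).
df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-02-03"]),
    "B": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-02-04"]),
    "C": [1.0, 2.0, 4.0],
})

# Name the keys without adding them to df as columns.
month_a = df["A"].dt.month.rename("month_A")
month_b = df["B"].dt.month.rename("month_B")

# reset_index() turns the two index levels into regular columns.
flat = df.groupby([month_a, month_b])["C"].sum().reset_index()
print(flat)
```

The result is an ordinary dataframe with columns month_A, month_B, and C, which is often easier to merge or export than a Series with a MultiIndex.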
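Answered by Rockbar
If you combine the groupby with the following filter, you get back rows of the original dataframe with their actual values in columns A and B. Note that the filter keeps only rows whose C equals the sum of their group, which in practice means rows that are alone in their group:
idsum = df.groupby([df.A.dt.month,df.B.dt.month])["C"].transform("sum") == df["C"]
df = df[idsum]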
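If the goal is to keep every original row, including its A and B values, and simply attach the group total to it, `transform` can be used without the equality filter. This is a sketch on made-up data; the column name `C_group_sum` is an assumption for illustration.

```python
import pandas as pd

# Small hypothetical sample (not the data from the question).
df = pd.DataFrame({
    "A": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-02-03"]),
    "B": pd.to_datetime(["2013-01-01", "2013-01-01", "2013-02-04"]),
    "C": [1.0, 2.0, 4.0],
})

# transform("sum") returns one value per input row, aligned with df's index.
keys = [df["A"].dt.month, df["B"].dt.month]
df["C_group_sum"] = df.groupby(keys)["C"].transform("sum")
print(df)
```

The first two rows share the (1, 1) group and both receive its total, while no row is dropped.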