Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/43222137/

Time: 2020-09-14 03:20:36  Source: igfitidea

How to groupby based on two columns in pandas?

Tags: python, pandas, dataframe, group-by, pandas-groupby

Asked by ARASH

A similar question might have been asked before, but I couldn't find one that exactly fits my problem. I want to group a dataframe by two columns. For example, to turn this

id product quantity
1  A       2
1  A       3
1  B       2
2  A       1
2  B       1
3  B       2
3  B       1

Into this:

id product quantity
1  A       5
1  B       2
2  A       1
2  B       1
3  B       3

That is, the "quantity" column is summed over rows that share the same "id" and the same "product".

Answered by jezrael

You need groupby with the parameter as_index=False to return a DataFrame, aggregating with sum:

df = df.groupby(['id','product'], as_index=False)['quantity'].sum()
print(df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3

Or add reset_index:

df = df.groupby(['id','product'])['quantity'].sum().reset_index()
print(df)
   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3
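
As a minimal, self-contained sketch of the two variants above: the sample frame below is rebuilt by hand from the question's table, and the names by_flag and by_reset are only illustrative. It checks that as_index=False and reset_index() lead to the same DataFrame for this data.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3],
    'product': ['A', 'A', 'B', 'A', 'B', 'B', 'B'],
    'quantity': [2, 3, 2, 1, 1, 2, 1],
})

# Variant 1: keep the group keys as ordinary columns
by_flag = df.groupby(['id', 'product'], as_index=False)['quantity'].sum()

# Variant 2: group keys become a MultiIndex, then move them back to columns
by_reset = df.groupby(['id', 'product'])['quantity'].sum().reset_index()

print(by_flag)
print(by_flag.equals(by_reset))
```

Without as_index=False or reset_index(), the result would be a Series indexed by (id, product) rather than a flat DataFrame.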

Answered by piRSquared

You can use pivot_table with aggfunc='sum':

df.pivot_table('quantity', ['id', 'product'], aggfunc='sum').reset_index()

   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3
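
The same call can be sketched with keyword arguments, which may be easier to read than the positional form. This assumes the sample data from the question, rebuilt by hand; the name result is only illustrative. Note that pivot_table aggregates with the mean by default, so aggfunc='sum' has to be passed explicitly.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3],
    'product': ['A', 'A', 'B', 'A', 'B', 'B', 'B'],
    'quantity': [2, 3, 2, 1, 1, 2, 1],
})

# values: column to aggregate; index: the grouping keys
result = df.pivot_table(
    values='quantity',
    index=['id', 'product'],
    aggfunc='sum',
).reset_index()  # turn the (id, product) index back into columns

print(result)
```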

Answered by Shubham R

You can use groupby and the aggregate function:

import pandas as pd
df = pd.DataFrame({
    'id': [1,1,1,2,2,3,3],
    'product': ['A','A','B','A','B','B','B'],
    'quantity': [2,3,2,1,1,2,1]
})

print(df)

   id product  quantity
0   1       A         2
1   1       A         3
2   1       B         2
3   2       A         1
4   2       B         1
5   3       B         2
6   3       B         1


df = df.groupby(['id','product']).agg({'quantity':'sum'}).reset_index()
print(df)

   id product  quantity
0   1       A         5
1   1       B         2
2   2       A         1
3   2       B         1
4   3       B         3
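
As a variation on the agg call above, a sketch using named aggregation (available in pandas 0.25 and later) lets you choose the output column name. The names total_quantity and summed are hypothetical, chosen here for illustration, and the sample data is rebuilt from the question.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3],
    'product': ['A', 'A', 'B', 'A', 'B', 'B', 'B'],
    'quantity': [2, 3, 2, 1, 1, 2, 1],
})

# Named aggregation: new_column=(source_column, aggregation function)
# 'total_quantity' is an illustrative name, not part of the original answer
summed = (
    df.groupby(['id', 'product'])
      .agg(total_quantity=('quantity', 'sum'))
      .reset_index()
)

print(summed)
```

This form is mainly useful when several aggregations are computed at once and each needs a distinct column name.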