How to GroupBy a DataFrame in Pandas and keep columns
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31569549/
Asked by Adrian Ribao
Given a DataFrame that logs uses of some books, like this:
Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2
I need to count how many times each book appears while keeping the other columns, to get this:
Name   Type   ID  Count
Book1  ebook  1   2
Book2  paper  2   2
Book3  paper  3   1
How can this be done?
Thanks!
Accepted answer by EdChum
You want the following:
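In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')
Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
In your case the 'Name', 'Type' and 'ID' columns match in value, so we can groupby on all three, call size to count the rows in each group, and then call reset_index(name='Count') to turn the result back into a DataFrame. (count() counts non-null values in the remaining columns, so when all three columns are used as keys it has nothing left to count.)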
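A self-contained sketch of the group-and-count approach, assuming the sample data from the question. It uses size(), which counts the rows in each group whether or not any non-key columns remain; the variable name counts is only illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# size() returns one row count per (Name, Type, ID) group as a Series;
# reset_index(name=...) turns it into a DataFrame and names the count column.
counts = df.groupby(["Name", "Type", "ID"]).size().reset_index(name="Count")
print(counts)
```

The group keys come back sorted, so the rows appear in Book1, Book2, Book3 order.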
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
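The transform variant can be tried end to end with the sketch below, again assuming the question's sample data. Grouping by 'Name' alone relies on 'Type' and 'ID' being the same for every row of a given book, which holds here but is an assumption about the data; the name deduped is illustrative.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# transform('count') returns a Series aligned with df's original index,
# so every row receives the count of its own group.
df["Count"] = df.groupby("Name")["ID"].transform("count")

# Rows of the same book are now identical, so dropping duplicates
# leaves one row per book. drop_duplicates returns a new DataFrame.
deduped = df.drop_duplicates()
print(deduped)
```

Unlike the groupby-then-reset_index route, this keeps the original row labels and row order of the first occurrence of each book.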
Answer by jpobst
I think as_index=False should do the trick.
df.groupby(['Name','Type','ID'], as_index=False).count()
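One way to try as_index=False on the question's three-column frame is sketched below. It swaps count() for size(), because count() only reports non-null values of columns that are not group keys and here every column is a key. This assumes pandas 1.1 or later, where size() combined with as_index=False returns a DataFrame with a column named 'size'; the rename to 'Count' is an illustrative choice.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# as_index=False keeps the group keys as ordinary columns,
# so no reset_index call is needed afterwards.
result = (
    df.groupby(["Name", "Type", "ID"], as_index=False)
      .size()
      .rename(columns={"size": "Count"})
)
print(result)
```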
Answer by NeStack
If you have many columns in a df, it makes sense to use df.groupby(['foo']).agg(...). The .agg() function lets you choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
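Applied to the question's data, the .agg() idea might look like the sketch below. It uses named aggregation (available in pandas 0.25 and later) rather than the dict form, so that 'ID' can be both kept with 'first' and counted; the output column names are illustrative choices.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Each keyword is an output column: (source column, aggregation).
# 'first' keeps the first value seen in each group; 'count' counts non-null values.
result = df.groupby("Name", as_index=False).agg(
    Type=("Type", "first"),
    ID=("ID", "first"),
    Count=("ID", "count"),
)
print(result)
```

Keeping columns with 'first' is only safe when they are constant within each group, as 'Type' and 'ID' are here.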

