Original question: http://stackoverflow.com/questions/31569549/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to GroupBy a Dataframe in Pandas and keep Columns
Asked by Adrian Ribao
Given a DataFrame that logs uses of some books like this:
Name Type ID
Book1 ebook 1
Book2 paper 2
Book3 paper 3
Book1 ebook 1
Book2 paper 2
I need to count how many times each book appears, keeping the other columns, to get this:
Name Type ID Count
Book1 ebook 1 2
Book2 paper 2 2
Book3 paper 3 1
How can this be done?
Thanks!
Accepted answer by EdChum
You want the following:
In [20]:
df.groupby(['Name','Type','ID']).count().reset_index()
Out[20]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on these, call count and then reset_index.
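The grouping approach above can be sketched end to end as follows. This is a minimal sketch, not the answerer's exact code: the sample frame is rebuilt from the question, and the variable name `counts` is an assumption. When the frame holds only the three key columns, `count()` has no non-key column left to count, so the sketch uses `size()` and names the resulting column 'Count' through `reset_index(name=...)`.

```python
import pandas as pd

# Sample data rebuilt from the question (five usage records for three books).
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID': [1, 2, 3, 1, 2],
})

# size() counts the rows in each (Name, Type, ID) group and returns a Series;
# reset_index(name='Count') turns the group keys back into ordinary columns
# and names the counts column.
counts = df.groupby(['Name', 'Type', 'ID']).size().reset_index(name='Count')

print(counts)
```

If the frame has at least one extra non-key column, `count()` as written in the answer counts the non-null values of that column instead.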
An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:
In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()
Out[25]:
Name Type ID Count
0 Book1 ebook 1 2
1 Book2 paper 2 2
2 Book3 paper 3 1
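A self-contained version of the transform approach might look like the sketch below. The sample frame is rebuilt from the question, and the name `result` is an assumption for illustration. `transform('count')` returns a Series the same length as the original frame, so every row receives its group's count; `drop_duplicates()` then collapses the now-identical rows.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID': [1, 2, 3, 1, 2],
})

# Broadcast the per-Name count back onto every row of the original frame.
df['Count'] = df.groupby('Name')['ID'].transform('count')

# Rows for the same book are now identical, so dropping duplicates leaves
# one row per book; reset_index(drop=True) just renumbers the rows.
result = df.drop_duplicates().reset_index(drop=True)

print(result)
```

This variant keeps every original column without listing them all as group keys, but it only deduplicates cleanly when the other columns agree within each group.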
Answer by jpobst
I think as_index=False should do the trick.
df.groupby(['Name','Type','ID'], as_index=False).count()
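As a hedged illustration of the `as_index=False` idea, the sketch below keeps the group keys as regular columns without a separate `reset_index` call. It assumes pandas 1.1 or later, where `size()` on a group created with `as_index=False` returns a DataFrame with a column named 'size'. It uses `size()` rather than `count()` because the sample frame, rebuilt from the question, has no non-key column to count. The name `counts` is an assumption.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID': [1, 2, 3, 1, 2],
})

# as_index=False keeps Name, Type and ID as columns in the result.
# Assumes pandas >= 1.1, where size() then yields a 'size' column.
counts = (
    df.groupby(['Name', 'Type', 'ID'], as_index=False)
      .size()
      .rename(columns={'size': 'Count'})
)

print(counts)
```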
Answer by NeStack
If you have many columns in a df it makes sense to use df.groupby(['foo']).agg(...), see here. The .agg() function lets you choose what to do with the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
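Applied to the question's data, the `.agg()` idea might look like the sketch below. It uses pandas named aggregation (keyword arguments of the form `new_column=(source_column, function)`, available since pandas 0.25) rather than the dict form quoted above, because the same source column 'ID' is used twice: once to keep its first value and once to count rows. The sample frame and the name `result` are assumptions for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    'Name': ['Book1', 'Book2', 'Book3', 'Book1', 'Book2'],
    'Type': ['ebook', 'paper', 'paper', 'ebook', 'paper'],
    'ID': [1, 2, 3, 1, 2],
})

# Group on Name only; keep the first Type and ID seen in each group,
# and count the rows in the group via the ID column.
result = (
    df.groupby('Name')
      .agg(
          Type=('Type', 'first'),
          ID=('ID', 'first'),
          Count=('ID', 'count'),
      )
      .reset_index()
)

print(result)
```

Note that 'first' simply keeps the first value it meets in each group, so it is only a safe way to "keep" a column when that column is constant within the group.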