Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors. Original question: http://stackoverflow.com/questions/31569549/

How to GroupBy a Dataframe in Pandas and keep Columns

Tags: python, pandas

Asked by Adrian Ribao

Given a DataFrame that logs uses of some books, like this:

Name   Type   ID
Book1  ebook  1
Book2  paper  2
Book3  paper  3
Book1  ebook  1
Book2  paper  2

I need to count how many times each book appears, keeping the other columns, to get this:

Name   Type   ID    Count
Book1  ebook  1     2
Book2  paper  2     2
Book3  paper  3     1

How can this be done?

Thanks!

Accepted answer by EdChum

You want the following:

In [20]:
df.groupby(['Name','Type','ID']).size().reset_index(name='Count')

Out[20]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1

In your case the 'Name', 'Type' and 'ID' columns match in values, so we can groupby on all three, call size to count the rows in each group, and then call reset_index(name='Count') to turn the group keys back into columns. Calling count here would not produce a 'Count' column, because every column is a grouping key and none is left to count.

An alternative approach would be to add the 'Count' column using transform and then call drop_duplicates:

In [25]:
df['Count'] = df.groupby(['Name'])['ID'].transform('count')
df.drop_duplicates()

Out[25]:
    Name   Type  ID  Count
0  Book1  ebook   1      2
1  Book2  paper   2      2
2  Book3  paper   3      1
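
Both approaches can be tried end to end with a small sketch like the one below. It assumes the question's sample data is rebuilt by hand; the variable names `counted`, `with_count` and `deduped` are illustrative only, and the first approach counts rows with size() and reset_index(name='Count').

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Approach 1: group on every column and count the rows in each group.
counted = df.groupby(["Name", "Type", "ID"]).size().reset_index(name="Count")

# Approach 2: attach the per-group count to every row, then drop repeated rows.
with_count = df.copy()  # work on a copy so df itself is left unchanged
with_count["Count"] = with_count.groupby("Name")["ID"].transform("count")
deduped = with_count.drop_duplicates().reset_index(drop=True)

print(counted)
print(deduped)
```

The first approach returns rows sorted by the group keys, while the second keeps the order in which each book first appears; for this data the two orders happen to coincide.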

Answer by jpobst

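I think as_index=False should do the trick. Because every column here is a grouping key, count would have no column left to count, so use size instead (in pandas 1.1 and later the resulting column is named 'size'):

df.groupby(['Name','Type','ID'], as_index=False).size()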
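
A minimal sketch of the as_index=False route on the question's data follows. It assumes pandas 1.1 or later, where size() on a groupby created with as_index=False returns a DataFrame whose count column is named 'size'; the rename to 'Count' and the names `keys` and `result` are illustrative choices.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

keys = ["Name", "Type", "ID"]

# as_index=False keeps the grouping keys as ordinary columns,
# so no reset_index call is needed afterwards.
result = (
    df.groupby(keys, as_index=False)
      .size()                              # assumption: pandas >= 1.1
      .rename(columns={"size": "Count"})   # match the column name in the question
)

print(result)
```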

Answer by NeStack

If you have many columns in a df, it makes sense to use df.groupby(['foo']).agg(...). The .agg() function lets you choose what to do with each column, including the columns you don't want to apply operations on. If you just want to keep them, use .agg({'col1': 'first', 'col2': 'first', ...}). Instead of 'first', you can also apply 'sum', 'mean' and others.
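
As a rough illustration of the .agg() idea on the question's data, the sketch below groups by 'Name' only, keeps 'Type' and 'ID' with 'first', and adds the count. It uses pandas named aggregation (keyword arguments of the form new_column=(source_column, function)) rather than the dict form, which is a substitution made here so the count can be produced in the same call; the name `result` is illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Name": ["Book1", "Book2", "Book3", "Book1", "Book2"],
    "Type": ["ebook", "paper", "paper", "ebook", "paper"],
    "ID": [1, 2, 3, 1, 2],
})

# Group by one key; 'first' carries the other columns through unchanged,
# and 'count' on ID gives the number of rows per book.
result = df.groupby("Name", as_index=False).agg(
    Type=("Type", "first"),
    ID=("ID", "first"),
    Count=("ID", "count"),
)

print(result)
```

Keeping a column with 'first' is only safe when its value is constant within each group, as it is here; otherwise the first value silently hides the others.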