Python Pandas: create a new column with a count from groupby

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/29836477/

Tags: python, pandas

Asked by GNMO11

I have a df that looks like the following:

id        item        color
01        truck       red
02        truck       red
03        car         black
04        truck       blue
05        car         black

I am trying to create a df that looks like this:

item      color       count
truck     red          2
truck     blue         1
car       black        2

I have tried

df["count"] = df.groupby("item")["color"].transform('count')

But it is not quite what I am searching for.

Any guidance is appreciated

Accepted answer by Andy Hayden

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
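
For reference, here is a self-contained sketch of the reset_index approach. It rebuilds the question's table by hand; storing the id values as strings such as "01" is an assumption, and the variable name counts is only illustrative.

```python
import pandas as pd

# Rebuild the question's example data (id stored as strings: an assumption).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Count rows per (item, color) pair, then move the group keys
# from the index back into ordinary columns.
counts = df.groupby(["item", "color"])["id"].count().reset_index(name="count")
print(counts)
```

The result has one row per (item, color) combination, sorted by the group keys, rather than one row per original record.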

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
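
Because transform returns one value per original row, the result can be assigned straight back to the DataFrame. A minimal sketch, assuming the question's data with id stored as strings and "count" as the new column name:

```python
import pandas as pd

# Rebuild the question's example data (id stored as strings: an assumption).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Each row receives the size of its own (item, color) group,
# so the DataFrame keeps all five original rows.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```

This keeps the original shape, unlike the aggregated DataFrame, which has one row per group.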

I recommend reading the split-apply-combine section of the docs.

Answer by Jadon Manilall

Another possible way to achieve the desired output is to use named aggregation, which lets you specify the name and the aggregation function for each output column. The pandas documentation describes it as follows:

Named aggregation

(New in version 0.25.0.)

To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where:

  • The keywords are the output column names

  • The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
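
The plain-tuple form described in the list above can be sketched as follows. This assumes pandas 0.25 or later and a DataFrame shaped like the question's, with an id column to count; the output column name count and the variable name result are illustrative.

```python
import pandas as pd

# Example data shaped like the question's table (id as strings: an assumption).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# keyword = output column name; value = (column to select, aggregation to apply)
result = df.groupby(["item", "color"]).agg(count=("id", "count"))
print(result)
```

The group keys stay in the index here, so a row is addressed as result.loc[("truck", "red"), "count"].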

So, to get the desired output, you could try something like this:

import pandas as pd
# Setup
df = pd.DataFrame([
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"car",
        "color":"black"
    },
    {
        "item":"truck",
        "color":"blue"
    },
    {
        "item":"car",
        "color":"black"
    }
])

df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)

Which produces the following output:

             count_col
item  color
car   black          2
truck blue           1
      red            2
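
The output above keeps item and color in the index. To get them as ordinary columns, as in the table the question asks for, one option is to chain reset_index onto the aggregation. The sketch below is a variant, not the answer's exact code: it assumes the DataFrame also has the question's id column and counts that instead of color, and the names count and flat are illustrative.

```python
import pandas as pd

# Example data including the question's id column (an assumption for this sketch).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

flat = (
    df.groupby(["item", "color"])
      .agg(count=pd.NamedAgg(column="id", aggfunc="count"))
      .reset_index()  # move item and color from the index into columns
)
print(flat)
```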

Answer by Adrian Keister

Here is another option:

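import numpy as np
# Add a placeholder column; its values do not matter, since count() only counts non-null entries.
df['Counts'] = np.zeros(len(df))
# Count the rows in each (item, color) group.
grp_df = df.groupby(['item', 'color']).count()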

which results in

             Counts
item  color        
car   black       2
truck blue        1
      red         2