Python Pandas: create a new column with the count from groupby

Question

Asked by GNMO11

I have a df that looks like the following:

id        item        color
01        truck       red
02        truck       red
03        car         black
04        truck       blue
05        car         black

I am trying to create a df that looks like this:

item      color       count
truck     red          2
truck     blue         1
car       black        2

I have tried:

df["count"] = df.groupby("item")["color"].transform('count')

But it is not quite what I am searching for.

Any guidance is appreciated.

Answer 1

Accepted answer by Andy Hayden

That's not a new column, that's a new DataFrame:

In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2

To get the result you want, count a single column and then use reset_index:

In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
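
For anyone who wants to run this outside an interactive session, here is a minimal self-contained sketch. It assumes the question's table is built by hand with string ids; the variable name `counts` is only illustrative.

```python
import pandas as pd

# Rebuild the DataFrame from the question (ids kept as strings, an assumption).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Count the non-null ids in each (item, color) group, then turn the
# group keys back into ordinary columns and name the result "count".
counts = df.groupby(["item", "color"])["id"].count().reset_index(name="count")
print(counts)
```

Note that groupby sorts the group keys by default, so the rows come back in key order rather than in order of first appearance.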

To get a "new column" you could use transform:

In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0    2
1    2
2    2
3    1
4    2
dtype: int64
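
Because transform returns a Series aligned with the original index, it can be assigned straight back to the DataFrame. A minimal sketch, assuming the question's data is rebuilt by hand:

```python
import pandas as pd

# Rebuild the DataFrame from the question (ids kept as strings, an assumption).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# transform("count") broadcasts each group's count back to every row
# of that group, so the result has the same length as df.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```

This keeps all five original rows, which is the difference from the aggregated DataFrame above.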

I recommend reading the split-apply-combine section of the docs.

Answer 2

Answered by Jadon Manilall

Another way to achieve the desired output is to use named aggregation, which lets you specify the name and the aggregation function for each output column. From the pandas docs:

Named aggregation
(New in version 0.25.0.)
To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as “named aggregation”, where:
The keywords are the output column names
The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg namedtuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.

So to get the desired output, you could try something like this:

import pandas as pd
# Setup
df = pd.DataFrame([
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"truck",
        "color":"red"
    },
    {
        "item":"car",
        "color":"black"
    },
    {
        "item":"truck",
        "color":"blue"
    },
    {
        "item":"car",
        "color":"black"
    }
])

df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)

Which produces the following output:

             count_col
item  color
car   black          2
truck blue           1
      red            2
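
The output above still carries item and color in the index. If flat columns like those in the question are wanted, one option is to add reset_index. The sketch below also uses the plain tuple form of named aggregation and counts a hypothetical id column rather than a grouping key; both choices are assumptions and not part of the original answer.

```python
import pandas as pd

# Same rows as the question, including an id column (an assumption here,
# since the setup above omits it).
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Keyword = output column name; tuple = (column to select, aggregation).
# reset_index() moves the group keys out of the index into columns.
result = (
    df.groupby(["item", "color"])
      .agg(count=("id", "count"))
      .reset_index()
)
print(result)
```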

Answer 3

Answered by Adrian Keister

Here is another option:

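import numpy as np
# Add a placeholder column: count() reports the number of non-null values
# in each non-key column, so any fully populated column will do.
df['Counts'] = np.zeros(len(df))
grp_df = df.groupby(['item', 'color']).count()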

which results in

             Counts
item  color        
car   black       2
truck blue        1
      red         2
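
The placeholder column is needed only because count() works on the non-key columns. As a possible alternative (not the answerer's method), GroupBy.size() counts rows per group directly, so no extra column is required. A minimal sketch, assuming a DataFrame with just the two grouping columns:

```python
import pandas as pd

# Only the grouping columns are present (an assumption for this sketch).
df = pd.DataFrame({
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# size() returns the number of rows in each group as a Series;
# reset_index(name=...) turns it into a flat DataFrame.
counts = df.groupby(["item", "color"]).size().reset_index(name="count")
print(counts)
```

Unlike count(), size() includes rows whose values are null in other columns, which matters only if the data has missing values.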
