Python Pandas: create a new column with the count from groupby
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/29836477/
Pandas create new column with count from groupby
Asked by GNMO11
I have a df that looks like the following:
id  item   color
01  truck  red
02  truck  red
03  car    black
04  truck  blue
05  car    black
I am trying to create a df that looks like this:
item   color  count
truck  red    2
truck  blue   1
car    black  2
I have tried:
df["count"] = df.groupby("item")["color"].transform('count')
But it is not quite what I am looking for.
Any guidance is appreciated.
Accepted answer by Andy Hayden
That's not a new column, that's a new DataFrame:
In [11]: df.groupby(["item", "color"]).count()
Out[11]:
             id
item  color
car   black   2
truck blue    1
      red     2
To get the result you want, use reset_index:
In [12]: df.groupby(["item", "color"])["id"].count().reset_index(name="count")
Out[12]:
    item  color  count
0    car  black      2
1  truck   blue      1
2  truck    red      2
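For readers who want to run this end to end, here is a minimal sketch that rebuilds the example frame and applies the same groupby/count/reset_index chain. Storing the id values as strings (to keep the leading zeros) and the variable name counts are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Rebuild the example data from the question.
# Assumption: ids are kept as strings so the leading zeros survive.
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Count the non-null ids in each (item, color) group, then turn the
# group keys back into ordinary columns and name the count column.
counts = df.groupby(["item", "color"])["id"].count().reset_index(name="count")
print(counts)
```

Groups come back sorted by key. Passing sort=False to groupby should keep them in order of first appearance instead.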
To get a "new column" you could use transform:
In [13]: df.groupby(["item", "color"])["id"].transform("count")
Out[13]:
0 2
1 2
2 2
3 1
4 2
dtype: int64
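Because transform returns one value per original row, aligned on the original index, its result can be assigned straight back to the frame. A minimal sketch, assuming the example data from the question (ids stored as strings is an assumption):

```python
import pandas as pd

# Example data from the question (ids as strings: an assumption)
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Each row receives the size of its own (item, color) group,
# so the frame keeps all five rows and gains a "count" column.
df["count"] = df.groupby(["item", "color"])["id"].transform("count")
print(df)
```

This differs from the question's attempt only in grouping by both item and color rather than by item alone.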
I recommend reading the split-apply-combine section of the docs.
Answer by Jadon Manilall
Another possible way to achieve the desired output is to use Named Aggregation, which lets you specify the name and the aggregation function for each output column.
Named aggregation
(New in version 0.25.0.)
To support column-specific aggregation with control over the output column names, pandas accepts the special syntax in GroupBy.agg(), known as "named aggregation", where:
The keywords are the output column names
The values are tuples whose first element is the column to select and the second element is the aggregation to apply to that column. Pandas provides the pandas.NamedAgg named tuple with the fields ['column', 'aggfunc'] to make it clearer what the arguments are. As usual, the aggregation can be a callable or a string alias.
So, to get the desired output, you could try something like:
import pandas as pd

# Setup: rebuild the example data (the id column is left out here)
df = pd.DataFrame([
    {"item": "truck", "color": "red"},
    {"item": "truck", "color": "red"},
    {"item": "car", "color": "black"},
    {"item": "truck", "color": "blue"},
    {"item": "car", "color": "black"},
])

# Count the rows in each (item, color) group and name the output column count_col
df_grouped = df.groupby(["item", "color"]).agg(
    count_col=pd.NamedAgg(column="color", aggfunc="count")
)
print(df_grouped)
Which produces the following output:
             count_col
item  color
car   black          2
truck blue           1
      red            2
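The quoted documentation says the values can be plain tuples of (column, aggregation). A minimal sketch of that shorthand follows, with reset_index added to flatten the result into the three-column shape the question asks for. Aggregating over the id column and the names flat and count are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Example data from the question (ids as strings: an assumption)
df = pd.DataFrame({
    "id": ["01", "02", "03", "04", "05"],
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# Keyword = output column name, value = (column to select, aggregation).
# reset_index() moves item and color out of the index into regular columns.
flat = (
    df.groupby(["item", "color"])
      .agg(count=("id", "count"))
      .reset_index()
)
print(flat)
```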
Answer by Adrian Keister
Here is another option:
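import numpy as np

# Placeholder column so that count() has a non-key column to count
df['Counts'] = np.zeros(len(df))
# count() tallies the non-null values of every non-key column in each group
grp_df = df.groupby(['item', 'color']).count()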
which results in:
             Counts
item  color
car   black       2
truck blue        1
      red         2
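If the placeholder column feels awkward, a related technique (not the one used in this answer) is GroupBy.size(), which counts rows per group directly. A minimal sketch under the same example data, where the name counts is an illustrative choice:

```python
import pandas as pd

# Example data without the id column, as in the answers above
df = pd.DataFrame({
    "item": ["truck", "truck", "car", "truck", "car"],
    "color": ["red", "red", "black", "blue", "black"],
})

# size() returns the number of rows in each group, so no helper
# column is needed; reset_index(name=...) flattens and names the result.
counts = df.groupby(["item", "color"]).size().reset_index(name="count")
print(counts)
```

Note that size() counts every row in a group, including rows with missing values, whereas count() only counts non-null entries.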