Python pandas: add a column to a groupby dataframe

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/37189878/

Time: 2020-08-19 19:00:41  Source: igfitidea

pandas add column to groupby dataframe

python pandas

Asked by Fabio Lamanna

I have this simple dataframe df:

import pandas as pd
df = pd.DataFrame({'c':[1,1,1,2,2,2,2],'type':['m','n','o','m','m','n','n']})

My goal is to count the values of type for each c, and then add a column with the size of each c group. So starting with:

In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')

In [28]: g
Out[28]: 
   c type  t
0  1    m  1
1  1    n  1
2  1    o  1
3  2    m  2
4  2    n  2

The first problem is solved. Then I can also compute the size of each c group:

In [29]: a = df.groupby('c').size().reset_index(name='size')

In [30]: a
Out[30]: 
   c  size
0  1     3
1  2     4

How can I add the size column directly to the first dataframe? So far I have used map as follows:

In [31]: a.index = a['c']

In [32]: g['size'] = g['c'].map(a['size'])

In [33]: g
Out[33]: 
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4

which works, but is there a more straightforward way to do this?

Accepted answer by EdChum

Use transform to add a column back to the original df from a groupby aggregation. transform returns a Series with its index aligned to the original df:

In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = df.groupby('c')['type'].transform('size')
g

Out[123]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
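
One point that may be worth checking: the Series returned by transform carries the index of df, while g has its own index after reset_index, so the assignment above matches rows by index label. It lines up here because df happens to be sorted by c. The sketch below reverses the rows (an illustrative variation that is not part of the original question) and compares that assignment with a size computed inside g itself, on the assumption that the counts in t for one c add up to the size of that group. The names rev and size_from_rev are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# Reverse the row order so the frame is no longer sorted by c
# (hypothetical variation, used only to expose the index alignment).
rev = df.iloc[::-1].reset_index(drop=True)

g = rev.groupby('c')['type'].value_counts().reset_index(name='t')

# Index-based assignment: row i of g receives the group size of row i of rev.
g['size_from_rev'] = rev.groupby('c')['type'].transform('size')

# Computed within g: the per-type counts of one c group sum to that group's size.
g['size'] = g.groupby('c')['t'].transform('sum')

print(g)
```

With the reversed frame, size_from_rev no longer agrees with the group sizes, while size does, since it never leaves g.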

Answer by jezrael

Another solution with transform and len:

df['size'] = df.groupby('c')['type'].transform(len)
print(df)
   c type size
0  1    m    3
1  1    n    3
2  1    o    3
3  2    m    4
4  2    m    4
5  2    n    4
6  2    n    4
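
As a quick cross-check, the sketch below runs transform with the builtin len and with the string 'size' on the same frame and compares the results. Both count every row in a group, so they should agree. The names by_len and by_size are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# len is called once per group and its result is broadcast to every row of that group.
by_len = df.groupby('c')['type'].transform(len)

# 'size' asks pandas for the built-in group size instead of a Python callable.
by_size = df.groupby('c')['type'].transform('size')

# One value per original row, in the order of df's index.
print(by_len.tolist())
print(by_size.tolist())
```

Either form gives one value per row of df, which is why the result can be assigned straight back as a new column of df.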

Another solution with Series.map and Series.value_counts:

df['size'] = df['c'].map(df['c'].value_counts())
print(df)
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
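
The same lookup can also be pointed at the grouped frame g from the question rather than at df. This is a minimal sketch, assuming g is built as in the question. The name sizes is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# value_counts returns a Series whose index holds the distinct values of c.
sizes = df['c'].value_counts()

# map looks up each value of g['c'] in that index, so row order does not matter.
g['size'] = g['c'].map(sizes)

print(g)
```

Because map matches on the values of c and not on row position, this does not depend on how df or g is sorted.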