Original URL: http://stackoverflow.com/questions/37189878/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
pandas add column to groupby dataframe
Question by Fabio Lamanna
I have this simple dataframe df:
import pandas as pd

df = pd.DataFrame({'c':[1,1,1,2,2,2,2],'type':['m','n','o','m','m','n','n']})
My goal is to count the values of type for each c, and then add a column with the size of each c group. So, starting with:
In [27]: g = df.groupby('c')['type'].value_counts().reset_index(name='t')
In [28]: g
Out[28]:
   c type  t
0  1    m  1
1  1    n  1
2  1    o  1
3  2    m  2
4  2    n  2
That solves the first problem. Then I can also do:
In [29]: a = df.groupby('c').size().reset_index(name='size')
In [30]: a
Out[30]:
   c  size
0  1     3
1  2     4
How can I add the size column directly to the first dataframe? So far I have used map, like this:
In [31]: a.index = a['c']
In [32]: g['size'] = g['c'].map(a['size'])
In [33]: g
Out[33]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
This works, but is there a more straightforward way to do it?
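Since a already holds c as an ordinary column, one alternative to setting its index and calling map is to join a onto g with DataFrame.merge. A minimal sketch, rebuilding df, g and a exactly as in the question:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})

# The two intermediate frames from the question.
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
a = df.groupby('c').size().reset_index(name='size')

# a keeps c as an ordinary column, so it can be joined onto g directly;
# how='left' keeps every row of g, in g's original order.
g = g.merge(a, on='c', how='left')
print(g)
```

Because the join matches on the values in c rather than on index labels, it does not matter how g's rows are numbered.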
Accepted answer by EdChum
Use transform to add a column back to the original df from a groupby aggregation. transform returns a Series with its index aligned to the original df:
In [123]:
g = df.groupby('c')['type'].value_counts().reset_index(name='t')
g['size'] = df.groupby('c')['type'].transform('size')
g
Out[123]:
   c type  t  size
0  1    m  1     3
1  1    n  1     3
2  1    o  1     3
3  2    m  2     4
4  2    n  2     4
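One caveat with this answer: the transform result is indexed like df (7 rows, labels 0 to 6), while g has only 5 rows, so the assignment to g['size'] keeps whatever values sit at labels 0 to 4. With the question's data those happen to be the right sizes, but that depends on the row order of df. The sketch below reverses the rows of df (an illustrative reordering, not data from the question) to show the difference, and computes the size from g itself with transform('sum') on the t counts:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})
# The same rows in reverse order, with a fresh 0..6 index.
rev = df.iloc[::-1].reset_index(drop=True)

g = rev.groupby('c')['type'].value_counts().reset_index(name='t')

# transform returns one value per row of rev (7 values, labels 0..6);
# assigning it to the 5-row g keeps only the values at labels 0..4.
g['naive'] = rev.groupby('c')['type'].transform('size')

# Grouping g itself does not depend on row order: the size of a c group
# equals the sum of its per-type counts t.
g['size'] = g.groupby('c')['t'].transform('sum')
print(g)
```

Here naive comes out as 4, 4, 4, 4, 3 while size is 3, 3, 3, 4, 4, which matches the group sizes in df.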
Answer by jezrael
Another solution, using transform with len:
df['size'] = df.groupby('c')['type'].transform(len)
print(df)
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
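transform(len) calls Python's built-in len on each group's Series, so it gives the same numbers as the string alias transform('size'). Both count every row, including missing values, whereas transform('count') counts only non-missing values. A small sketch, using a hypothetical copy of df in which one type value is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's df with one missing 'type' value.
df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', np.nan, 'm', 'm', 'n', 'n']})

by_c = df.groupby('c')['type']
df['len'] = by_c.transform(len)        # built-in len called on each group's Series
df['size'] = by_c.transform('size')    # rows per group, missing values included
df['count'] = by_c.transform('count')  # non-missing values per group only
print(df)
```

For c == 1, len and size are 3 while count is 2, because the NaN row is skipped by count.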
Another solution, using Series.map and Series.value_counts:
df['size'] = df['c'].map(df['c'].value_counts())
print (df)
   c type  size
0  1    m     3
1  1    n     3
2  1    o     3
3  2    m     4
4  2    m     4
5  2    n     4
6  2    n     4
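The same map technique also covers the case asked about, adding the column to g rather than to df, because value_counts returns a lookup table keyed by c. A minimal sketch, rebuilding g as in the question:

```python
import pandas as pd

df = pd.DataFrame({'c': [1, 1, 1, 2, 2, 2, 2],
                   'type': ['m', 'n', 'o', 'm', 'm', 'n', 'n']})
g = df.groupby('c')['type'].value_counts().reset_index(name='t')

# value_counts gives a Series whose index is the distinct c values and whose
# values are their row counts in df.
sizes = df['c'].value_counts()

# map looks up each entry of g['c'] in that Series, so the result follows
# g's c values rather than g's index labels.
g['size'] = g['c'].map(sizes)
print(g)
```

This is the question's own map approach without the step of copying c into a's index.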