python pandas groupby() result
Source: http://stackoverflow.com/questions/17666075/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by Simon Righley
I have the following python pandas data frame:
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
})
df
    A  B  C
0   1  5  1
1   1  5  1
2   1  6  1
3   1  7  1
4   2  5  1
5   2  6  1
6   2  6  1
7   3  7  1
8   3  7  1
9   4  6  1
10  4  7  1
11  4  7  1
I would like to add another column, D, holding the sum of the C values over all rows that share the same (A, B) pair. That is, something like:
    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2
I have tried with pandas groupby and it kind of works:
res = {}
for a, group_by_A in df.groupby('A'):
    group_by_B = group_by_A.groupby('B', as_index=False)
    res[a] = group_by_B['C'].sum()
but I don't know how to get the results from res back into df in an orderly fashion. I would be very happy with any advice on this. Thank you.
Accepted answer by Andy Hayden
Here's one way (though it feels like this should work in one go with an apply, I can't get that to work).
In [11]: g = df.groupby(['A', 'B'])
In [12]: df1 = df.set_index(['A', 'B'])
The groupby size function is the one you want. Its result is indexed by 'A' and 'B', so we give the frame the same index before assigning:
In [13]: df1['D'] = g.size() # unfortunately this doesn't play nice with as_index=False
# Same would work with g['C'].sum()
In [14]: df1.reset_index()
Out[14]:
    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2
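For readers who want to run the steps above outside an IPython session, here is a minimal sketch of the same idea as a plain script. The variable names (group_sizes, indexed, result_by_index) are illustrative and not from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# One value per (A, B) pair, indexed by a MultiIndex built from A and B.
group_sizes = df.groupby(['A', 'B']).size()

# Give the frame the same (A, B) index so the assignment aligns on it.
indexed = df.set_index(['A', 'B'])
indexed['D'] = group_sizes

# Turn A and B back into ordinary columns.
result_by_index = indexed.reset_index()
print(result_by_index)
```

Note that size counts rows per group. It matches the requested sum here only because every C value is 1; for general data, df.groupby(['A', 'B'])['C'].sum() would be the value to assign.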
Answer by andrew
You could also do it in one line using merge, as follows:
df = df.merge(pd.DataFrame({'D':df.groupby(['A', 'B'])['C'].size()}), left_on=['A', 'B'], right_index=True)
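A variant of the merge approach, sketched under a few assumptions of mine: it uses sum rather than size (so it also holds when C is not all ones), turns the grouped result into ordinary columns with reset_index(name='D'), and merges on the column names with how='left' to keep the original row order. The names group_sums and merged are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
})

# One row per (A, B) pair, with the per-group sum of C stored in column D.
group_sums = df.groupby(['A', 'B'])['C'].sum().reset_index(name='D')

# A left merge keeps every row of df, in its original order.
merged = df.merge(group_sums, on=['A', 'B'], how='left')
print(merged)
```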
Answer by DrTRD
You could also do it in one line using transform applied to the groupby:
df['D'] = df.groupby(['A','B'])['C'].transform('sum')
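To see why transform fits this problem, here is a small sketch on hypothetical data (not from the question) where C is no longer all ones. transform returns a result aligned with the original rows, so it can be assigned straight to a new column, and 'sum' and 'size' now give different answers. The column names C_sum and n_rows are made up for illustration.

```python
import pandas as pd

# Hypothetical data: C varies, so the group sum and group size differ.
df = pd.DataFrame({
    'A': [1, 1, 1, 2, 2],
    'B': [5, 5, 6, 5, 5],
    'C': [10, 20, 30, 40, 50],
})

grouped_c = df.groupby(['A', 'B'])['C']

# Each result has one entry per original row, in the original row order.
c_sum = grouped_c.transform('sum')
n_rows = grouped_c.transform('size')

df['C_sum'] = c_sum
df['n_rows'] = n_rows
print(df)
```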