Python Pandas Groupby and Sum Only One Column
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/38985053/
Asked by JSolomonCulp
So I have a dataframe, df1, that looks like the following:
     A   B             C
1  foo  12    California
2  foo  22    California
3  bar   8  Rhode Island
4  bar  32  Rhode Island
5  baz  15          Ohio
6  baz  26          Ohio
I want to group by column A and then sum column B while keeping the value in column C. Something like this:
     A   B             C
1  foo  34    California
2  bar  40  Rhode Island
3  baz  41          Ohio
The issue is that when I call df.groupby('A').sum(), column C gets removed, returning:
      B
A
bar  40
baz  41
foo  34
How can I get around this and keep column C when I group and sum?
Answered by Sevyns
The simplest way to do this is to include C in your groupby (groupby accepts a list of columns). This works here because C has a single value within each group of A.
Give this a try:
df.groupby(['A','C'])['B'].sum()
One other thing to note: the call above returns a Series indexed by A and C. If you need to keep working with a DataFrame after the aggregation, pass as_index=False so that A and C stay as regular columns. This one gave me problems when I was first working with Pandas. Example:
df.groupby(['A','C'], as_index=False)['B'].sum()
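To make the difference between the two calls concrete, here is a minimal sketch that rebuilds the question's df1 and runs both. The sort=False argument and the final column reordering are my additions (not part of the original answer); they are only there to match the row and column order the question asked for.

```python
import pandas as pd

# Rebuild the question's df1, with the 1..6 index shown in the question.
df1 = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

# Grouping on both keys returns a Series indexed by (A, C).
as_series = df1.groupby(["A", "C"])["B"].sum()

# as_index=False keeps A and C as ordinary columns of a DataFrame.
as_frame = df1.groupby(["A", "C"], as_index=False)["B"].sum()

# Assumption (not in the original answer): sort=False keeps groups in
# order of first appearance, and the column selection restores A, B, C.
result = df1.groupby(["A", "C"], as_index=False, sort=False)["B"].sum()
result = result[["A", "B", "C"]]
print(result)
```

Note that this approach relies on C being constant within each value of A. If a single A had two different C values, it would produce one row per (A, C) pair rather than one row per A.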
Answered by Kartik
If you don't care what's in your column C and just want the nth value from each group, you could just do this:
n = 0  # position within each group of the C value to keep (0 = first)
df.groupby('A').agg({'B': 'sum',
                     'C': lambda x: x.iloc[n]})
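As a rough, self-contained sketch of this second approach on the question's data: n = 0 is a hypothetical choice made for illustration, and sort=False, reset_index(), and the "first" variant are my additions rather than part of the original answer.

```python
import pandas as pd

# Rebuild the question's df1, with the 1..6 index shown in the question.
df1 = pd.DataFrame(
    {
        "A": ["foo", "foo", "bar", "bar", "baz", "baz"],
        "B": [12, 22, 8, 32, 15, 26],
        "C": ["California", "California", "Rhode Island",
              "Rhode Island", "Ohio", "Ohio"],
    },
    index=range(1, 7),
)

n = 0  # hypothetical choice: keep the first C value in each group

# Sum B and pick the nth C value per group; A becomes the index,
# so reset_index() turns it back into a regular column.
out = (
    df1.groupby("A", sort=False)
    .agg({"B": "sum", "C": lambda x: x.iloc[n]})
    .reset_index()
)

# For n = 0, the built-in "first" aggregation avoids the lambda.
out_first = df1.groupby("A", sort=False, as_index=False).agg(
    {"B": "sum", "C": "first"}
)
print(out)
```

Two caveats: x.iloc[n] raises an IndexError for any group with fewer than n + 1 rows, and "first" skips missing values whereas iloc[0] does not, so the two variants can differ when C contains NaN.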