pandas: groupby, sum and count into one table
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/39295910/
groupby, sum and count to one table
Asked by Heisenberg
I have the dataframe below:
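import numpy as np
import pandas as pd

# Random integers in [1, 10), so the values change on every run
df = pd.DataFrame({"A": np.random.randint(1, 10, 9), "B": np.random.randint(1, 10, 9), "C": list("abbcacded")})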
   A  B  C
0  9  6  a
1  2  2  b
2  1  9  b
3  8  2  c
4  7  6  a
5  3  5  c
6  1  3  d
7  9  9  e
8  3  4  d
I would like to get the grouped result below (grouping key: column "C"). The rows for c, d and e are dropped intentionally.
   number  A_sum  B_sum
a       2     16     12
b       2      3     11
This is a 2-row by 3-column dataframe. The grouping key is column C. The "number" column is the count of rows for each letter (a and b). A_sum and B_sum are the sums of columns A and B within each group of column C.
I guess we should use the groupby method, but how can I get this summary table?
Accepted answer by Psidom
One option is to count the size and sum the columns for each group separately and then join them by index:
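# Dict-based renaming in agg was removed in pandas 1.0; named aggregation yields the same "number" column
df.groupby("C")["A"].agg(number="size").join(df.groupby("C").sum())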
#    number   A   B
# C
# a       2  11   8
# b       2  14  12
# c       2   8   5
# d       2  11  12
# e       1   7   2
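Because the dataframe in the question is built from random integers, the output above will not match the question's table. The following is a minimal sketch of the same count-then-sum idea, assuming the fixed values printed in the question; the names `counts`, `sums`, `summary` and `result` are illustrative only and do not come from the original answer.

```python
import pandas as pd

# Fixed data copied from the table printed in the question (not random)
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

grouped = df.groupby("C")
counts = grouped.size().rename("number")              # rows per group
sums = grouped[["A", "B"]].sum().add_suffix("_sum")   # columns A_sum, B_sum

# Both pieces are indexed by the group key, so they align on C
summary = pd.concat([counts, sums], axis=1)

# Keep only the groups the question asks for
result = summary.loc[["a", "b"]]
print(result)
```

Selecting rows with `.loc` after aggregating keeps the grouping step general; the unwanted groups could equally be filtered out before grouping.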
You can also do df.groupby('C').agg(["sum", "size"]), which gives an extra, duplicated size column. If you are fine with that, it works as well.
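The list form of agg returns two-level column labels, one (column, function) pair per result column. One possible way to flatten them and drop the duplicated size column is sketched below, assuming the fixed values printed in the question; `wide` and `flat` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Fixed data copied from the table printed in the question (not random)
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

wide = df.groupby("C").agg(["sum", "size"])
# Column labels are pairs: (A, sum), (A, size), (B, sum), (B, size)
wide.columns = [f"{col}_{func}" for col, func in wide.columns]

# A_size and B_size hold the same group sizes, so keep one as "number"
flat = wide.rename(columns={"A_size": "number"}).drop(columns="B_size")
flat = flat[["number", "A_sum", "B_sum"]]
print(flat)
```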
Answer by Ami Tavory
You can do this using a single groupby with:
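# Nested dicts in agg are no longer accepted (pandas 1.0+); a list of functions per column works instead
res = df.groupby(df.C).agg({'A': 'sum', 'B': ['sum', 'count']})
# Flatten the resulting (column, function) labels in order: A/sum, B/sum, B/count
res.columns = ['A_sum', 'B_sum', 'count']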
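As an alternative that names the output columns directly, the single groupby can be written with named aggregation. This is a sketch rather than part of the original answer: it assumes pandas 0.25 or later, uses the fixed values printed in the question, and filters to groups a and b before grouping. The name `subset` is illustrative.

```python
import pandas as pd

# Fixed data copied from the table printed in the question (not random)
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

# Drop the unwanted groups up front
subset = df[df["C"].isin(["a", "b"])]

# Each keyword is an output column: name=(input column, aggregation)
res = subset.groupby("C").agg(
    number=("A", "size"),
    A_sum=("A", "sum"),
    B_sum=("B", "sum"),
)
print(res)
```

With this form no separate renaming step is needed, since the keyword names become the column names.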