pandas: groupby, sum and count into one table

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39295910/

Time: 2020-09-14 01:56:41  Source: igfitidea


python pandas numpy

Asked by Heisenberg

I have the dataframe below:

df = pd.DataFrame({"A": np.random.randint(1, 10, 9), "B": np.random.randint(1, 10, 9), "C": list("abbcacded")})

   A  B  C
0  9  6  a
1  2  2  b
2  1  9  b
3  8  2  c
4  7  6  a
5  3  5  c
6  1  3  d
7  9  9  e
8  3  4  d

I would like to get the grouped result below (grouping key: column "C"). Rows c, d and e are dropped intentionally.

   number  A_sum  B_sum
a       2     16     12
b       2      3     11

This is a 2-row by 3-column dataframe. The grouping key is column C. The "number" column holds the count of rows for each letter (a and b). A_sum and B_sum hold the sums of columns A and B within each group of column C.

I guess we should use the groupby method, but how can I get this summary table?

Accepted answer by Psidom

One option is to count the group sizes and sum the columns for each group separately, then join the two results on the index:

# Count rows per group as "number", then join the per-group sums of A and B on the index
df.groupby("C")["A"].agg(number="size").join(df.groupby("C").sum())

#    number   A   B
# C
# a       2  11   8
# b       2  14  12
# c       2   8   5
# d       2  11  12
# e       1   7   2
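
The join approach above can be sketched end to end as follows. This is only an illustration: the fixed values are copied from the sample table in the question (the original data was random), and the names counts, sums and summary are made up for this example. Filtering to a and b before grouping is one possible way to drop the other rows.

```python
import pandas as pd

# Fixed copy of the sample table from the question (assumption: the
# original frame was random, so these values are for illustration only).
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

# Keep only the groups of interest before aggregating.
grouped = df[df["C"].isin(["a", "b"])].groupby("C")

counts = grouped.size().rename("number")              # rows per group
sums = grouped[["A", "B"]].sum().add_suffix("_sum")   # per-group sums

# Both results are indexed by C, so a plain index join lines them up.
summary = counts.to_frame().join(sums)
print(summary)
```

Because both intermediate results share the group labels as their index, no explicit join key is needed.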

You can also do df.groupby('C').agg(["sum", "size"]), which gives an extra, duplicated size column (one for A and one for B). If you are fine with that, it also works.
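
For the agg(["sum", "size"]) variant, the result has two-level column labels, and the size appears once per summed column. A possible way to flatten the labels and keep a single count is sketched below. The data is the sample table from the question, and the name wide plus the choice to keep the A copy of size as "number" are assumptions of this sketch.

```python
import pandas as pd

# Sample table from the question, with fixed values for illustration.
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

wide = df.groupby("C")[["A", "B"]].agg(["sum", "size"])
# Columns are now pairs: (A, sum), (A, size), (B, sum), (B, size).

# Flatten each pair into a single label such as "A_sum".
wide.columns = ["_".join(pair) for pair in wide.columns]

# Both size columns are identical; drop one and rename the other.
wide = wide.drop(columns="B_size").rename(columns={"A_size": "number"})
print(wide)
```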

Answer by Ami Tavory

You can do this using a single groupby with

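# A list of functions per column yields the columns (A, sum), (B, sum), (B, count);
# nested renaming dicts raise an error in pandas 1.0 and later.
res = df.groupby(df.C).agg({'A': 'sum', 'B': ['sum', 'count']})

res.columns = ['A_sum', 'B_sum', 'count']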
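
As a further sketch of the single-groupby idea, recent pandas versions (0.25 and later) support named aggregation, which assigns the output column names directly. This is not part of the original answer. It uses "size" rather than "count" so that missing values would still be counted as rows, and it selects rows a and b afterwards to match the table requested in the question. The values are the sample table from the question.

```python
import pandas as pd

# Sample table from the question, with fixed values for illustration.
df = pd.DataFrame({
    "A": [9, 2, 1, 8, 7, 3, 1, 9, 3],
    "B": [6, 2, 9, 2, 6, 5, 3, 9, 4],
    "C": list("abbcacded"),
})

# Each keyword is an output column: name=(input column, aggregation).
res = df.groupby("C").agg(
    number=("A", "size"),
    A_sum=("A", "sum"),
    B_sum=("B", "sum"),
)

# Keep only the groups asked for in the question.
res = res.loc[["a", "b"]]
print(res)
```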