python pandas groupby() results

Question

Asked by Simon Righley

I have the following python pandas data frame:

import pandas as pd

df = pd.DataFrame({
    'A': [1,1,1,1,2,2,2,3,3,4,4,4],
    'B': [5,5,6,7,5,6,6,7,7,6,7,7],
    'C': [1,1,1,1,1,1,1,1,1,1,1,1]
})

df
    A  B  C
0   1  5  1
1   1  5  1
2   1  6  1
3   1  7  1
4   2  5  1
5   2  6  1
6   2  6  1
7   3  7  1
8   3  7  1
9   4  6  1
10  4  7  1
11  4  7  1

I would like to add another column holding the sum of the C values over all rows that share the same A and the same B. That is, something like:

    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2

I have tried with pandas groupby and it kind of works:

res = {}
for a, group_by_A in df.groupby('A'):
    group_by_B = group_by_A.groupby('B', as_index = False)
    res[a] = group_by_B['C'].sum()

but I don't know how to get the results from res into df in an orderly fashion. I would be very happy with any advice on this. Thank you.

Answer 1

Accepted answer by Andy Hayden

Here's one way (though it feels like this should work in one go with an apply, I can't get that to work).

In [11]: g = df.groupby(['A', 'B'])

In [12]: df1 = df.set_index(['A', 'B'])

The groupby size method is the one you want; we have to match it to 'A' and 'B' as the index:

In [13]: df1['D'] = g.size()  # unfortunately this doesn't play nice with as_index=False
# Same would work with g['C'].sum()

In [14]: df1.reset_index()
Out[14]:
    A  B  C  D
0   1  5  1  2
1   1  5  1  2
2   1  6  1  1
3   1  7  1  1
4   2  5  1  1
5   2  6  1  2
6   2  6  1  2
7   3  7  1  2
8   3  7  1  2
9   4  6  1  1
10  4  7  1  2
11  4  7  1  2
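
The session above can be condensed into a standalone script. This is only a sketch: it assumes the df from the question, uses g['C'].sum() (the variant mentioned in the comment) instead of g.size(), and the name result is introduced here for illustration.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

g = df.groupby(['A', 'B'])

# Index the frame by the same keys so the per-group result
# aligns with every row that carries that (A, B) pair.
df1 = df.set_index(['A', 'B'])
df1['D'] = g['C'].sum()

# 'result' is a hypothetical name; it restores A and B as columns.
result = df1.reset_index()
print(result)
```

Because the assignment aligns on the (A, B) index, each row picks up the total of its own group.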

Answer 2

Answered by andrew

You could also do a one-liner using merge as follows:

df = df.merge(pd.DataFrame({'D':df.groupby(['A', 'B'])['C'].size()}), left_on=['A', 'B'], right_index=True)
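
The same merge idea can be spelled out in two steps, which some readers may find easier to follow. This is a sketch under assumptions, not the answer's exact code: it starts from the df in the question, and the names counts and merged are introduced here for illustration. It also joins on columns with how='left' rather than on the index.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

# One row per (A, B) pair, with the group size in column D.
counts = df.groupby(['A', 'B']).size().reset_index(name='D')

# A left join keeps every original row, in its original order.
merged = df.merge(counts, on=['A', 'B'], how='left')
print(merged)
```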

Answer 3

Answered by DrTRD

You could also do a one-liner using transform applied to the groupby:

df['D'] = df.groupby(['A','B'])['C'].transform('sum')
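
Note that the answers above use sum and size interchangeably, which only gives the same numbers because every C in the question equals 1. The sketch below shows the difference; the names by_sum, by_size, weighted and weighted_sum are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Data frame from the question.
df = pd.DataFrame({
    'A': [1, 1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4],
    'B': [5, 5, 6, 7, 5, 6, 6, 7, 7, 6, 7, 7],
    'C': [1] * 12,
})

grouped = df.groupby(['A', 'B'])['C']

# transform returns a Series aligned with the original rows.
by_sum = grouped.transform('sum')    # total of C within each group
by_size = grouped.transform('size')  # number of rows in each group

# With C == 1 everywhere the two agree.
print((by_sum == by_size).all())

# With C == 2 everywhere the sum doubles, while the size would not change.
weighted = df.assign(C=2)
weighted_sum = weighted.groupby(['A', 'B'])['C'].transform('sum')
print(weighted_sum.tolist())
```

If the goal is the sum of C as the question states, transform('sum') is the one to use; transform('size') counts rows regardless of the values in C.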
