Pandas groupby + transform and multiple columns

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/53212490/

python, pandas, pandas-groupby

Asked by Willem

To get results computed on groupby data at the same level of detail as the original DataFrame (same number of rows), I have used the transform function.

Example: original DataFrame

name, year, grade
Hyman, 2010, 6
Hyman, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8

After groupby transform:

name, year, grade, average grade
Hyman, 2010, 6, 6.5
Hyman, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5
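
For reference, the table above can be reproduced with a single-column transform. This is a minimal sketch, not code from the original question; the variable name `grades` is made up for illustration.

```python
import pandas as pd

# Rebuild the small example table from the question.
grades = pd.DataFrame({
    "name": ["Hyman", "Hyman", "Rosie", "Rosie"],
    "year": [2010, 2011, 2010, 2011],
    "grade": [6, 7, 7, 8],
})

# transform("mean") returns one value per original row,
# so the result can be assigned straight back as a new column.
grades["average grade"] = grades.groupby("name")["grade"].transform("mean")

print(grades)
```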

However, with more advanced functions based on multiple columns, things get more complicated. What puzzles me is that I seem to be unable to access multiple columns in a groupby-transform combination.

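import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    # expects x to be a DataFrame with columns 'a' and 'b'
    y = sum(x['a']) + sum(x['b'])
    return y

df['e'] = df.groupby(['c', 'd']).transform(f)  # raises the KeyError shown below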

Gives me:

KeyError: ('a', 'occurred at index a')

Though I know that the following does work:

df.groupby(['c','d']).apply(f)
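
A small probe, not part of the original question, hints at why the two calls behave differently: with a user-defined function, transform appears to hand the function one column at a time (a Series, where `x['a']` has no column to look up), while apply hands it the whole group as a DataFrame. The function names `probe_transform` and `probe_apply` are hypothetical, and depending on the pandas version transform may additionally try the function on the whole group.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

seen_by_transform = []
seen_by_apply = []

def probe_transform(x):
    # Record what kind of object transform passes in, then return it unchanged.
    seen_by_transform.append(type(x).__name__)
    return x

def probe_apply(x):
    # Record what kind of object apply passes in, then return one scalar per group.
    seen_by_apply.append(type(x).__name__)
    return len(x)

grouped = df.groupby(['c', 'd'])[['a', 'b']]
grouped.transform(probe_transform)
grouped.apply(probe_apply)

print(sorted(set(seen_by_transform)))
print(sorted(set(seen_by_apply)))
```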

What causes this behavior and how can I obtain something like this:

a   b   c   d   e
1   1   q   z   12
2   2   q   z   12
3   3   q   z   12
4   4   q   o   8
5   5   w   o   22
6   6   w   o   22

Answered by Haleemur Ali

For this particular case you could do:

g = df.groupby(['c', 'd'])

df['e'] = g.a.transform('sum') + g.b.transform('sum')

df
# outputs

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22

If you can construct the final result as a linear combination of independent transforms on the same groupby, this method works.

Otherwise, you'd use a groupby-apply and then merge back to the original df.

Example:

_ = df.groupby(['c','d']).apply(lambda x: sum(x.a+x.b)).rename('e').reset_index()
df.merge(_, on=['c','d'])
# same output as above.
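
The apply-then-merge pattern matters most when the per-group value cannot be written as a sum of single-column transforms. As an illustrative sketch (the weighted average and the column name `wavg` are assumptions, not from the original answer), here is a weighted mean of `a` using `b` as weights, merged back with `how='left'` so every original row is kept:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# One scalar per (c, d) group, computed from two columns at once.
per_group = (
    df.groupby(['c', 'd'])[['a', 'b']]
      .apply(lambda g: (g['a'] * g['b']).sum() / g['b'].sum())
      .rename('wavg')
      .reset_index()
)

# Broadcast the per-group value back onto the original rows.
result = df.merge(per_group, on=['c', 'd'], how='left')

print(result)
```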

Answered by jpp

You can use GroupBy + transform with sum twice:

df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(1)

print(df)

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22
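
Since the result is a sum of sums, the order of the two steps can be swapped. The variant below is not from the original answer; it is a sketch that adds the two columns row by row first and then runs a single transform, grouping the resulting Series by the `c` and `d` columns. The name `row_total` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# Step 1: combine the columns row by row.
row_total = df['a'] + df['b']

# Step 2: sum within each (c, d) group and broadcast back to every row.
df['e'] = row_total.groupby([df['c'], df['d']]).transform('sum')

print(df)
```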