Pandas groupby + transform and multiple columns
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Original URL: http://stackoverflow.com/questions/53212490/
Question by Willem
To obtain results computed on grouped data at the same level of detail as the original DataFrame (the same number of rows), I have used the transform function.
Example: Original dataframe
name, year, grade
Hyman, 2010, 6
Hyman, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8
After groupby transform
name, year, grade, average grade
Hyman, 2010, 6, 6.5
Hyman, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5
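The question does not show the code behind this table, but a minimal sketch that should reproduce it, assuming the column names from the example, is a single-column transform('mean'). Because the result has one value per original row, it can be assigned straight back as a new column.

```python
import pandas as pd

# Sample data from the example table above (names kept as in the source).
df = pd.DataFrame({
    'name': ['Hyman', 'Hyman', 'Rosie', 'Rosie'],
    'year': [2010, 2011, 2010, 2011],
    'grade': [6, 7, 7, 8],
})

# transform('mean') broadcasts each group's mean back onto that group's rows,
# so the output keeps the original row count.
df['average grade'] = df.groupby('name')['grade'].transform('mean')
print(df)
```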
However, things get more complicated with more advanced functions that depend on multiple columns. What puzzles me is that I seem unable to access multiple columns in a groupby-transform combination.
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    y = sum(x['a']) + sum(x['b'])
    return y

df['e'] = df.groupby(['c', 'd']).transform(f)
Gives me:
KeyError: ('a', 'occurred at index a')
However, I know that the following does work:
df.groupby(['c','d']).apply(f)
What causes this behavior, and how can I obtain something like this?
a b c d e
1 1 q z 12
2 2 q z 12
3 3 q z 12
4 4 q o 8
5 5 w o 22
6 6 w o 22
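One likely cause, based on the pandas documentation for DataFrameGroupBy.transform: a user-defined function must support being applied column by column to each group, so f receives each column as a Series. Inside f, x['a'] then looks for a row labelled 'a', which does not exist. groupby.apply, by contrast, passes each group as a whole sub-DataFrame. The minimal sketch below mimics both calls by hand on a single group rather than going through transform itself; the variable group is illustrative, and the exact KeyError message varies between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    return sum(x['a']) + sum(x['b'])

# One group (rows where c == 'q' and d == 'z') as a sub-DataFrame.
group = df[(df['c'] == 'q') & (df['d'] == 'z')]

# Like groupby.apply: the whole sub-DataFrame, so both columns are reachable.
print(f(group))

# Like transform's column-by-column path: a single column as a Series.
# Its index holds row labels (0, 1, 2), so x['a'] raises KeyError.
try:
    f(group['a'])
except KeyError as exc:
    print('KeyError:', exc)
```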
Answer by Haleemur Ali
For this particular case you could do:
g = df.groupby(['c', 'd'])
df['e'] = g.a.transform('sum') + g.b.transform('sum')
df
# outputs
a b c d e
0 1 1 q z 12
1 2 2 q z 12
2 3 3 q z 12
3 4 4 q o 8
4 5 5 w o 22
5 6 6 w o 22
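Because summation is linear, sum(a) + sum(b) equals sum(a + b) within each group, so an equivalent sketch builds the combined column first and transforms it once. This variant is not part of the original answer; it groups a Series by the c and d columns of the same frame, passed as a list of Series keys.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# The answer's approach: two independent transforms, added afterwards.
g = df.groupby(['c', 'd'])
two_transforms = g['a'].transform('sum') + g['b'].transform('sum')

# Linear alternative: add the columns first, then transform once.
combined = (df['a'] + df['b']).groupby([df['c'], df['d']]).transform('sum')

print(combined.tolist())  # [12, 12, 12, 8, 22, 22]
```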
If you can construct the final result as a linear combination of independent transforms on the same groupby, this method works.
Otherwise, use a groupby-apply and then merge the result back into the original df.
Example:
_ = df.groupby(['c','d']).apply(lambda x: sum(x.a+x.b)).rename('e').reset_index()
df.merge(_, on=['c','d'])
# same output as above.
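A possible variant of the apply-then-merge approach, not from the original answer: DataFrame.join with on= matches the c and d columns against the MultiIndex of the per-group result and keeps the original row index, whereas merge on columns discards it. Selecting [['a', 'b']] before apply is a choice made in this sketch to keep the grouping columns out of the frames passed to the lambda; recent pandas versions may emit a deprecation warning otherwise.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# One scalar per (c, d) group, returned as a Series with a (c, d) MultiIndex.
per_group = (
    df.groupby(['c', 'd'])[['a', 'b']]
      .apply(lambda x: (x['a'] + x['b']).sum())
      .rename('e')
)

# Left join of df's c/d columns onto per_group's index; df's row order
# and index are preserved.
out = df.join(per_group, on=['c', 'd'])
print(out)
```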
Answer by jpp
You can use GroupBy + transform with sum twice:
df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(1)
print(df)
a b c d e
0 1 1 q z 12
1 2 2 q z 12
2 3 3 q z 12
3 4 4 q o 8
4 5 5 w o 22
5 6 6 w o 22
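Looking at the intermediate result may make the double sum clearer. In this sketch (same sample data as the question), the first sum is the group-wise transform applied to columns a and b, and the second, written here with the explicit keyword axis=1 rather than the positional 1, adds those two group totals across each row.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# Step 1: a DataFrame with columns a and b, one row per original row,
# each holding its group's total for that column.
sums = df.groupby(['c', 'd'])[['a', 'b']].transform('sum')

# Step 2: add the two group totals row by row.
df['e'] = sums.sum(axis=1)
print(sums)
print(df)
```

Since step 2 sums across however many columns step 1 returns, the same pattern should extend to more than two columns by listing them in the column selection.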