Pandas groupby + transform and multiple columns

Question

Asked by Willem

To obtain results computed on grouped data at the same level of detail as the original DataFrame (same number of rows), I have used the transform function.

Example: Original dataframe

name, year, grade
Hyman, 2010, 6
Hyman, 2011, 7
Rosie, 2010, 7
Rosie, 2011, 8

After groupby transform

name, year, grade, average grade
Hyman, 2010, 6, 6.5
Hyman, 2011, 7, 6.5
Rosie, 2010, 7, 7.5
Rosie, 2011, 8, 7.5

However, with more advanced functions based on multiple columns things get more complicated. What puzzles me is that I seem to be unable to access multiple columns in a groupby-transform combination.

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    # Sum columns 'a' and 'b' within one group.
    y = sum(x['a']) + sum(x['b'])
    return y

df['e'] = df.groupby(['c', 'd']).transform(f)

This gives me:

KeyError: ('a', 'occurred at index a')

Though I know that the following does work:

df.groupby(['c','d']).apply(f)

What causes this behavior and how can I obtain something like this:

a   b   c   d   e
1   1   q   z   12
2   2   q   z   12
3   3   q   z   12
4   4   q   o   8
5   5   w   o   22
6   6   w   o   22

Answer 1

Answered by Haleemur Ali

For this particular case you could do:

g = df.groupby(['c', 'd'])

df['e'] = g.a.transform('sum') + g.b.transform('sum')

df
# outputs

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22

This method works whenever you can construct the final result as a linear combination of independent transforms on the same groupby.
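As a rough illustration of what "linear combination of independent transforms" can mean, here is a minimal sketch. The weights `w_a` and `w_b` and the mix of `'sum'` and `'mean'` are hypothetical, chosen only to show the pattern; they are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

g = df.groupby(['c', 'd'])

# Hypothetical weights, for illustration only.
w_a, w_b = 2, 0.5

# Each transform returns a Series aligned with df's rows,
# so the two results can be scaled and added directly.
weighted = w_a * g['a'].transform('sum') + w_b * g['b'].transform('mean')
print(weighted)
```

Because every term is already aligned to the original index, no merge step is needed.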

Otherwise, you'd use a groupby-apply and then merge the result back into the original df.

Example:

# Assumes df does not already contain a column 'e'.
sums = df.groupby(['c', 'd']).apply(lambda x: (x.a + x.b).sum()).rename('e').reset_index()
df.merge(sums, on=['c', 'd'])
# same output as above.
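The apply-then-merge pattern is more useful when the per-group function reads several columns at once. Below is a minimal sketch of that case; `weighted_mean` is a hypothetical statistic (mean of `a` weighted by `b`) made up for illustration, and selecting `[['a', 'b']]` before `apply` is my own choice to keep the grouping columns out of the function's input.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def weighted_mean(group):
    # Hypothetical per-group statistic that needs both columns:
    # the mean of 'a' weighted by 'b'.
    return (group['a'] * group['b']).sum() / group['b'].sum()

# One scalar per (c, d) group, indexed by the group keys.
per_group = df.groupby(['c', 'd'])[['a', 'b']].apply(weighted_mean).rename('e')

# Broadcast the group-level values back onto the original rows.
result = df.merge(per_group.reset_index(), on=['c', 'd'], how='left')
print(result)
```

With `how='left'` the merged frame keeps the row order of `df`, so `result` has one row per original observation.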

Answer 2

Answered by jpp

You can use GroupBy + transform with sum twice:

df['e'] = df.groupby(['c', 'd'])[['a', 'b']].transform('sum').sum(axis=1)

print(df)

   a  b  c  d   e
0  1  1  q  z  12
1  2  2  q  z  12
2  3  3  q  z  12
3  4  4  q  o   8
4  5  5  w  o  22
5  6  6  w  o  22
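This works because transform handles each selected column separately. That is presumably also the cause of the KeyError in the question: as far as I understand pandas' behavior, a user function passed to transform is first tried on one column (a Series) at a time, so `x['a']` becomes a label lookup in the row index rather than a column selection. The sketch below reproduces both observations on the question's data; the variable names `raised` and `per_column` are mine.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

def f(x):
    # Written for a DataFrame; fails if x is a single column.
    return sum(x['a']) + sum(x['b'])

# The question's call: f receives a Series, and 'a' is not a row label.
try:
    df.groupby(['c', 'd']).transform(f)
    raised = False
except KeyError:
    raised = True
print("transform(f) raised KeyError:", raised)

# transform('sum') yields one group-sum column per selected column...
per_column = df.groupby(['c', 'd'])[['a', 'b']].transform('sum')
print(per_column)

# ...and the row-wise sum combines them into the desired 'e'.
e = per_column.sum(axis=1)
print(e)
```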
