Pandas: Create an aggregate column in a DataFrame

Question

Asked by foglerit

With the DataFrame below as an example,

In [83]:
df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})
df
Out[83]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

What would be a simple way to generate a new column containing some aggregation of the data over one of the columns?

For example, if I sum values over the items in A:

In [84]:
df.groupby('A').sum()['values']
Out[84]:
A
1    25
2    45
Name: values

How can I get the following?

   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45

Answer 1

Answered by Wouter Overmeire

In [20]: df = pd.DataFrame({'A':[1,1,2,2],'B':[1,2,1,2],'values':np.arange(10,30,5)})

In [21]: df
Out[21]:
   A  B  values
0  1  1      10
1  1  2      15
2  2  1      20
3  2  2      25

In [22]: df['sum_values_A'] = df.groupby('A')['values'].transform(np.sum)

In [23]: df
Out[23]:
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45
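
As a self-contained sketch of the same transform approach: the version below passes the string alias 'sum' instead of np.sum, on the assumption that a recent pandas release is in use (newer versions prefer the string alias and may warn about the NumPy function). The result is a Series aligned to the original index, which is why it can be assigned directly as a column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# transform returns one value per original row (same index as df),
# so the group sum is broadcast back to every row of its group.
df['sum_values_A'] = df.groupby('A')['values'].transform('sum')
print(df)
```

Other aggregations such as 'mean' or 'max' can be swapped in for 'sum' in the same way.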

Answer 2

Answered by foglerit

I found a way using join:

In [101]:
aggregated = df.groupby('A').sum()['values']
aggregated.name = 'sum_values_A'
df.join(aggregated,on='A')

Out[101]:
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45

Does anyone have a simpler way to do it?
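
For reference, the join approach can be sketched as a standalone script. The variable name result is introduced here for illustration only; selecting the 'values' column before summing and using rename are small variations on the session above, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# Aggregate only the 'values' column per group, and name the resulting
# Series after the column it will become in the joined frame.
aggregated = df.groupby('A')['values'].sum().rename('sum_values_A')

# join matches df['A'] against the index of aggregated and returns a
# new DataFrame; df itself is not modified.
result = df.join(aggregated, on='A')
print(result)
```

Because join returns a new object, the result has to be assigned to a name (or back to df) to be kept.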

Answer 3

Answered by joaquin

This is not so direct, but I find it very intuitive (it uses map to create a new column from another column), and it can be applied to many other cases:

gb = df.groupby('A').sum()['values']

def getvalue(x):
    return gb[x]

df['sum'] = df['A'].map(getvalue)
df
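
A slightly shorter variant of the map idea, as a sketch: Series.map also accepts a Series, in which case each value of df['A'] is looked up in that Series' index, so the helper function is not strictly needed. The column name 'sum' is kept from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 2, 2],
                   'B': [1, 2, 1, 2],
                   'values': np.arange(10, 30, 5)})

# Per-group sums, indexed by the distinct values of 'A'.
gb = df.groupby('A')['values'].sum()

# Passing the Series to map looks up each 'A' value in gb's index.
df['sum'] = df['A'].map(gb)
print(df)
```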

Answer 4

Answered by Garrett

In [15]: def sum_col(df, col, new_col):
   ....:     df[new_col] = df[col].sum()
   ....:     return df

In [16]: df.groupby("A").apply(sum_col, 'values', 'sum_values_A')
Out[16]: 
   A  B  values  sum_values_A
0  1  1      10            25
1  1  2      15            25
2  2  1      20            45
3  2  2      25            45
