pandas: group by and sum two columns

Question

Asked by acpigeon

Beginner question. This seems like it should be a straightforward operation, but I can't figure it out from reading the docs.

I have a df with this structure:

|integer_id|int_field_1|int_field_2|

The integer_id column is non-unique, so I'd like to group the df by integer_id and sum the two fields.

The equivalent SQL is:

SELECT integer_id, SUM(int_field_1), SUM(int_field_2) FROM tbl
GROUP BY integer_id

Any suggestions on the simplest way to do this?

EDIT: Including input/output

Input:  
integer_id  int_field_1 int_field_2   
2656        36          36  
2656        36          36  
9702        2           2  
9702        1           1

Output using df.groupby('integer_id').sum():

integer_id  int_field_1 int_field_2  
2656        72          72  
9702        3           3

Answer 1

Answered by EdChum

You just need to call sum on a groupby object:

df.groupby('integer_id').sum()

See the docs for further examples.
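
As a minimal, self-contained sketch of that call, the snippet below rebuilds the sample input from the question and sums it per group. Building the frame from a dict and the names `totals` and `flat` are assumptions made for illustration only.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "integer_id": [2656, 2656, 9702, 9702],
    "int_field_1": [36, 36, 2, 1],
    "int_field_2": [36, 36, 2, 1],
})

# Group rows by integer_id and sum every remaining column.
totals = df.groupby("integer_id").sum()
print(totals)

# integer_id is now the index; reset_index() turns it back into a
# regular column, which is closer to the shape of the SQL result.
flat = totals.reset_index()
print(flat)
```

If you want integer_id to stay a column from the start, passing `as_index=False` to `groupby` should give the same shape as `flat`.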

Answer 2

Answered by Bastin Robin

You can do it like this:

data.groupby(by=['account_ID'])['purchases'].sum()
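
The line above sums a single column and uses its own column names. A rough adaptation to the question's columns might look like the sketch below; the sample frame and the names `one` and `both` are assumptions for illustration. Selecting one column gives a Series, while selecting a list of columns gives a DataFrame.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "integer_id": [2656, 2656, 9702, 9702],
    "int_field_1": [36, 36, 2, 1],
    "int_field_2": [36, 36, 2, 1],
})

# Selecting a single column yields a Series of per-group sums.
one = df.groupby(by=["integer_id"])["int_field_1"].sum()

# Selecting a list of columns yields a DataFrame with one sum per column.
both = df.groupby(by=["integer_id"])[["int_field_1", "int_field_2"]].sum()

print(one)
print(both)
```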

Answer 3

Answered by xxyjoel

A variation using the .agg() function. It (1) keeps the result as a DataFrame, (2) lets you apply means, counts, sums, and so on to individual columns, and (3) supports grouping on multiple columns while staying legible.

df.groupby(['att1', 'att2']).agg({'att1': "count", 'att3': "sum",'att4': 'mean'})

Using your values:

df.groupby(['integer_id']).agg({'int_field_1': "sum", 'int_field_2': "sum" })
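
To try this end to end, here is a small sketch run against the question's sample data. The `as_index=False` argument and the named-aggregation form (with the hypothetical output names `sum_1` and `mean_2`) are additions not present in the answer above; they are shown only as options you might find useful.

```python
import pandas as pd

# Sample input copied from the question.
df = pd.DataFrame({
    "integer_id": [2656, 2656, 9702, 9702],
    "int_field_1": [36, 36, 2, 1],
    "int_field_2": [36, 36, 2, 1],
})

# Dict form from the answer; as_index=False keeps integer_id as a column.
result = df.groupby("integer_id", as_index=False).agg(
    {"int_field_1": "sum", "int_field_2": "sum"}
)
print(result)

# Named aggregation: choose the output column names explicitly and mix
# different functions per column.
named = df.groupby("integer_id").agg(
    sum_1=("int_field_1", "sum"),
    mean_2=("int_field_2", "mean"),
)
print(named)
```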
