pandas: sum of duplicate attributes

Question

Asked by user2723240

I'm using Pandas to manipulate a csv file with several rows and columns that looks like the following

Fullname     Amount     Date           Zip    State .....
John Joe        1        1/10/1900     55555    Confusion
Betty White     5         .             .       Alaska 
Bruce Wayne     10        .             .       Frustration
John Joe        20        .             .       .
Betty White     25        .             .       .

I'd like to create a new column entitled "Total" holding the total sum of Amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.

Let's just call my csv import csvfile. Here is what I have.

import pandas
df = pandas.read_csv('csvfile.csv', header=0)
# sort_values returns a new DataFrame, so assign the result back
df = df.sort_values(['Fullname'])

I think I have to use iterrows to do what I want. The problem with dropping duplicates is that I will lose the amount, or the amount may be different.

Answer 1

Answered by EdChum

I think you want this:

df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')

So groupby will group by the Fullname and Zip columns, as you've stated. We then call transform on the Amount column and calculate the total amount by passing in the string 'sum'. This returns a Series whose index is aligned to the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.

new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
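
As a rough end-to-end sketch of the two steps above, here is the approach applied to a small frame modelled on the sample in the question. The names and amounts come from the question; the Zip values for Betty White and Bruce Wayne are elided there, so the ones below are made-up placeholders, and the frame is built inline rather than read from csvfile.csv.

```python
import pandas as pd

# Sample rows modelled on the question. Zip values other than 55555
# are hypothetical placeholders (the question elides them with ".").
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "11111", "22222", "55555", "11111"],
})

# transform('sum') returns one value per original row, aligned to df's index,
# so every row of a (Fullname, Zip) group receives that group's total.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep the first row of each (Fullname, Zip) pair.
new_df = df.drop_duplicates(subset=["Fullname", "Zip"])

print(new_df[["Fullname", "Zip", "Total"]])
```

Note that new_df still carries the Amount of the first row kept for each person, so Total is the column to rely on after dropping duplicates.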
