Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/29583312/

Time: 2020-09-13 23:11:10  Source: igfitidea

Pandas Sum of Duplicate Attributes

python csv pandas

Asked by user2723240

I'm using Pandas to manipulate a CSV file with several rows and columns that looks like the following:

Fullname       Amount    Date         Zip      State .....
John Joe       1         1/10/1900    55555    Confusion
Betty White    5         .            .        Alaska
Bruce Wayne    10        .            .        Frustration
John Joe       20        .            .        .
Betty White    25        .            .        .

I'd like to create a new column titled "Total" containing the summed Amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.

Let's just call my CSV import csvfile. Here is what I have:

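import pandas
# Read the CSV, taking column names from the first row
df = pandas.read_csv('csvfile.csv', header=0)
# sort_values returns a new DataFrame, so assign the result
df = df.sort_values(['Fullname'])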

I think I have to use iterrows to do what I want. The problem with dropping duplicates is that I would lose the other amounts, and the amounts may differ between duplicate rows.

Answered by EdChum

I think you want this:

df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')

So groupby will group by the Fullname and Zip columns, as you've stated. We then call transform on the Amount column and calculate the total amount by passing in the string 'sum'. This returns a Series with its index aligned to the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.:

new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
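
To see the two steps working together, here is a minimal sketch on a small in-memory frame instead of the CSV. The Fullname and Amount values mirror the question's table; the Zip values other than 55555 are made up for illustration, since the question elides them. Dropping the Amount column at the end is an optional extra step, not part of the answer above.

```python
import pandas as pd

# Sample data modeled on the question; Zip values other than 55555 are assumed.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "99501", "10001", "55555", "99501"],
})

# transform('sum') computes one sum per (Fullname, Zip) group and
# repeats it on every row of that group, keeping the original index.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep the first row for each person. The surviving Amount is only that
# first row's value, so it is dropped here to avoid confusion (optional).
new_df = df.drop_duplicates(subset=["Fullname", "Zip"]).drop(columns="Amount")
print(new_df)
```

Because transform preserves the row count, no explicit loop such as iterrows is needed, and the per-row amounts are still available in df until the duplicates are dropped.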