Original question: http://stackoverflow.com/questions/29583312/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Pandas Sum of Duplicate Attributes
Asked by user2723240
I'm using Pandas to manipulate a CSV file with several rows and columns that looks like the following:
Fullname      Amount  Date       Zip    State  .....
John Joe      1       1/10/1900  55555  Confusion
Betty White   5       .          .      Alaska
Bruce Wayne   10      .          .      Frustration
John Joe      20      .          .      .
Betty White   25      .          .      .
I'd like to create a new column entitled "Total" holding the total amount for each person (identified by Fullname and Zip). I'm having difficulty finding the correct solution.
Let's just call my CSV import csvfile. Here is what I have:
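import pandas

# Load the CSV; the first row holds the column names.
df = pandas.read_csv('csvfile.csv', header=0)
# sort_values replaces the removed DataFrame.sort and returns a new frame.
df = df.sort_values('Fullname')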
I think I have to use iterrows to do what I want. The problem with simply dropping duplicates is that I would lose the amounts from the dropped rows, and the amounts may differ between rows for the same person.
Answered by EdChum
I think you want this:
df['Total'] = df.groupby(['Fullname', 'Zip'])['Amount'].transform('sum')
So groupby will group by the Fullname and Zip columns, as you've stated. We then call transform on the Amount column and calculate the total amount by passing in the string 'sum'. This returns a Series with its index aligned to the original df, so it can be assigned directly as a new column. You can then drop the duplicates afterwards, e.g.
new_df = df.drop_duplicates(subset=['Fullname', 'Zip'])
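To see the two steps from the answer working together, here is a minimal sketch on the rows from the question. The Zip values are made up for illustration (the question elides them), and dropping the Amount column at the end is an extra step that the answer does not include.

```python
import pandas as pd

# Hypothetical sample rows; the Zip values are invented for illustration.
df = pd.DataFrame({
    "Fullname": ["John Joe", "Betty White", "Bruce Wayne", "John Joe", "Betty White"],
    "Amount": [1, 5, 10, 20, 25],
    "Zip": ["55555", "99501", "10001", "55555", "99501"],
})

# Sum Amount within each (Fullname, Zip) group and broadcast the result
# back onto every row of that group, so the index matches df.
df["Total"] = df.groupby(["Fullname", "Zip"])["Amount"].transform("sum")

# Keep the first row per person. Amount is dropped here because it only
# reflects that first row, not the person's total.
new_df = df.drop_duplicates(subset=["Fullname", "Zip"]).drop(columns="Amount")
print(new_df)
```

With this data, John Joe gets a Total of 21, Betty White 30, and Bruce Wayne 10. If only the per-person totals are needed and the other columns can be discarded, `df.groupby(["Fullname", "Zip"], as_index=False)["Amount"].sum()` should give one row per person directly.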

