Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/23642406/

Date: 2020-09-13 22:02:55  Source: igfitidea

Sum grouped Pandas dataframe by single column

python pandas

Asked by lmart999

I have a Pandas dataframe:

import pandas as pd

# Three rows: two share GroupID '1' / Sample 'S1', one is GroupID '2' / Sample 'S2'
test = pd.DataFrame(
    [['1', 'S1', 'S1_meta', 1],
     ['1', 'S1', 'S1_meta', 1],
     ['2', 'S2', 'S2_meta', 1]],
    columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])

I want to (1) group by two columns ('GroupID' and 'Sample'), (2) sum 'Value' per group, and (3) retain only unique values in 'SampleMeta' per group. The desired result ('GroupID' and 'Sample' as index) is shown:

                SampleMeta  Value
GroupID Sample                       
1       S1      S1_meta      2
2       S2      S2_meta      1 

df.groupby() and the .sum() method get close, but .sum() concatenates the string values in the 'SampleMeta' column within a group. As a result, the 'S1_meta' value is duplicated.

# Group on the two key columns, then sum every remaining column
g = test.groupby(['GroupID', 'Sample'])
print(g.sum())

                SampleMeta      Value
GroupID Sample                       
1       S1      S1_metaS1_meta  2
2       S2      S2_meta         1 

Is there a way to achieve the desired result using groupby() and associated methods? Merging the summed 'Value' per group with a separate 'SampleMeta' DataFrame works but there must be a more elegant solution.

Accepted answer by Karl D.

Well, you can include SampleMeta as part of the groupby:

print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum())

                           Value
GroupID Sample SampleMeta       
1       S1     S1_meta         2
2       S2     S2_meta         1

If you don't want SampleMeta as part of the index when done, you could modify it as follows:

print(test.groupby(['GroupID', 'Sample', 'SampleMeta']).sum().reset_index(level=2))

               SampleMeta  Value
GroupID Sample                  
1       S1        S1_meta      2
2       S2        S2_meta      1
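
As an alternative sketch (not part of the original answer), the grouping can stay on the two key columns while .agg() picks one aggregation per column: 'first' for SampleMeta and 'sum' for Value. Like the approach above, this assumes SampleMeta is constant within each ['GroupID','Sample'] group; the variable name result is only illustrative.

```python
import pandas as pd

test = pd.DataFrame(
    [['1', 'S1', 'S1_meta', 1],
     ['1', 'S1', 'S1_meta', 1],
     ['2', 'S2', 'S2_meta', 1]],
    columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])

# One aggregation per column: 'first' keeps a single SampleMeta per group,
# 'sum' adds up Value. GroupID and Sample become the index.
result = test.groupby(['GroupID', 'Sample']).agg(
    {'SampleMeta': 'first', 'Value': 'sum'})
print(result)
```

This avoids the extra reset_index step, at the cost of silently keeping only the first SampleMeta if a group ever contains more than one.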

This will only work correctly if there is no variation in SampleMeta within each ['GroupID','Sample'] group. Of course, if there is variation within ['GroupID','Sample'], then you probably want to exclude SampleMeta from the groupby/sum entirely:

print(test.groupby(['GroupID', 'Sample'])['Value'].sum())

GroupID  Sample
1        S1        2
2        S2        1
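
If SampleMeta does vary within a group and you would still like to keep it, one possible sketch is to join the distinct values per group instead of dropping the column. The data below is hypothetical (the varied frame and its 'S1_meta_a' / 'S1_meta_b' values are not from the question), and the comma-separated format is just one choice.

```python
import pandas as pd

# Hypothetical data: SampleMeta differs within the ('1', 'S1') group.
varied = pd.DataFrame(
    [['1', 'S1', 'S1_meta_a', 1],
     ['1', 'S1', 'S1_meta_b', 1],
     ['2', 'S2', 'S2_meta', 1]],
    columns=['GroupID', 'Sample', 'SampleMeta', 'Value'])

# Join the distinct SampleMeta strings of each group (in order of appearance)
# and sum Value as before.
result = varied.groupby(['GroupID', 'Sample']).agg(
    {'SampleMeta': lambda s: ', '.join(s.unique()), 'Value': 'sum'})
print(result)
```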