Pandas: calculating the mean values of duplicate entries in a dataframe
Original source: http://stackoverflow.com/questions/39919570/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original source and attribute it to the original authors (not me): StackOverflow
Asked by DDRRpy
I have been working with a dataframe in python and pandas that contains duplicate entries in the first column. The dataframe looks something like this:
  sample_id  qual  percent
0  sample_1    10       20
1  sample_2    20       30
2  sample_1    50       60
3  sample_2    10       90
4  sample_3   100       20
I want to write something that identifies duplicate entries within the first column and calculates the mean values of the subsequent columns. An ideal output would be something similar to the following:
  sample_id  qual  percent
0  sample_1    30       40
1  sample_2    15       60
2  sample_3   100       20
I have been struggling with this problem all afternoon and would appreciate any help.
Answered by piRSquared
Answered by kinjo
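A groupby on sample_id will work: it collects the rows that share an id, and mean() then averages the remaining columns within each group.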
data.groupby('sample_id').mean()
You can then use reset_index() to turn sample_id back into a regular column, so the result looks exactly as you want.
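For reference, here is a minimal end-to-end sketch of the approach above. The variable name data follows the answer; the DataFrame itself is rebuilt by hand from the sample rows in the question, and the names means, result and one_step are only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
data = pd.DataFrame({
    "sample_id": ["sample_1", "sample_2", "sample_1", "sample_2", "sample_3"],
    "qual": [10, 20, 50, 10, 100],
    "percent": [20, 30, 60, 90, 20],
})

# Group rows that share a sample_id and average the other columns.
# After this step, sample_id is the index of the result.
means = data.groupby("sample_id").mean()

# Move sample_id from the index back into an ordinary column.
result = means.reset_index()

# Possible one-step variant: keep sample_id as a column from the start.
one_step = data.groupby("sample_id", as_index=False).mean()

print(result)
```

Note that the averaged columns come back as floats (30.0 rather than 30). If integer output matters, the columns would need an explicit cast afterwards, which is only safe when every mean is a whole number.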