Pandas: calculating the mean of duplicate entries in a dataframe

Question

Asked by DDRRpy

I have been working with a pandas dataframe in Python that contains duplicate entries in the first column. The dataframe looks something like this:

    sample_id    qual    percent
0   sample_1      10        20
1   sample_2      20        30
2   sample_1      50        60
3   sample_2      10        90
4   sample_3      100       20

I want to write something that identifies duplicate entries within the first column and calculates the mean values of the subsequent columns. An ideal output would be something similar to the following:

    sample_id    qual    percent
0   sample_1      30        40
1   sample_2      15        60
2   sample_3      100       20

I have been struggling with this problem all afternoon and would appreciate any help.

Answer 1

Answered by piRSquared

Use groupby on the sample_id column and take the mean:

df.groupby('sample_id').mean().reset_index()
or
df.groupby('sample_id', as_index=False).mean()

Either form gives you one row per sample_id, with qual and percent averaged across the duplicates.
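
To make this concrete, here is a minimal sketch that rebuilds the sample data from the question and runs both forms. The variable names via_reset and via_as_index are just illustrative and not part of the original answer.

```python
import pandas as pd

# The sample data from the question.
df = pd.DataFrame({
    "sample_id": ["sample_1", "sample_2", "sample_1", "sample_2", "sample_3"],
    "qual": [10, 20, 50, 10, 100],
    "percent": [20, 30, 60, 90, 20],
})

# Form 1: group, average, then move sample_id from the index back to a column.
via_reset = df.groupby("sample_id").mean().reset_index()

# Form 2: keep sample_id as a regular column from the start.
via_as_index = df.groupby("sample_id", as_index=False).mean()

print(via_as_index)
```

Note that the averaged columns come back as floats (30.0 rather than 30), since a mean is not guaranteed to be a whole number.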

Answer 2

Answered by kinjo

Groupby will work.

data.groupby('sample_id').mean()

You can then use reset_index() to make it look exactly as you want.
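
As a rough illustration of why reset_index() is needed here: after groupby(...).mean(), sample_id lives in the index rather than in a column. The sketch below assumes data holds the frame from the question; the names means and tidy are made up for the example.

```python
import pandas as pd

# Assumed to match the dataframe shown in the question.
data = pd.DataFrame({
    "sample_id": ["sample_1", "sample_2", "sample_1", "sample_2", "sample_3"],
    "qual": [10, 20, 50, 10, 100],
    "percent": [20, 30, 60, 90, 20],
})

means = data.groupby("sample_id").mean()

# After grouping, sample_id is the index, so it is not listed among the columns.
print(means.index.name)     # sample_id
print(list(means.columns))  # ['qual', 'percent']

# reset_index() turns it back into an ordinary column with a fresh 0..n-1 index.
tidy = means.reset_index()
print(list(tidy.columns))   # ['sample_id', 'qual', 'percent']
```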