Pandas: compute the mean of all entries that differ only in two columns

Question

Asked by Anaphory

I just picked up pandas, thinking that it would let me do data analysis nicely in Python. Now I have a pandas DataFrame of the following form:

import pandas as pd

# range replaces Python 2's xrange
df = pd.DataFrame({"p1": [1, 1, 2, 2, 3, 3] * 2,
                   "p2": [1] * 6 + [2] * 6,
                   "run": [1, 2] * 6,
                   "result": range(12)})

    p1  p2  result  run
0    1   1       0    1
1    1   1       1    2
2    2   1       2    1
3    2   1       3    2
4    3   1       4    1
5    3   1       5    2
6    1   2       6    1
7    1   2       7    2
8    2   2       8    1
9    2   2       9    2
10   3   2      10    1
11   3   2      11    2

I would like to generate a frame that contains one entry for every combination of the parameters p1 and p2, holding the average of all values of result for those parameters, that is,

   p1  p2  result
0   1   1     0.5
1   2   1     2.5
2   3   1     4.5
3   1   2     6.5
4   2   2     8.5
5   3   2    10.5

What is the pandas way to do this? I would try to copy the original table, drop the columns that differ (result and run), reindex that, combine both again with the new index as a multi-index, and then run the mean method over that outer multi-index level. Is that the way to do it, and if so, how do I do these index operations properly in code?

Answer 1

Answered by Matti John

You can use groupby (I have called your dataframe df):

df.groupby(['p1', 'p2']).mean()

This returns a DataFrame indexed by a MultiIndex built from p1 and p2, with the mean of every remaining column (here result and run). To get the layout in your question, select only the columns you want and reset the index:

df.groupby(['p1', 'p2'])['result'].mean().reset_index()
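
For reference, here is a self-contained sketch of the same idea that can be run end to end. It rebuilds the frame from the question (using range, on the assumption of Python 3) and passes as_index=False so that p1 and p2 stay ordinary columns. The name means and the final sort are illustrative additions, not part of the original answer. The sort is there only because groupby orders rows by p1 first, whereas the table in the question is ordered by p2 first.

```python
import pandas as pd

# Rebuild the frame from the question (range replaces Python 2's xrange).
df = pd.DataFrame({"p1": [1, 1, 2, 2, 3, 3] * 2,
                   "p2": [1] * 6 + [2] * 6,
                   "run": [1, 2] * 6,
                   "result": range(12)})

# as_index=False keeps p1 and p2 as regular columns, so no reset_index() is needed.
# Selecting "result" before mean() avoids averaging the "run" column as well.
means = df.groupby(["p1", "p2"], as_index=False)["result"].mean()

# groupby sorts by the keys in the order given (p1, then p2);
# reorder by p2, then p1, to match the row order shown in the question.
means = means.sort_values(["p2", "p1"]).reset_index(drop=True)

print(means)
```

If the row order does not matter, the sort_values step can simply be dropped.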
