Pandas: Calculate average for all entries that only differ in two columns
Original question: http://stackoverflow.com/questions/13606487/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me) on Stack Overflow.
Asked by Anaphory
I just picked up pandas, thinking that it would let me do data analysis nicely in Python. Now I have a pandas data frame of the following form:
import pandas

# range replaces the original xrange, which exists only in Python 2
pandas.DataFrame({"p1": [1, 1, 2, 2, 3, 3]*2,
                  "p2": [1]*6+[2]*6,
                  "run": [1, 2]*6,
                  "result": range(12)})
    p1  p2  result  run
0    1   1       0    1
1    1   1       1    2
2    2   1       2    1
3    2   1       3    2
4    3   1       4    1
5    3   1       5    2
6    1   2       6    1
7    1   2       7    2
8    2   2       8    1
9    2   2       9    2
10   3   2      10    1
11   3   2      11    2
I would like to generate a frame that contains one entry for every combination of the parameters p1 and p2, with the average of all values of result for these parameters, that is,
   p1  p2  result
0   1   1     0.5
1   2   1     2.5
2   3   1     4.5
3   1   2     6.5
4   2   2     8.5
5   3   2    10.5
What is the pandas way to do this? I would try to copy the original table, drop the columns that differ (result and run), reindex that, combine both again with the new index as a multi-index, and then run the mean method on that outer multi-index level. Is that the way to do it, and if so, how do I do these index operations properly in code?
Answered by Matti John
You can use groupby (I have called your dataframe df):
df.groupby(['p1', 'p2']).mean()
This results in a MultiIndex DataFrame. To get the layout in your question, select only the columns you want and reset the index:
df.groupby(['p1', 'p2']).mean()['result'].reset_index()
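For reference, here is a self-contained sketch of the same idea that should run on Python 3 with a recent pandas. It is a variation on the answer rather than the answer's exact code: it selects result before aggregating so that run is never averaged, passes as_index=False to keep p1 and p2 as ordinary columns, and adds a sort because groupby orders rows by p1 first while the layout in the question has p2 varying slowest. The name means is only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (range replaces Python 2's xrange).
df = pd.DataFrame({
    "p1": [1, 1, 2, 2, 3, 3] * 2,
    "p2": [1] * 6 + [2] * 6,
    "run": [1, 2] * 6,
    "result": range(12),
})

# Average "result" within each (p1, p2) group; as_index=False keeps
# p1 and p2 as regular columns instead of a MultiIndex.
means = df.groupby(["p1", "p2"], as_index=False)["result"].mean()

# groupby sorts by p1 then p2; re-sort so p2 varies slowest,
# matching the row order of the desired output in the question.
means = means.sort_values(["p2", "p1"]).reset_index(drop=True)

print(means)
```

If the row order does not matter, the sort_values step can be dropped.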

