Pandas:计算列上组的中位数
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/35261838/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Pandas: Calculate Median of Group over Columns
提问by Dance Party
Given the following data frame:
给定以下数据框:
import pandas as pd
df = pd.DataFrame({'COL1': ['A', 'A','A','A','B','B'],
'COL2' : ['AA','AA','BB','BB','BB','BB'],
'COL3' : [2,3,4,5,4,2],
'COL4' : [0,1,2,3,4,2]})
df
COL1 COL2 COL3 COL4
0 A AA 2 0
1 A AA 3 1
2 A BB 4 2
3 A BB 5 3
4 B BB 4 4
5 B BB 2 2
I would like, as efficiently as possible (i.e. via groupby and lambda x or better), to find the median of columns 3 and 4 for each distinct group of columns 1 and 2.
我想尽可能有效地(即通过 groupby 和 lambda x 或更好)为第 1 和第 2 列的每个不同组找到第 3 和第 4 列的中值。
The desired result is as follows:
期望的结果如下:
COL1 COL2 COL3 COL4 MEDIAN
0 A AA 2 0 1.5
1 A AA 3 1 1.5
2 A BB 4 2 3.5
3 A BB 5 3 3.5
4 B BB 4 4 3
5 B BB 2 2 3
Thanks in advance!
提前致谢!
回答by Happy001
You already had the idea -- groupby COL1 and COL2 and calculate median.
您已经有了这个想法 - groupby COL1 和 COL2 并计算中位数。
m = df.groupby(['COL1', 'COL2'])[['COL3','COL4']].apply(np.median)
m.name = 'MEDIAN'
print df.join(m, on=['COL1', 'COL2'])
COL1 COL2 COL3 COL4 MEDIAN
0 A AA 2 0 1.5
1 A AA 3 1 1.5
2 A BB 4 2 3.5
3 A BB 5 3 3.5
4 B BB 4 4 3.0
5 B BB 2 2 3.0