Original question: http://stackoverflow.com/questions/51053911/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
Get max of count() function on pandas groupby objects
Question by SummerEla
Problem
Using pandas, I need to get back the row with the max count for each group of a groupby result.
Dataset
I have a dataframe called "matches" that looks like this:
FeatureID gene pos
0 1_1_1 KRAS_1 6
1 1_1_1 KRAS_2 8
2 1_1_1 KRAS_3 11
3 1_1_1 NRAS_1 3
4 1_1_1 NRAS_2 11
5 1_1_1 NRAS_3 84
6 1_1_10 KRAS_1 4
7 1_1_10 KRAS_2 3
8 1_1_10 KRAS_3 14
9 1_1_10 NRAS_1 4
10 1_1_10 NRAS_2 6
11 1_1_10 NRAS_3 83
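To experiment with the snippets in this thread, the sample frame can be rebuilt from the twelve rows shown above. This is a minimal sketch; only the column names and values come from the question. Note that with exactly these rows every (FeatureID, gene) pair occurs once, so counting per pair gives 1 everywhere; the larger counts shown later presumably come from the asker's full data rather than this excerpt.

```python
import pandas as pd

# Rebuild the 12-row sample shown in the question
matches = pd.DataFrame({
    "FeatureID": ["1_1_1"] * 6 + ["1_1_10"] * 6,
    "gene": ["KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2", "NRAS_3"] * 2,
    "pos": [6, 8, 11, 3, 11, 84, 4, 3, 14, 4, 6, 83],
})

# Each (FeatureID, gene) pair appears exactly once in this excerpt,
# so the per-pair count of non-null pos values is 1 for every pair
per_pair = matches.groupby(["FeatureID", "gene"]).count()
print(per_pair["pos"].unique())
```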
What I've tried
I need to group the dataframe by FeatureID and gene, and then get the count of positions in each group:
matches.groupby(["FeatureID", "gene"]).count()
Which results in:
FeatureID gene
1_1_1 KRAS_1 6
KRAS_2 8
KRAS_3 11
NRAS_1 3
NRAS_2 11
NRAS_3 84
1_1_10 KRAS_1 4
KRAS_2 3
KRAS_3 14
NRAS_1 4
NRAS_2 6
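For reference, `count()` on a groupby returns, for each (FeatureID, gene) group, the number of non-null values in every remaining column (here only `pos`), whereas `size()` counts rows including missing values. A small sketch with hypothetical long-format data; the `raw` frame and its values are made up for illustration and are not from the question:

```python
import pandas as pd

# Hypothetical long-format data: one row per observed position,
# so (FeatureID, gene) pairs repeat; one pos value is missing
raw = pd.DataFrame({
    "FeatureID": ["1_1_1", "1_1_1", "1_1_1", "1_1_1", "1_1_10", "1_1_10", "1_1_10"],
    "gene": ["KRAS_1", "NRAS_3", "NRAS_3", "NRAS_3", "KRAS_1", "KRAS_3", "KRAS_3"],
    "pos": [6, 84, 90, None, 4, 14, 20],
})

# count() skips NaN, so (1_1_1, NRAS_3) counts 2 non-null positions
counts = raw.groupby(["FeatureID", "gene"]).count()

# size() counts rows, so (1_1_1, NRAS_3) has 3
sizes = raw.groupby(["FeatureID", "gene"]).size()
print(counts)
print(sizes)
```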
Desired output:
I need to get back the row in each group that contains the highest count, but I cannot figure out how to do that.
FeatureID gene count
1_1_1 NRAS_3 84
1_1_10 KRAS_3 14
Solution
The following line gives me back the gene with the max count for each FeatureID:
matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)
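A runnable sketch of that line. To reproduce the question's numbers it starts from the per-pair counts printed above, typed in by hand as an assumed stand-in for `matches.groupby(["FeatureID", "gene"]).count()`. Sorting ascending by `pos` puts each group's largest count last, and `groupby(level=0).tail(1)` keeps that last row per FeatureID:

```python
import pandas as pd

# Per-(FeatureID, gene) counts as printed in the question (typed in by hand)
counts = pd.DataFrame({
    "FeatureID": ["1_1_1"] * 6 + ["1_1_10"] * 5,
    "gene": ["KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2", "NRAS_3",
             "KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2"],
    "pos": [6, 8, 11, 3, 11, 84, 4, 3, 14, 4, 6],
}).set_index(["FeatureID", "gene"])

# Ascending sort, then keep the last (largest) row of each FeatureID
top = counts.sort_values("pos").groupby(level=0).tail(1)

# Rows come out in count order; sort_index() restores FeatureID order
print(top.sort_index())
```

If two genes tie for the top count, which one survives depends on the sort; with `sort_values("pos", kind="stable")` it is the one that appears later in the original order.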
Answer by YOBEN_S
You can take the max on level=0:
# Older pandas: .max(level=0); the level argument was removed in pandas 2.0
matches.groupby(["FeatureID", "gene"]).count().groupby(level=0).max()
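Taking the max per FeatureID returns only the largest count, not which gene produced it, because the gene level is aggregated away. A sketch using the counts printed in the question, typed in by hand as an assumed stand-in for the `count()` result:

```python
import pandas as pd

# Per-(FeatureID, gene) counts as printed in the question (typed in by hand)
counts = pd.DataFrame({
    "FeatureID": ["1_1_1"] * 6 + ["1_1_10"] * 5,
    "gene": ["KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2", "NRAS_3",
             "KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2"],
    "pos": [6, 8, 11, 3, 11, 84, 4, 3, 14, 4, 6],
}).set_index(["FeatureID", "gene"])

# Highest count per FeatureID; the result is indexed by FeatureID only
best = counts.groupby(level=0).max()
print(best)
```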
If you want to keep both levels (so you also see which gene has the max):
matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)
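Another way to keep both levels, swapped in here rather than taken from the answer, is `idxmax`, which returns the full (FeatureID, gene) label of each group's largest count. Selecting those labels and then applying `rename` and `reset_index` gives the column layout of the desired output. As before, the counts are typed in from the question as an assumed stand-in for the `count()` result:

```python
import pandas as pd

# Per-(FeatureID, gene) counts as printed in the question (typed in by hand)
counts = pd.DataFrame({
    "FeatureID": ["1_1_1"] * 6 + ["1_1_10"] * 5,
    "gene": ["KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2", "NRAS_3",
             "KRAS_1", "KRAS_2", "KRAS_3", "NRAS_1", "NRAS_2"],
    "pos": [6, 8, 11, 3, 11, 84, 4, 3, 14, 4, 6],
}).set_index(["FeatureID", "gene"])

# Full (FeatureID, gene) label of the largest count within each FeatureID
best_labels = counts.groupby(level=0)["pos"].idxmax()

# Select those rows, rename pos -> count, and turn the index into columns
result = (
    counts.loc[best_labels.tolist()]
    .rename(columns={"pos": "count"})
    .reset_index()
)
print(result)
```

Unlike the sort-based version, `idxmax` keeps the first row when two genes tie for the top count, and the result stays in FeatureID order without an extra sort.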