Getting the maximum of count() on a Pandas groupby object

Question

Asked by SummerEla

Problem


Using pandas, I need to get back the row with the max count for each groupby object.


Dataset


I have a dataframe called "matches" that looks like this:


   FeatureID    gene  pos
0      1_1_1  KRAS_1    6
1      1_1_1  KRAS_2    8
2      1_1_1  KRAS_3   11
3      1_1_1  NRAS_1    3
4      1_1_1  NRAS_2   11
5      1_1_1  NRAS_3   84
6     1_1_10  KRAS_1    4
7     1_1_10  KRAS_2    3
8     1_1_10  KRAS_3   14
9     1_1_10  NRAS_1    4
10    1_1_10  NRAS_2    6
11    1_1_10  NRAS_3   83

What I've tried


I need to group the dataframe by FeatureID and gene, and then get the count of positions in each group:


matches.groupby(["FeatureID", "gene"]).count()

Which results in:


FeatureID  gene
1_1_1      KRAS_1     6
           KRAS_2     8
           KRAS_3    11
           NRAS_1     3
           NRAS_2    11
           NRAS_3    84
1_1_10     KRAS_1     4
           KRAS_2     3
           KRAS_3    14
           NRAS_1     4
           NRAS_2     6

Desired output:


I need to get back the row in each groupby object that contains the highest count, but I cannot figure out how to do that.


FeatureID    gene  count
    1_1_1  NRAS_3     84
   1_1_10  KRAS_3     14

Solution


The following line gives me back the gene with the max value for each groupby group:


matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)
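For anyone who wants to try that one-liner, here is a minimal sketch of it on a small made-up dataframe. The data below is hypothetical (it is not the asker's real file) and assumes one row per match, so that count() produces different totals per gene.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration: one row per match.
matches = pd.DataFrame({
    "FeatureID": ["1_1_1", "1_1_1", "1_1_1", "1_1_10", "1_1_10", "1_1_10", "1_1_10"],
    "gene": ["KRAS_1", "NRAS_3", "NRAS_3", "KRAS_3", "KRAS_3", "KRAS_3", "NRAS_1"],
    "pos": [5, 12, 40, 3, 8, 15, 22],
})

# Count the non-null "pos" values for each (FeatureID, gene) pair.
counts = matches.groupby(["FeatureID", "gene"]).count()

# Sort ascending by the count, then keep the last (largest) row
# within each FeatureID, which is the first level of the index.
top = counts.sort_values("pos").groupby(level=0).tail(1)
print(top)
```

Because the frame is sorted in ascending order, tail(1) picks the row with the largest count in each FeatureID group. If two genes tie for the maximum, only one of them is returned.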

Answer 1

Answered by YOBEN_S

You can use max on level=0:


matches.groupby(["FeatureID", "gene"]).count().max(level=0)
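Note that the level argument of max was deprecated in pandas 1.3 and, as far as I know, removed in pandas 2.0, so max(level=0) may raise an error on a current install. A sketch of the equivalent spelling, which groups on the first index level and then takes the maximum, is below. The dataframe is hypothetical sample data, not the asker's.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration: one row per match.
matches = pd.DataFrame({
    "FeatureID": ["1_1_1", "1_1_1", "1_1_1", "1_1_10", "1_1_10", "1_1_10", "1_1_10"],
    "gene": ["KRAS_1", "NRAS_3", "NRAS_3", "KRAS_3", "KRAS_3", "KRAS_3", "NRAS_1"],
    "pos": [5, 12, 40, 3, 8, 15, 22],
})

counts = matches.groupby(["FeatureID", "gene"]).count()

# Same idea as counts.max(level=0): group on index level 0 (FeatureID)
# and take the largest count in each group.
per_feature_max = counts.groupby(level=0).max()
print(per_feature_max)
```

This gives one row per FeatureID with the highest count, but the gene level is dropped from the index.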

To keep both index levels:


matches.groupby(["FeatureID", "gene"]).count().sort_values("pos").groupby(level=0).tail(1)
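If a flat table shaped like the desired output in the question (FeatureID, gene, count) is preferred, one alternative is idxmax. This is a different technique from the sort-and-tail approach, sketched here on hypothetical sample data. It uses size(), which counts rows per group including nulls, where count() counts only non-null values.

```python
import pandas as pd

# Hypothetical raw data, invented for illustration: one row per match.
matches = pd.DataFrame({
    "FeatureID": ["1_1_1", "1_1_1", "1_1_1", "1_1_10", "1_1_10", "1_1_10", "1_1_10"],
    "gene": ["KRAS_1", "NRAS_3", "NRAS_3", "KRAS_3", "KRAS_3", "KRAS_3", "NRAS_1"],
    "pos": [5, 12, 40, 3, 8, 15, 22],
})

# Rows per (FeatureID, gene), flattened into ordinary columns.
counts = (
    matches.groupby(["FeatureID", "gene"])
    .size()
    .reset_index(name="count")
)

# For each FeatureID, find the row label holding the largest count,
# then select those rows from the flat table.
best = counts.loc[counts.groupby("FeatureID")["count"].idxmax()]
print(best)
```

Like tail(1), idxmax returns a single row per group, so ties are not all reported.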
