pandas: sorting observations within groupby groups

Question

Asked by Dmitry B.

According to the answer to "pandas groupby sort within groups", in order to sort observations within each group one needs to do a second groupby on the results of the first groupby. Why is a second groupby needed? I would have assumed that the observations are already arranged into groups after running the first groupby, and that all that would be needed is a way to enumerate those groups (and run apply with order).

Answer 1

Answered by tvashtar

Because once you apply a function after a groupby, the results are combined back into a normal, ungrouped data frame. Using groupby and a groupby method like sort should be thought of as a split-apply-combine operation.

The groupby splits the original data frame and the method is applied to each group, but then the results are combined again implicitly.
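
A minimal sketch of that implicit combine step, using made-up sample data (the values are hypothetical; only the column names job, source and count come from the snippets in this thread). It shows that the groupby itself yields a grouped object, while the result of calling a method on it is an ordinary DataFrame that no longer remembers the grouping:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow this thread.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "A", "B"],
    "count": [2, 4, 1, 5, 3],
})

grouped = df.groupby("job")   # split: a DataFrameGroupBy object, nothing computed yet
first2 = grouped.head(2)      # apply + combine: first 2 rows of each group

print(type(grouped).__name__)  # DataFrameGroupBy
print(type(first2).__name__)   # DataFrame (plain, ungrouped)
```

Because first2 is a plain DataFrame again, any further per-group operation on it needs a new groupby call.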

In that other question, they could have reversed the order of operations (sorting first) and would then not have needed two groupbys. They could do:

df.sort_values(['job', 'count'], ascending=False).groupby('job').head(3)
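
As a runnable sketch of the sort-first approach, assuming a recent pandas where DataFrame.sort has been replaced by sort_values, and using hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data; only the column names follow this thread.
df = pd.DataFrame({
    "job": ["sales", "sales", "sales", "sales", "market", "market"],
    "source": ["A", "B", "C", "D", "A", "B"],
    "count": [2, 4, 1, 3, 5, 3],
})

# Sort the whole frame once, then take the first 3 rows of each job.
# head() keeps rows in the order of the sorted frame, so a single groupby suffices.
top3 = (
    df.sort_values(["job", "count"], ascending=False)
      .groupby("job")
      .head(3)
)
print(top3)
```

Note that ascending=False applies to both columns, so the jobs themselves also come out in descending order.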

Answer 2

Answered by Istopopoki

They need a second groupby in that case because, on top of sorting, they want to keep only the top 3 rows of each group.

If you just need to sort after a groupby, you can do:

df_res = df.groupby(['job', 'source']).agg({'count': 'sum'}).sort_values(['job', 'count'], ascending=False)

One groupby is enough.

And if you want to keep the 3 rows with the highest count for each group, you can group again and use the head() function:

df_res.groupby('job').head(3)
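
Put together, the aggregate, sort, then head steps might look like the sketch below. The sample data is hypothetical, and as_index=False is an assumption added here to keep job and source as ordinary columns; it is not part of the original answer, which leaves them in the index.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow this thread.
df = pd.DataFrame({
    "job": ["sales"] * 5 + ["market"] * 3,
    "source": ["A", "A", "B", "C", "D", "A", "B", "B"],
    "count": [1, 2, 4, 1, 2, 5, 1, 2],
})

# First groupby: total count per (job, source) pair.
# as_index=False (an assumption of this sketch) keeps the keys as columns.
df_res = (
    df.groupby(["job", "source"], as_index=False)["count"]
      .sum()
      .sort_values(["job", "count"], ascending=False)
)

# Second groupby: keep only the 3 highest totals within each job.
top3 = df_res.groupby("job").head(3)
print(top3)
```

The second groupby is only there to limit the number of rows per job; the sorting itself is already done after the first one.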