Python pandas groupby: sorting within groups

Question

Asked by JoeDanger

I want to group my dataframe by two columns and then sort the aggregated results within the groups.

In [167]:
df

Out[167]:
count   job source
0   2   sales   A
1   4   sales   B
2   6   sales   C
3   3   sales   D
4   7   sales   E
5   5   market  A
6   3   market  B
7   2   market  C
8   4   market  D
9   1   market  E

In [168]:
df.groupby(['job','source']).agg({'count':sum})

Out[168]:
            count
job     source  
market  A   5
        B   3
        C   2
        D   4
        E   1
sales   A   2
        B   4
        C   6
        D   3
        E   7

I would now like to sort the count column in descending order within each group, and then take only the top three rows of each, to get something like:

            count
job     source  
market  A   5
        D   4
        B   3
sales   E   7
        C   6
        B   4

Answer 1

Accepted answer by joris

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job','source']).agg({'count':sum})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby(level=0, group_keys=False)

Then we want to sort each group in descending order and take the first three elements (Series.order, used in older pandas, has since been removed; sort_values is its replacement):

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, there is a shortcut method for exactly this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64
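
Put together, the steps above can be sketched as one self-contained script. This is only an illustration: the frame is rebuilt from the table in the question, the variable name top3 is my own, and the string 'sum' is used in place of the builtin sum.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Step 1: aggregate per (job, source).
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})

# Step 2: group the aggregated counts by the first index level (job)
# and keep the three largest values of each group.
top3 = df_agg["count"].groupby(level=0, group_keys=False).nlargest(3)

print(top3)
```

nlargest returns the values already sorted in descending order within each group, so no separate sort step is needed.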

Answer 2

Answered by tvashtar

You could also just do it in one go, by doing the sort first and using head to take the first 3 of each group.

In [34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)

Out[34]: 
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
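
A small variation on this, sketched under the assumption that you want the jobs listed alphabetically while the counts still descend: sort_values accepts one ascending flag per sort column. The frame is rebuilt from the question and the name top3 is my own.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Sort jobs ascending and counts descending, then keep the first
# three rows of each job. head() preserves the sorted row order.
top3 = (
    df.sort_values(["job", "count"], ascending=[True, False])
      .groupby("job")
      .head(3)
)

print(top3)
```

Because head works on the rows as they are, this only gives the right answer when each (job, source) pair appears once; with repeated pairs the counts would have to be summed first.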

Answer 3

Answered by Surya

Here's another example of taking the top 3 in sorted order, and of sorting within groups:

In [43]: import pandas as pd                                                                                                                                                       

In [44]:  df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})

In [45]: df                                                                                                                                                                        
Out[45]: 
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar


### Top 3 on sorted order:
In [46]: df.groupby(["name"])["count_1"].nlargest(3)                                                                                                                               
Out[46]: 
name   
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64


### Sorting within groups based on column "count_1":
In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]: 
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
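
If the only goal is to order rows within each group, the same row order as in Out[48] can probably be reached without groupby or apply, by sorting on the group key and the value column together. This is a sketch using the frame defined above; sorted_df is a name I have made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
})

# Sort by group key (ascending), then by count_1 (descending) inside
# each group, and renumber the rows.
sorted_df = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
      .reset_index(drop=True)
)

print(sorted_df)
```

Note that the column order in the printed frame follows the order of the dictionary keys in recent pandas versions, so it may differ from the output shown above.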

Answer 4

Answered by Ted Petrou

If you don't need to sum a column, then use @tvashtar's answer. If you do need to sum, then you can use @joris' answer or this one, which is very similar to it.

df.groupby(['job']).apply(lambda x: (x.groupby('source')
                                      .sum()
                                      .sort_values('count', ascending=False))
                                     .head(3))
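
To see why the sum matters, here is a sketch on made-up data in which some (job, source) pairs occur more than once. It is not the snippet above but an alternative that avoids apply: sum first, then sort and take the head of each job. The data and the names raw, summed and top3 are hypothetical.

```python
import pandas as pd

# Hypothetical data: (sales, A) and (market, B) each appear twice.
raw = pd.DataFrame({
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": ["A", "A", "B", "C", "D", "A", "B", "B", "C", "D"],
    "count": [2, 3, 4, 1, 6, 1, 2, 5, 3, 2],
})

# Sum the counts per (job, source), keeping the keys as columns.
summed = raw.groupby(["job", "source"], as_index=False)["count"].sum()

# Sort counts descending within each job and keep the top three.
top3 = (
    summed.sort_values(["job", "count"], ascending=[True, False])
          .groupby("job")
          .head(3)
)

print(top3)
```

Selecting the count column before calling sum keeps the non-numeric columns out of the aggregation.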

Answer 5

Answered by SSCSWAPNIL

Try this instead, a simple way to do a groupby and sort the result in descending order:

df.groupby(['companyName'])['overallRating'].sum().sort_values(ascending=False).head(20)
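
A runnable sketch of that one-liner on made-up data follows; the company names and ratings are invented for illustration, and only the column names come from the line above. Note that it ranks the groups against each other by their totals, which is a different operation from sorting rows inside each group.

```python
import pandas as pd

# Hypothetical ratings data; only the column names come from the answer.
ratings = pd.DataFrame({
    "companyName": ["Acme", "Globex", "Acme", "Initech", "Globex", "Acme"],
    "overallRating": [4, 5, 3, 2, 4, 5],
})

# Total rating per company, largest first, limited to the top two.
totals = (
    ratings.groupby("companyName")["overallRating"]
           .sum()
           .sort_values(ascending=False)
           .head(2)
)

print(totals)
```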
