pandas groupby sort descending order

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27018622/

Time: 2020-09-08 15:37:30  Source: igfitidea

sorting pandas

Asked by nbecker

By default, pandas groupby sorts the group keys. I'd like to change the sort order. How can I do this?

I'm guessing that I can't apply a sort method to the returned groupby object.

Answered by szeitlin

Do your groupby and aggregation, then use reset_index() to turn the result back into a regular DataFrame. Then sort.

grouped = df.groupby('mygroups').sum().reset_index()
grouped.sort_values('mygroups', ascending=False)
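
As a self-contained sketch of this approach (the toy data and the column name `data` are assumptions for illustration, not part of the original answer):

```python
import pandas as pd

# Hypothetical toy data; only the column name 'mygroups' comes from the answer above.
df = pd.DataFrame({
    "mygroups": ["cats", "dogs", "cows", "dogs", "cats"],
    "data": [1, 2, 3, 4, 5],
})

# groupby sorts the group keys in ascending order by default;
# reset_index() moves the keys back into an ordinary column.
grouped = df.groupby("mygroups").sum().reset_index()

# sort_values returns a new DataFrame, so keep the result.
descending = grouped.sort_values("mygroups", ascending=False)
print(descending)
```

Because the keys are an ordinary column after reset_index(), the same call can sort on the aggregated column instead, for example sort_values("data", ascending=False).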

Answered by JD Long

As of Pandas 0.18, one way to do this is to use the sort_index method of the grouped data.

Here's an example:

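import numpy as np
import pandas as pd

np.random.seed(1)
n = 10
df = pd.DataFrame({'mygroups': np.random.choice(['dogs', 'cats', 'cows', 'chickens'], size=n),
                   'data': np.random.randint(1000, size=n)})

# sort=False keeps the groups in order of first appearance
grouped = df.groupby('mygroups', sort=False).sum()
# sort_index returns a new object, so assign the result
grouped = grouped.sort_index(ascending=False)
print(grouped)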

          data
mygroups      
dogs      1831
chickens  1446
cats       933

As you can see, the groupby column is now sorted in descending order, instead of the default, which is ascending.

Answered by BigTom

Similar to one of the answers above: adding .sort_values() after your .groupby() aggregation lets you change the sort order. If you need to sort on a single column, it would look like this:

df.groupby('group')['id'].count().sort_values(ascending=False)

ascending=False sorts from high to low; the default is to sort from low to high.

Be careful with some of these aggregations. For example, .size() and .count() can return different values, because .size() counts NaN values while .count() skips them.

See also: What is the difference between size and count in pandas?
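
To make the caveat concrete, here is a small sketch. The toy frame is an assumption for illustration; only the column names `group` and `id` come from the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in 'id'.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "id": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# size() counts every row in the group, including NaN entries.
sizes = df.groupby("group")["id"].size().sort_values(ascending=False)

# count() counts only the non-NaN entries.
counts = df.groupby("group")["id"].count().sort_values(ascending=False)

print(sizes)
print(counts)
```

Both results are sorted from high to low, but the numbers differ wherever a group contains NaN.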

Answered by Surya

Another example of preserving the original order, or sorting in descending order:

In [97]: import pandas as pd                                                                                                    

In [98]: df = pd.DataFrame({'name':['A','B','C','A','B','C','A','B','C'],'Year':[2003,2002,2001,2003,2002,2001,2003,2002,2001]})

#### Default groupby operation:
In [99]: for each in df.groupby(["Year"]): print(each)
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)

### order preserved:
In [100]: for each in df.groupby(["Year"], sort=False): print(each)
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)

In [106]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"]))                        
Out[106]: 
        Year name
Year             
2003 0  2003    A
     3  2003    A
     6  2003    A
2002 1  2002    B
     4  2002    B
     7  2002    B
2001 2  2001    C
     5  2001    C
     8  2001    C

In [107]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"])).reset_index(drop=True)
Out[107]: 
   Year name
0  2003    A
1  2003    A
2  2003    A
3  2002    B
4  2002    B
5  2002    B
6  2001    C
7  2001    C
8  2001    C

Answered by The Unfun Cat

You can call sort_values() on the DataFrame before you do the groupby. Pandas preserves the order of the rows within each group.

In [44]: d.head(10)
Out[44]:
              name transcript  exon
0  ENST00000456328          2     1
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
4  ENST00000456328          2     2
5  ENST00000450305          2     4
6  ENST00000450305          2     5
7  ENST00000456328          2     3
8  ENST00000450305          2     6
9  ENST00000488147          1    11

for _, a in d.head(10).sort_values(["transcript", "exon"]).groupby(["name", "transcript"]): print(a)
              name transcript  exon
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
5  ENST00000450305          2     4
6  ENST00000450305          2     5
8  ENST00000450305          2     6
              name transcript  exon
0  ENST00000456328          2     1
4  ENST00000456328          2     2
7  ENST00000456328          2     3
              name transcript  exon
9  ENST00000488147          1    11
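
If the goal is for the groups themselves to come out in descending order, one option is to combine the pre-sort with sort=False, since groupby then yields groups in order of first appearance. A minimal sketch, assuming a hypothetical toy frame with a `Year` column:

```python
import pandas as pd

# Hypothetical toy data, not taken from the answer above.
df = pd.DataFrame({
    "Year": [2001, 2003, 2002, 2003, 2001],
    "name": ["C", "A", "B", "A", "C"],
})

# Sort the rows in descending order first.
ordered = df.sort_values("Year", ascending=False)

# With sort=False, groupby does not re-sort the keys, so the groups
# follow the order in which the keys first appear in 'ordered'.
keys = [int(key) for key, _ in ordered.groupby("Year", sort=False)]
print(keys)
```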

Answered by Jim Arnold

This kind of operation is covered under hierarchical indexing. Check out the examples in the pandas documentation on that topic.

When you group by a column, you create a new index. If you also pass a list of functions to .agg(), you get multiple columns per original column, labelled by (column, function) pairs. I was trying to figure this out and found this thread via Google.

It turns out that you can sort such a result by passing sort_values() a tuple that identifies the exact column you want to sort on.

Try this:

import numpy as np
import pandas as pd

# generate toy data
ex = pd.DataFrame(np.random.randint(1, 10, size=(100, 3)), columns=['features', 'AUC', 'recall'])

# pass a tuple identifying the specific column to sort on; 'mean' or 'AUC' alone is not unique
ex.groupby('features').agg(['mean', 'std']).sort_values(('AUC', 'mean'))

This will output a DataFrame sorted by the AUC mean column only.
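
Since the question asks for descending order, the same tuple can be combined with ascending=False. A sketch with small deterministic data (the values are made up for illustration; the column names follow the example above):

```python
import pandas as pd

# Hypothetical toy data chosen so that the AUC means have no ties.
ex = pd.DataFrame({
    "features": [1, 1, 2, 2, 3, 3],
    "AUC": [0.5, 0.7, 0.9, 0.8, 0.4, 0.6],
    "recall": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# Aggregating with a list of functions produces two-level column labels
# such as ('AUC', 'mean') and ('AUC', 'std').
summary = ex.groupby("features").agg(["mean", "std"])

# Sort on the ('AUC', 'mean') column from high to low.
by_auc_desc = summary.sort_values(("AUC", "mean"), ascending=False)
print(by_auc_desc)
```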