pandas groupby sort descending order

Question

Asked by nbecker

By default, pandas groupby sorts the group keys in ascending order, but I'd like to change the sort order. How can I do this?

I'm guessing that I can't apply a sort method to the returned groupby object.

Answer 1

Answered by szeitlin

Do your groupby and aggregation, then use reset_index() to turn the result back into a flat DataFrame. Then sort.

grouped = df.groupby('mygroups').sum().reset_index()
# sort_values returns a new DataFrame, so keep the result
grouped = grouped.sort_values('mygroups', ascending=False)
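
For a self-contained illustration of the same idea, here is a minimal sketch. The toy data and the column names 'mygroups' and 'data' are assumptions made up for the example; only groupby, sum, reset_index and sort_values are actual pandas calls.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    'mygroups': ['cats', 'dogs', 'cows', 'dogs', 'cats'],
    'data': [1, 2, 3, 4, 5],
})

# Aggregate, move the group keys back into a regular column,
# then sort that column in descending order.
grouped = df.groupby('mygroups').sum().reset_index()
result = grouped.sort_values('mygroups', ascending=False)
print(result)
```

Because sort_values returns a new DataFrame rather than sorting in place, the result has to be assigned to a name before it is used.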

Answer 2

Answered by JD Long

As of pandas 0.18, one way to do this is to use the sort_index method of the grouped data.

Here's an example:

import numpy as np
import pandas as pd

np.random.seed(1)
n = 10
df = pd.DataFrame({'mygroups': np.random.choice(['dogs', 'cats', 'cows', 'chickens'], size=n),
                   'data': np.random.randint(1000, size=n)})

grouped = df.groupby('mygroups', sort=False).sum()
# sort_index returns a new object, so assign the result
grouped = grouped.sort_index(ascending=False)
print(grouped)

          data
mygroups      
dogs      1831
chickens  1446
cats       933

As you can see, the groupby column is now sorted in descending order instead of the default ascending order.

Answer 3

Answered by BigTom

Similar to one of the answers above: chaining .sort_values() after your .groupby() aggregation lets you change the sort order. If you need to sort on a single column, it would look like this:

df.groupby('group')['id'].count().sort_values(ascending=False)

ascending=False sorts from high to low; the default is to sort from low to high.

Note: be careful with some of these aggregations. For example, .size() and .count() can return different values, because .size() counts NaN entries and .count() does not.

See also: "What is the difference between size and count in pandas?"
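
To make the size/count caveat concrete, here is a small sketch. The DataFrame and the column names 'group' and 'id' are invented for illustration; the point is only that a NaN in the counted column changes the numbers being sorted.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values in the 'id' column.
df = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'id': [1.0, np.nan, 2.0, 3.0, np.nan],
})

# size() counts every row in the group, including rows where 'id' is NaN.
sizes = df.groupby('group')['id'].size().sort_values(ascending=False)

# count() counts only the non-NaN values of 'id'.
counts = df.groupby('group')['id'].count().sort_values(ascending=False)

print(sizes)
print(counts)
```

Both results are sorted from high to low, but they would rank groups differently whenever the missing values are unevenly distributed.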

Answer 4

Answered by Surya

Another example, showing how to preserve the original order or sort in descending order:

In [97]: import pandas as pd                                                                                                    

In [98]: df = pd.DataFrame({'name':['A','B','C','A','B','C','A','B','C'],'Year':[2003,2002,2001,2003,2002,2001,2003,2002,2001]})

#### Default groupby operation:
In [99]: for each in df.groupby(["Year"]): print(each)
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)

### order preserved:
In [100]: for each in df.groupby(["Year"], sort=False): print(each)
(2003,    Year name
0  2003    A
3  2003    A
6  2003    A)
(2002,    Year name
1  2002    B
4  2002    B
7  2002    B)
(2001,    Year name
2  2001    C
5  2001    C
8  2001    C)

In [106]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"]))                        
Out[106]: 
        Year name
Year             
2003 0  2003    A
     3  2003    A
     6  2003    A
2002 1  2002    B
     4  2002    B
     7  2002    B
2001 2  2001    C
     5  2001    C
     8  2001    C

In [107]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"])).reset_index(drop=True)
Out[107]: 
   Year name
0  2003    A
1  2003    A
2  2003    A
3  2002    B
4  2002    B
5  2002    B
6  2001    C
7  2001    C
8  2001    C

Answer 5

Answered by The Unfun Cat

You can call sort_values() on the DataFrame before you do the groupby. pandas preserves the order of the rows within each group.

In [44]: d.head(10)
Out[44]:
              name transcript  exon
0  ENST00000456328          2     1
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
4  ENST00000456328          2     2
5  ENST00000450305          2     4
6  ENST00000450305          2     5
7  ENST00000456328          2     3
8  ENST00000450305          2     6
9  ENST00000488147          1    11

for _, a in d.head(10).sort_values(["transcript", "exon"]).groupby(["name", "transcript"]): print(a)
              name transcript  exon
1  ENST00000450305          2     1
2  ENST00000450305          2     2
3  ENST00000450305          2     3
5  ENST00000450305          2     4
6  ENST00000450305          2     5
8  ENST00000450305          2     6
              name transcript  exon
0  ENST00000456328          2     1
4  ENST00000456328          2     2
7  ENST00000456328          2     3
              name transcript  exon
9  ENST00000488147          1    11
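
The same approach works for descending order. The following is a minimal sketch under assumed data: the frame and the column names 'name' and 'exon' are hypothetical, chosen to resemble the example above.

```python
import pandas as pd

# Hypothetical data; values are illustrative only.
df = pd.DataFrame({
    'name': ['x', 'y', 'x', 'y', 'x'],
    'exon': [2, 1, 3, 5, 4],
})

# Sort the rows in descending order first; groupby then keeps
# that row order inside each group.
ordered = df.sort_values('exon', ascending=False)
by_name = {name: group['exon'].tolist() for name, group in ordered.groupby('name')}
print(by_name)
```

Note that this controls the order of rows inside each group, not the order in which the groups themselves are returned; the latter is governed by the sort argument of groupby.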

Answer 6

Answered by Jim Arnold

This kind of operation is covered under hierarchical indexing (MultiIndex); check out the examples in the pandas documentation.

When you groupby, you're making new indices. If you also pass a list to .agg(), you'll get multiple columns per original column. I was trying to figure this out and found this thread via Google.

It turns out you can pass sort_values() a tuple that identifies the exact column you want to sort on.

Try this:

import numpy as np
import pandas as pd

# generate toy data
ex = pd.DataFrame(np.random.randint(1, 10, size=(100, 3)), columns=['features', 'AUC', 'recall'])

# pass a tuple naming the specific column you want sorted; 'mean' or 'AUC' alone is not unique here
ex.groupby('features').agg(['mean', 'std']).sort_values(('AUC', 'mean'))

This will output a DataFrame sorted by the ('AUC', 'mean') column only. Pass ascending=False to sort_values() to get descending order.
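
Since the question asks for descending order, here is a deterministic variant of the same technique. It is only a sketch: the numbers are made up so the result is easy to check by hand, and the names summary and ranked are illustrative.

```python
import pandas as pd

# Hypothetical toy data with fixed values instead of random ones.
ex = pd.DataFrame({
    'features': [1, 1, 2, 2, 3, 3],
    'AUC': [0.5, 0.7, 0.9, 0.8, 0.6, 0.65],
    'recall': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
})

# agg with a list produces two-level column labels such as ('AUC', 'mean').
summary = ex.groupby('features').agg(['mean', 'std'])

# Sort by the mean AUC of each group, highest first.
ranked = summary.sort_values(('AUC', 'mean'), ascending=False)
print(ranked)
```

The tuple selects one column from the two-level column index, which is why neither 'AUC' nor 'mean' alone would be enough.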
