pandas groupby sort descending order
Original source: http://stackoverflow.com/questions/27018622/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by nbecker
pandas groupby sorts by default, but I'd like to change the sort order. How can I do this?
I'm guessing that I can't apply a sort method to the returned groupby object.
Answered by szeitlin
Do your groupby, and use reset_index() to turn the result back into a DataFrame. Then sort.
grouped = df.groupby('mygroups').sum().reset_index()
grouped.sort_values('mygroups', ascending=False)
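As a minimal, self-contained sketch of the approach above: the toy data here is hypothetical, and only the column name 'mygroups' comes from the answer. Note that sort_values returns a new DataFrame rather than sorting in place, so the result has to be kept.

```python
import pandas as pd

# Hypothetical toy data; only the 'mygroups' column name comes from the answer.
df = pd.DataFrame({
    "mygroups": ["dogs", "cats", "dogs", "cows", "cats"],
    "data": [1, 2, 3, 4, 5],
})

# Aggregate, then move the group labels from the index back into a column.
grouped = df.groupby("mygroups").sum().reset_index()

# sort_values returns a new DataFrame, so keep the result.
result = grouped.sort_values("mygroups", ascending=False)
print(result)
```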
Answered by JD Long
As of pandas 0.18, one way to do this is to use the sort_index method of the grouped data.
Here's an example:
import numpy as np
import pandas as pd

np.random.seed(1)
n = 10
df = pd.DataFrame({'mygroups': np.random.choice(['dogs', 'cats', 'cows', 'chickens'], size=n),
                   'data': np.random.randint(1000, size=n)})
grouped = df.groupby('mygroups', sort=False).sum()
# sort_index returns a new DataFrame, so assign the result before printing
grouped = grouped.sort_index(ascending=False)
print(grouped)
          data
mygroups
dogs      1831
chickens  1446
cats       933
As you can see, the groupby column is now sorted in descending order instead of the default ascending order.
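One detail worth checking when using this approach is that sort_index returns a new object and leaves the original untouched. The sketch below uses hypothetical toy data to show the difference between the order produced by sort=False (first appearance) and the order after sort_index(ascending=False).

```python
import pandas as pd

# Hypothetical toy data for illustration.
df = pd.DataFrame({
    "mygroups": ["dogs", "cats", "cows", "dogs"],
    "data": [10, 20, 30, 40],
})

# sort=False keeps groups in order of first appearance: dogs, cats, cows.
grouped = df.groupby("mygroups", sort=False).sum()

# sort_index does not modify 'grouped'; it returns a sorted copy.
descending = grouped.sort_index(ascending=False)

print(grouped)
print(descending)
```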
Answered by BigTom
Similar to one of the answers above: adding .sort_values() after your .groupby() aggregation lets you change the sort order. If you need to sort on a single column, it would look like this:
df.groupby('group')['id'].count().sort_values(ascending=False)
ascending=False sorts from high to low; the default is to sort from low to high.
*Careful with some of these aggregations. For example .size() and .count() return different values since .size() counts NaNs.
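To make the caveat about .size() and .count() concrete, here is a small sketch on hypothetical data containing NaN values. The column names 'group' and 'id' follow the one-liner above; the values are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each group has one missing 'id'.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "id": [1.0, np.nan, 2.0, 3.0, np.nan],
})

# size() counts every row in the group, including rows where 'id' is NaN.
sizes = df.groupby("group")["id"].size().sort_values(ascending=False)

# count() counts only the non-NaN values.
counts = df.groupby("group")["id"].count().sort_values(ascending=False)

print(sizes)
print(counts)
```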
Answered by Surya
Another example, showing how to preserve the original order or sort in descending order:
In [97]: import pandas as pd
In [98]: df = pd.DataFrame({'name':['A','B','C','A','B','C','A','B','C'],'Year':[2003,2002,2001,2003,2002,2001,2003,2002,2001]})
#### Default groupby operation:
In [99]: for each in df.groupby(["Year"]): print(each)
(2001, Year name
2 2001 C
5 2001 C
8 2001 C)
(2002, Year name
1 2002 B
4 2002 B
7 2002 B)
(2003, Year name
0 2003 A
3 2003 A
6 2003 A)
### order preserved:
In [100]: for each in df.groupby(["Year"], sort=False): print(each)
(2003, Year name
0 2003 A
3 2003 A
6 2003 A)
(2002, Year name
1 2002 B
4 2002 B
7 2002 B)
(2001, Year name
2 2001 C
5 2001 C
8 2001 C)
In [106]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"]))
Out[106]:
Year name
Year
2003 0 2003 A
3 2003 A
6 2003 A
2002 1 2002 B
4 2002 B
7 2002 B
2001 2 2001 C
5 2001 C
8 2001 C
In [107]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"])).reset_index(drop=True)
Out[107]:
Year name
0 2003 A
1 2003 A
2 2003 A
3 2002 B
4 2002 B
5 2002 B
6 2001 C
7 2001 C
8 2001 C
Answered by The Unfun Cat
You can call sort_values() on the DataFrame before you do the groupby. pandas preserves the row order within each group.
In [44]: d.head(10)
Out[44]:
name transcript exon
0 ENST00000456328 2 1
1 ENST00000450305 2 1
2 ENST00000450305 2 2
3 ENST00000450305 2 3
4 ENST00000456328 2 2
5 ENST00000450305 2 4
6 ENST00000450305 2 5
7 ENST00000456328 2 3
8 ENST00000450305 2 6
9 ENST00000488147 1 11
for _, a in d.head(10).sort_values(["transcript", "exon"]).groupby(["name", "transcript"]): print(a)
name transcript exon
1 ENST00000450305 2 1
2 ENST00000450305 2 2
3 ENST00000450305 2 3
5 ENST00000450305 2 4
6 ENST00000450305 2 5
8 ENST00000450305 2 6
name transcript exon
0 ENST00000456328 2 1
4 ENST00000456328 2 2
7 ENST00000456328 2 3
name transcript exon
9 ENST00000488147 1 11
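The same idea should carry over to descending order: sort the rows with ascending=False first, and each group then sees its rows in that order. The following is only a sketch on hypothetical data (the names 't1' and 't2' are made up), not the data shown above.

```python
import pandas as pd

# Hypothetical data: two transcripts with unordered exon numbers.
d = pd.DataFrame({
    "name": ["t1", "t2", "t1", "t2", "t1"],
    "exon": [1, 1, 3, 2, 2],
})

# Sort rows in descending exon order before grouping.
ordered = d.sort_values("exon", ascending=False)

# Within each group, rows keep the order they had in 'ordered'.
within = {name: list(g["exon"]) for name, g in ordered.groupby("name")}
print(within)
```

The group keys themselves are still iterated in ascending order here, because groupby sorts keys by default; only the rows inside each group follow the earlier sort.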
Answered by Jim Arnold
This kind of operation is covered under hierarchical indexing. Check out the examples here
When you groupby, you're making new indices. If you also pass a list through .agg(), you'll get multiple columns. I was trying to figure this out and found this thread via Google.
It turns out you can sort by passing a tuple that identifies the exact column you want to sort on.
Try this:
# generate toy data
ex = pd.DataFrame(np.random.randint(1,10,size=(100,3)), columns=['features', 'AUC', 'recall'])
# pass a tuple corresponding to which specific col you want sorted. In this case, 'mean' or 'AUC' alone are not unique.
ex.groupby('features').agg(['mean','std']).sort_values(('AUC', 'mean'))
This will output a df sorted by the AUC-mean column only.
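For the descending order asked about in the question, the same tuple can be combined with ascending=False. This is a self-contained sketch with the imports included; the seeded generator is an assumption added here so the toy data is reproducible.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption for reproducibility; the answer used np.random.randint).
rng = np.random.default_rng(0)
ex = pd.DataFrame(rng.integers(1, 10, size=(100, 3)),
                  columns=["features", "AUC", "recall"])

# Passing a list to agg produces two-level column labels such as ('AUC', 'mean').
summary = ex.groupby("features").agg(["mean", "std"])

# The tuple selects one specific column; ascending=False sorts high to low.
by_auc_desc = summary.sort_values(("AUC", "mean"), ascending=False)
print(by_auc_desc)
```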