Custom sorting with Pandas

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/23279238/

Date: 2020-09-13 21:57:46  Source: igfitidea

Tags: python, sorting, pandas

Question by Blark

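I have the following dataframe that I would like to sort first by Name and then by Criticality, where Criticality follows a custom order (Critical, High, Medium, Low) rather than alphabetical order: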

Name        Criticality
baz         High
foo         Critical
baz         Low
foo         Medium
bar         High
bar         Low
bar         Medium
...

I've been trying to do this using the answer provided in this post, but I just can't get it to work.

The end result should look like this:

Name        Criticality
bar         High
bar         Medium
bar         Low
baz         High
baz         Low
foo         Critical
foo         Medium

Answer by EdChum

One approach is to use a custom dict to create a 'rank' column, sort by that column, and then drop it after sorting:

In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}  
df['rank'] = df['Criticality'].map(custom_dict)
df

Out[17]:

  Name Criticality  rank
0  baz        High     1
1  foo    Critical     0
2  baz         Low     3
3  foo      Medium     2
4  bar        High     1
5  bar         Low     3
6  bar      Medium     2

[7 rows x 3 columns]

In [19]:
# sort by 'Name' first, then by 'rank' within each name
# (DataFrame.sort was removed from pandas; sort_values replaces it)
df.sort_values(by=['Name', 'rank'], inplace=True)
df

Out[19]:

  Name Criticality  rank
4  bar        High     1
6  bar      Medium     2
5  bar         Low     3
0  baz        High     1
2  baz         Low     3
1  foo    Critical     0
3  foo      Medium     2

[7 rows x 3 columns]

In [21]:
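# now drop the helper 'rank' column; drop() returns a new DataFrame,
# so assign the result back (df = df.drop(...)) to keep the change
df.drop(labels=['rank'], axis=1)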

Out[21]:

  Name Criticality
4  bar        High
6  bar      Medium
5  bar         Low
0  baz        High
2  baz         Low
1  foo    Critical
3  foo      Medium

[7 rows x 2 columns]
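Because `DataFrame.sort` no longer exists in current pandas releases, here is a minimal self-contained sketch of the same rank-column idea using `sort_values`. The helper name `sort_by_criticality` is hypothetical, the sample rows are the ones shown in the question (without the elided `...` rows), and the mapping dict is the one from the answer above.

```python
import pandas as pd

# Custom order for the Criticality column: a lower rank sorts first.
CUSTOM_ORDER = {'Critical': 0, 'High': 1, 'Medium': 2, 'Low': 3}


def sort_by_criticality(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sort by Name, then by Criticality in CUSTOM_ORDER."""
    return (
        df.assign(rank=df['Criticality'].map(CUSTOM_ORDER))  # temporary helper column
          .sort_values(['Name', 'rank'])                      # Name first, then rank
          .drop(columns='rank')                               # remove the helper column
    )


df = pd.DataFrame({
    'Name': ['baz', 'foo', 'baz', 'foo', 'bar', 'bar', 'bar'],
    'Criticality': ['High', 'Critical', 'Low', 'Medium', 'High', 'Low', 'Medium'],
})
result = sort_by_criticality(df)
print(result)
```

Chaining `assign`, `sort_values` and `drop` leaves the caller's `df` untouched, so no `inplace=True` is needed. Any Criticality value missing from `CUSTOM_ORDER` maps to `NaN` and, with the default `na_position='last'`, ends up after the known levels within each name.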

Answer by user5843090

It works for me using pd.Categorical:

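In [114]: import pandas as pd
     ...: df = pd.DataFrame({
     ...:     'Name': ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
     ...:     'Criticality': ["hi", "crt", "lo", "med", "hi", "lo", "med"]
     ...: })
     ...: # a categorical sorts by the order of its categories, not alphabetically
     ...: df['Criticality'] = pd.Categorical(df['Criticality'],
     ...:                                    categories=["crt", "hi", "med", "lo"],
     ...:                                    ordered=True)
     ...: df.sort_values(['Name', 'Criticality'])
Out[114]: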
  Name Criticality
4  bar          hi
6  bar         med
5  bar          lo
0  baz          hi
2  baz          lo
1  foo         crt
3  foo         med
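If you are on pandas 1.1.0 or later, `sort_values` also accepts a `key` callable, which avoids both the helper column and changing the column's dtype. This is a sketch under that version assumption; `criticality_key`, `ORDER` and `RANK` are illustrative names, and the key checks `col.name` so that only the Criticality column is remapped.

```python
import pandas as pd

# Custom order for the Criticality codes used in this answer.
ORDER = ["crt", "hi", "med", "lo"]
RANK = {level: i for i, level in enumerate(ORDER)}


def criticality_key(col: pd.Series) -> pd.Series:
    """Map Criticality to its rank; leave every other sort column unchanged."""
    if col.name == 'Criticality':
        return col.map(RANK)
    return col


df = pd.DataFrame({
    'Name': ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
    'Criticality': ["hi", "crt", "lo", "med", "hi", "lo", "med"],
})
# key= is applied to each column listed in `by` before sorting (pandas >= 1.1.0).
result = df.sort_values(['Name', 'Criticality'], key=criticality_key)
print(result)
```

Because the key runs once per column in `by`, the `col.name` check is what keeps `Name` in ordinary alphabetical order while Criticality is sorted by rank.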