Custom sorting with Pandas
Original question: http://stackoverflow.com/questions/23279238/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on StackOverflow.
Asked by Blark
I have the following dataframe that I would like to sort first by Criticality and then by Name:
Name Criticality
baz High
foo Critical
baz Low
foo Medium
bar High
bar Low
bar Medium
...
I've been trying to do this using the answer provided in this post, but I just can't get it to work.
The end result should look like this:
Name Criticality
bar High
bar Medium
bar Low
baz High
baz Low
foo Critical
foo Medium
Answered by EdChum
One approach would be to use a custom dict to create a 'rank' column, sort by that column, and then drop it after sorting:
In [17]:
custom_dict = {'Critical':0, 'High':1, 'Medium':2, 'Low':3}
df['rank'] = df['Criticality'].map(custom_dict)
df
Out[17]:
Name Criticality rank
0 baz High 1
1 foo Critical 0
2 baz Low 3
3 foo Medium 2
4 bar High 1
5 bar Low 3
6 bar Medium 2
[7 rows x 3 columns]
In [19]:
# now sort by 'Name' first and then by 'rank'
# (df.sort was removed in pandas 0.20; sort_values is its replacement)
df.sort_values(by=['Name', 'rank'], inplace=True)
df
Out[19]:
Name Criticality rank
4 bar High 1
6 bar Medium 2
5 bar Low 3
0 baz High 1
2 baz Low 3
1 foo Critical 0
3 foo Medium 2
[7 rows x 3 columns]
In [21]:
# now drop the 'rank' column
df.drop(labels=['rank'],axis=1)
Out[21]:
Name Criticality
4 bar High
6 bar Medium
5 bar Low
0 baz High
2 baz Low
1 foo Critical
3 foo Medium
[7 rows x 2 columns]
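The same dict mapping can also be applied without a temporary column. The following is a minimal sketch, not part of the original answer, that assumes pandas 1.1 or later, where `sort_values` accepts a `key` callable. The callable is applied to each sort column separately, so it maps only the 'Criticality' column and passes 'Name' through unchanged.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "Name": ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
    "Criticality": ["High", "Critical", "Low", "Medium", "High", "Low", "Medium"],
})

custom_dict = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}

# key receives each sort column as a Series; translate only 'Criticality'
# into its numeric rank and leave 'Name' as-is.
result = df.sort_values(
    by=["Name", "Criticality"],
    key=lambda col: col.map(custom_dict) if col.name == "Criticality" else col,
)
print(result)
```

This leaves `df` itself untouched and never adds a 'rank' column, at the cost of requiring a newer pandas version than the original answer targeted.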
Answered by user5843090
It works for me using pd.Categorical:
In [114]: df = pd.DataFrame({
     ...:     'Name': ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
     ...:     'Criticality': ["hi", "crt", "lo", "med", "hi", "lo", "med"]
     ...: })
     ...: df['Criticality'] = pd.Categorical(df['Criticality'], ["crt", "hi", "med", "lo"])
     ...: df.sort_values(['Name', 'Criticality'])
Out[114]:
Name Criticality
4 bar hi
6 bar med
5 bar lo
0 baz hi
2 baz lo
1 foo crt
3 foo med
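For reference, the categorical approach can be sketched with the full labels from the question instead of the abbreviations. This is an illustrative variant under stated assumptions: the `order` list name is hypothetical, and `ordered=True` is an optional addition that also enables comparisons such as `<` on the column.

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    "Name": ["baz", "foo", "baz", "foo", "bar", "bar", "bar"],
    "Criticality": ["High", "Critical", "Low", "Medium", "High", "Low", "Medium"],
})

# Desired sort order, most severe first (hypothetical helper name)
order = ["Critical", "High", "Medium", "Low"]

# Sorting a categorical column follows the category order, not alphabetical order
df["Criticality"] = pd.Categorical(df["Criticality"], categories=order, ordered=True)

result = df.sort_values(["Name", "Criticality"])
print(result)
```

Note that any value not listed in `order` becomes NaN when the column is converted, so the category list should cover every label that appears in the data.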

