Pandas DataFrame sort by categorical column but by specific class ordering

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39223256/

Date: 2020-09-14 01:55:05  Source: igfitidea

Tags: python-2.7, sorting, pandas, dataframe, categorical-data

Asked by elzurdo

I would like to select the top entries in a Pandas DataFrame based on the values of a specific column, using df_selected = df_targets.head(N).

Each entry has a target value, one of the following (in order of importance):

Likely Supporter, GOTV, Persuasion, Persuasion+GOTV  

Unfortunately, if I do

df_targets = df_targets.sort("target")

the ordering will be alphabetical (GOTV, Likely Supporter, ...).

I was hoping for a keyword like list_ordering, as in:

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"] 
df_targets = df_targets.sort("target", list_ordering=my_list)

To work around this, I created a dictionary that prefixes each value with its rank:

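from collections import OrderedDict

# Map each target to a rank-prefixed label so that alphabetical sorting follows the desired order
dict_targets = OrderedDict()
dict_targets["Likely Supporter"] = "0 Likely Supporter"
dict_targets["GOTV"] = "1 GOTV"
dict_targets["Persuasion"] = "2 Persuasion"
dict_targets["Persuasion+GOTV"] = "3 Persuasion+GOTV"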

But this seems like a non-Pythonic approach.

Suggestions would be much appreciated!

Answered by jezrael

I think you need Categorical with the parameter ordered=True; sorting with sort_values then works very nicely.

From the documentation of Categorical:

Ordered Categoricals can be sorted according to the custom order of the categories and can have a min and max value.

import pandas as pd

df = pd.DataFrame({'a': ['GOTV', 'Persuasion', 'Likely Supporter', 
                         'GOTV', 'Persuasion', 'Persuasion+GOTV']})

df.a = pd.Categorical(df.a, 
                      categories=["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"],
                      ordered=True)

print (df)
                  a
0              GOTV
1        Persuasion
2  Likely Supporter
3              GOTV
4        Persuasion
5   Persuasion+GOTV

print (df.a)
0                GOTV
1          Persuasion
2    Likely Supporter
3                GOTV
4          Persuasion
5     Persuasion+GOTV
Name: a, dtype: category
Categories (4, object): [Likely Supporter < GOTV < Persuasion < Persuasion+GOTV]

# Sort by the custom category order rather than alphabetically
df.sort_values('a', inplace=True)
print (df)
                  a
2  Likely Supporter
0              GOTV
3              GOTV
1        Persuasion
4        Persuasion
5   Persuasion+GOTV
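
Tying this back to the question, here is a minimal sketch of how the same idea might be combined with head(N). The name column and the sample rows are made up for illustration; only df_targets, target, my_list, and N come from the question. A stable sort is requested so that rows within the same class keep their original relative order.

```python
import pandas as pd

# Hypothetical data standing in for the asker's df_targets.
df_targets = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F"],
    "target": ["GOTV", "Persuasion", "Likely Supporter",
               "GOTV", "Persuasion", "Persuasion+GOTV"],
})

my_list = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]
df_targets["target"] = pd.Categorical(
    df_targets["target"], categories=my_list, ordered=True
)

N = 3
# mergesort is stable, so ties keep their original row order.
df_selected = df_targets.sort_values("target", kind="mergesort").head(N)
print(df_selected)

# Because the categorical is ordered, min and max are also defined.
print(df_targets["target"].min())
print(df_targets["target"].max())
```

With this sample data the selection should contain the single Likely Supporter row followed by the two GOTV rows.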

Answered by elzurdo

The method shown in my other answer, which passes categories and ordered directly to astype, is now deprecated.

Instead, it is best to use pandas.Categorical.

So:

list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  
df["target"] = pd.Categorical(df["target"], categories=list_ordering) 
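
As a rough illustration of what this conversion does, the sketch below uses a small made-up DataFrame. The "Unknown" value is an assumption added to show one side effect worth knowing about: any value missing from list_ordering is turned into NaN, and sort_values places those rows last by default.

```python
import pandas as pd

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Hypothetical data; "Unknown" is deliberately absent from list_ordering.
df = pd.DataFrame(
    {"target": ["Persuasion", "Unknown", "GOTV", "Likely Supporter"]}
)
df["target"] = pd.Categorical(df["target"], categories=list_ordering)

# Values outside the category list become NaN.
n_missing = df["target"].isna().sum()
print(n_missing)

# Sorting follows the order of list_ordering; NaN rows go last by default.
df_sorted = df.sort_values("target")
print(df_sorted)
```

If silently losing unexpected values is a concern, checking isna() right after the conversion is one simple safeguard.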

Answered by elzurdo

Thanks to jezrael's input and references, I like this more compact solution:

list_ordering = ["Likely Supporter","GOTV","Persuasion","Persuasion+GOTV"]  

df["target"] = df["target"].astype("category", categories=list_ordering, ordered=True)
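
Since passing categories and ordered straight to astype no longer works in recent pandas releases, a possible equivalent is to build a CategoricalDtype first and pass that to astype. This is a sketch rather than the original author's code; the sample DataFrame and the target_dtype name are made up for illustration.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

list_ordering = ["Likely Supporter", "GOTV", "Persuasion", "Persuasion+GOTV"]

# Describe the custom order once, as a reusable dtype (hypothetical name).
target_dtype = CategoricalDtype(categories=list_ordering, ordered=True)

# Hypothetical data for illustration.
df = pd.DataFrame(
    {"target": ["GOTV", "Persuasion", "Likely Supporter", "Persuasion+GOTV"]}
)
df["target"] = df["target"].astype(target_dtype)

# Sorting now follows list_ordering instead of alphabetical order.
df_sorted = df.sort_values("target")
print(df_sorted)
```

Keeping the dtype in its own variable makes it easy to apply the same ordering to several columns or DataFrames.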