Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/26202926/

Date: 2020-09-13 22:32:35  Source: igfitidea

Sorting a pandas DataFrame by the order of a list

Tags: sorting, python-2.7, pandas, dataframe

Asked by Wes Field

I have a pandas DataFrame, df, with columns that represent taxonomic classification (e.g. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order I would like the DataFrame to be sorted by.

The list looks something like this:

class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']

This list corresponds to the DataFrame column df['Class']. I would like to sort all the rows of the whole DataFrame based on the order of the list, as df['Class'] is currently in a different order. What would be the best way to do this?

Answered by Alex Riley

You could make the Class column your index column:

df = df.set_index('Class')

and then use df.loc to reorder the DataFrame according to class_list:

df.loc[class_list]

Minimal example:

>>> import pandas as pd
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6

>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Class
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
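
As a small extension of the example above (not part of the original answer), here is a minimal sketch that reorders by the list and then moves Class back into a regular column, which may be closer to what the question asks for. It assumes every label in class_list occurs in the Class column; the sample data and the name result are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the minimal example above
df = pd.DataFrame({
    'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'],
    'Number': [3, 5, 6],
})
class_list = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Reorder the rows through the index, then turn 'Class' back into a column.
# Assumes every label in class_list is present in df['Class'].
result = df.set_index('Class').loc[class_list].reset_index()
print(result)
```

The reset_index() call also gives the rows a fresh 0..n-1 index, so the original row labels are not preserved.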

Answered by jarvis

Alex's solution doesn't work if your original DataFrame does not contain all of the elements in the ordered list; i.e., if your input data at some point does not contain "Negativicutes", the script will fail. One way to get past this is to collect the matching subsets of df in a list and concatenate them at the end. For example:

ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

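# Collect the rows for each class in the desired order.
# Assumes 'Class' is a regular column here, not the index.
df_list = []

for i in ordered_classes:
    df_list.append(df[df['Class'] == i])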

ordered_df = pd.concat(df_list)
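
To make the approach above concrete, here is a self-contained sketch run on data where "Negativicutes" is missing and one class appears twice. For comparison it also sorts on an ordered categorical key; that alternative is not from either answer, and the sample data and the helper column name _order are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: 'Negativicutes' is absent, and one class appears twice
df = pd.DataFrame({
    'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Gammaproteobacteria'],
    'Number': [3, 5, 7],
})
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Approach from the answer: take each class's rows in order, then concatenate.
# A class with no rows just contributes an empty slice.
ordered_df = pd.concat([df[df['Class'] == c] for c in ordered_classes])

# Alternative (not from the answers): sort on an ordered categorical key.
# Unlike the loop, rows whose class is not in ordered_classes are kept
# (their key is NaN, so they sort last) rather than dropped.
key = pd.Categorical(df['Class'], categories=ordered_classes, ordered=True)
sorted_df = (
    df.assign(_order=key)                    # temporary helper column
      .sort_values('_order', kind='mergesort')  # stable: ties keep input order
      .drop(columns='_order')
)

print(ordered_df)
print(sorted_df)
```

On this sample both results list the Bacteroidetes row first, followed by the two Gammaproteobacteria rows in their original relative order.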