Original URL: http://stackoverflow.com/questions/26202926/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Sorting a pandas DataFrame by the order of a list
Question by Wes Field
So I have a pandas DataFrame, df, with columns that represent taxonomic classification (i.e. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order by which I would like the DataFrame to be sorted.
The list looks something like this:
class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']
This list corresponds to the DataFrame column df['Class']. I would like to sort all the rows of the whole DataFrame by the order of the list, since df['Class'] is currently in a different order. What would be the best way to do this?
Answer by Alex Riley
You could make the Class column your index column:
df = df.set_index('Class')
and then use df.loc to select the rows in the order given by class_list:
df.loc[class_list]
Minimal example:
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6
>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Class
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
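As jarvis points out below, df.loc[class_list] fails when some labels in the list do not occur in the data (in pandas 1.0 and later it raises a KeyError). A minimal sketch, assuming the same Class/Number layout as the example above, keeps Alex's approach but first filters the list down to the labels actually present in the index, then calls reset_index() to turn Class back into an ordinary column. The data and the extra 'Clostridia' row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["Gammaproteobacteria", "Bacteroidetes", "Clostridia"],
    "Number": [3, 5, 7],
})
# Desired order; "Negativicutes" does not occur in df at all.
class_list = ["Bacteroidetes", "Negativicutes", "Gammaproteobacteria", "Clostridia"]

indexed = df.set_index("Class")
# Keep only labels that exist in the index so .loc does not raise KeyError.
present = [c for c in class_list if c in indexed.index]
# Select rows in list order, then move Class back out of the index.
ordered_df = indexed.loc[present].reset_index()
print(ordered_df)
```

Like the original answer, this silently drops any row whose Class is not mentioned in class_list.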
Answer by jarvis
Alex's solution doesn't work if your original DataFrame does not contain every element of the ordered list: if your input data at some point does not contain "Negativicutes", for example, the script will fail (in pandas 1.0 and later, df.loc raises a KeyError for the missing label). One way around this is to collect the rows for each class as a separate DataFrame in a list and concatenate them at the end. For example:
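import pandas as pd

# Assumes df still has 'Class' as a regular column (not the index).
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']
df_list = []
for i in ordered_classes:
    # All rows of this class; an absent class just yields an empty DataFrame.
    df_list.append(df[df['Class'] == i])
ordered_df = pd.concat(df_list)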
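A different technique, not taken from either answer, is to sort on an ordered pandas Categorical built from the list: the categorical compares values by their position in ordered_classes rather than alphabetically, so a single sort_values call handles labels missing from the data. The sketch below assumes a Class column as in the question; the temporary column name _order and the 'Unlisted' row are hypothetical. The key goes in a temporary column because converting Class itself to this categorical would replace any label not in the list with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "Class": ["Gammaproteobacteria", "Bacteroidetes", "Gammaproteobacteria", "Unlisted"],
    "Number": [3, 5, 1, 9],
})
ordered_classes = ["Bacteroidetes", "Negativicutes", "Gammaproteobacteria"]

# Ordered categorical: sorting follows the position in ordered_classes.
# Labels not in the list (here "Unlisted") become NaN in the key.
sort_key = pd.Categorical(df["Class"], categories=ordered_classes, ordered=True)

ordered_df = (
    df.assign(_order=sort_key)              # hypothetical temporary sort column
      .sort_values("_order", kind="stable")  # stable: rows of one class keep their original order
      .drop(columns="_order")
)
print(ordered_df)
```

Unlike the loop above, rows whose class is not in the list are kept and placed last (sort_values puts NaN at the end by default). To drop them instead, filter first with df[df["Class"].isin(ordered_classes)].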

