Sort a Pandas DataFrame by the order of a list

Question

Asked by Wes Field

So I have a pandas DataFrame, df, with columns that represent taxonomic classification (e.g. Kingdom, Phylum, Class, etc.). I also have a list of taxonomic labels in the order I would like the DataFrame to be sorted by.

The list looks something like this:

class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']

This list corresponds to the DataFrame column df['Class']. I would like to sort all the rows of the whole DataFrame based on the order of the list, as df['Class'] is currently in a different order. What would be the best way to do this?

Answer 1

Answered by Alex Riley

You could make the Class column your index column:

df = df.set_index('Class')

and then use df.loc to reindex the DataFrame with class_list:

df.loc[class_list]

Minimal example:

>>> import pandas as pd
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
                 Class  Number
0  Gammaproteobacteria       3
1        Bacteroidetes       5
2        Negativicutes       6

>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
                     Number
Bacteroidetes             5
Negativicutes             6
Gammaproteobacteria       3
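
If you would rather keep Class as an ordinary column, one alternative (not part of the answer above, just a sketch) is to treat the column as an ordered categorical and sort by it. It reuses the toy data from the minimal example. Note that any value missing from class_list would become NaN in the result.

```python
import pandas as pd

df = pd.DataFrame({
    'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'],
    'Number': [3, 5, 6],
})
class_list = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Treat Class as an ordered categorical whose category order is class_list,
# then sort by it; the original integer index labels travel with the rows.
sorted_df = df.assign(
    Class=pd.Categorical(df['Class'], categories=class_list, ordered=True)
).sort_values('Class')

print(sorted_df)
```

The Class column of sorted_df has category dtype after this; convert it back with astype(str) if you need plain strings.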

Answer 2

Answered by jarvis

Alex's solution doesn't work if your original DataFrame does not contain all of the elements in the ordered list. For example, if your input data at some point in time does not contain "Negativicutes", the script will fail. One way around this is to filter the DataFrame once per class, append each piece to a list, and concatenate them at the end. For example:

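import pandas as pd

ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# Collect the rows for each class in the desired order; a class that is
# absent from df simply contributes an empty frame.
df_list = []

for i in ordered_classes:
    df_list.append(df[df['Class'] == i])

ordered_df = pd.concat(df_list)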
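
To make the missing-class case concrete, here is a small self-contained sketch of the same filter-and-concatenate idea. The sample data is made up for illustration: it has no "Negativicutes" rows and one repeated class. Rows whose class is not in ordered_classes at all would be dropped by this approach.

```python
import pandas as pd

# Hypothetical input with no 'Negativicutes' rows and a repeated class.
df = pd.DataFrame({
    'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Gammaproteobacteria'],
    'Number': [3, 5, 7],
})
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']

# One filtered frame per class, in list order; missing classes give empty frames.
ordered_df = pd.concat([df[df['Class'] == c] for c in ordered_classes])

print(ordered_df)
```

Within each class the rows keep their original relative order, and the original index labels are preserved.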
