Sort DataFrame index that has a string and number
Original question: http://stackoverflow.com/questions/23493374/
Notice: this page is adapted from a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverFlow
Asked by Plug4
My df DataFrame index looks like this:
Com_Lag_01
Com_Lag_02
Com_Lag_03
Com_Lag_04
Com_Lag_05
Com_Lag_06
Com_Lag_07
Com_Lag_08
Com_Lag_09
Com_Lag_10
Com_Lag_101
Com_Lag_102
Com_Lag_103
...
Com_Lag_11
Com_Lag_111
Com_Lag_112
Com_Lag_113
Com_Lag_114
...
Com_Lag_12
Com_Lag_120
...
Com_Lag_13
Com_Lag_14
Com_Lag_15
I want to sort this index so that the numbers go from Com_Lag_1 to Com_Lag_120. If I use df.sort_index() I get the same order as above. Any suggestions on how to sort this index properly?
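For context, the order shown in the question is what plain string comparison produces: labels are compared character by character, so Com_Lag_101 sorts before Com_Lag_11. The following is a minimal sketch using a few illustrative labels (not the asker's full index) that contrasts the default order with an order keyed on the integer after the last underscore.

```python
# Illustrative labels only; not the asker's full index.
labels = ['Com_Lag_11', 'Com_Lag_09', 'Com_Lag_101', 'Com_Lag_10']

# Default sorting compares strings character by character.
lexicographic = sorted(labels)

# Sorting on the integer after the last underscore gives numeric order.
numeric = sorted(labels, key=lambda label: int(label.rsplit('_', 1)[-1]))

print(lexicographic)
print(numeric)
```

The answers below apply the same idea to a DataFrame index: extract the number, then sort on it.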
Answered by Guillaume Jacquenot
One could try something like this: sort on a numeric version of the index.
import pandas as pd
# Create an example DataFrame
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])
# Add a column containing the numeric part of each index label
df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
# Sort the rows by that number
# (DataFrame.sort was removed in later pandas versions; sort_values replaces it)
df.sort_values(['indexNumber'], ascending=[True], inplace=True)
# Drop the helper column
df.drop('indexNumber', axis=1, inplace=True)
Edit 2017 - V1:
To avoid SettingWithCopyWarning:
df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])
Edit 2017 - V2, for pandas version 0.21.0:
import pandas as pd
print(pd.__version__)
# Create a DataFrame example
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])
# reindex returns a new DataFrame, so assign the result
df = df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)
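As a hedged aside, newer pandas releases (1.1 and later) let sort_index take a key argument, which avoids both the helper column and the reindex step. The sketch below assumes such a version is installed; numeric_suffix is a hypothetical helper name introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Same example data as in the answer above.
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])


def numeric_suffix(index):
    # Hypothetical helper: receives the whole Index and returns an Index
    # of integers taken from the text after the last underscore.
    return index.str.rsplit('_', n=1).str[-1].astype(int)


# Assumes pandas >= 1.1, where sort_index accepts a key function.
sorted_df = df.sort_index(key=numeric_suffix)
print(sorted_df)
```

The key function is applied to the index as a whole rather than label by label, so it should return an Index of the same length.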
Answered by jezrael
Solution without a new column, using DataFrame.reindex with the index of a sorted Series:
a = df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values()
print(a)
Com_Lag_1      1
Com_Lag_3      3
Com_Lag_5      5
Com_Lag_12    12
Com_Lag_24    24
dtype: int32
df = df.reindex(index=a.index)
print(df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_5    31  1997
Com_Lag_12   25  2004
Com_Lag_24   34  2009
But if the index contains duplicate values, reindex cannot be used (it raises an error on duplicate labels), so add a new column instead:
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_12'])
print(df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_12   25  2004
Com_Lag_3    22  2001
Com_Lag_24   34  2009
Com_Lag_12   31  1997
df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
print(df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_12   25  2004
Com_Lag_12   31  1997
Com_Lag_24   34  2009
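Another possible way to handle duplicate labels without a temporary column is to compute the row order positionally and apply it with iloc. This is only a sketch, not part of the original answer; the use of NumPy's stable argsort here is an assumption about what suits the case, chosen so that rows with the same number keep their original relative order.

```python
import numpy as np
import pandas as pd

# Same example data as above, including the duplicated label Com_Lag_12.
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_12'])

# Integer found after the last underscore of each label.
numbers = df.index.str.rsplit('_', n=1).str[-1].astype(int)

# Positions that would sort the numbers; 'stable' keeps tied rows in order.
order = np.argsort(numbers.to_numpy(), kind='stable')

# Select rows by position, which works even when labels repeat.
sorted_df = df.iloc[order]
print(sorted_df)
```

Because the selection is positional, duplicate index labels do not matter, unlike with reindex.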

