pandas: sort a DataFrame by string length

Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/42516616/

Date: 2020-09-14 03:06:01  Source: igfitidea

Sort dataframe by string length

Tags: python, pandas, sorting, series, reindex

Asked by AlexG

I want to sort by name length. There doesn't appear to be a key parameter for sort_values, so I'm not sure how to accomplish this. Here is a test df:

import pandas as pd
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'], 'score': [2, 4, 2, 3]})

Answered by jezrael

You can take the Series of lengths created by str.len, sort it with sort_values, and pass its index to reindex:

print (df.name.str.len())
0    5
1    2
2    6
3    4
Name: name, dtype: int64

print (df.name.str.len().sort_values())
1    2
3    4
0    5
2    6
Name: name, dtype: int64

s = df.name.str.len().sort_values().index
print (s)
Int64Index([1, 3, 0, 2], dtype='int64')

print (df.reindex(s))
     name  score
1      Al      4
3    Greg      3
0   Steve      2
2  Markus      2


df1 = df.reindex(s)
df1 = df1.reset_index(drop=True)
print (df1)
     name  score
0      Al      4
1    Greg      3
2   Steve      2
3  Markus      2
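
The steps in this answer can be folded into a small helper. The following is only a sketch: the function name sort_by_str_len and the extra row 'Bo' are made up for illustration, and it assumes the DataFrame has unique index labels (reindex raises an error on duplicates). It also passes kind="mergesort", a stable sort, so that names of equal length such as 'Al' and 'Bo' keep their original relative order.

```python
import pandas as pd


def sort_by_str_len(frame, column, ascending=True):
    """Hypothetical helper: return rows ordered by the string length of `column`."""
    lengths = frame[column].str.len()
    # mergesort is stable, so ties keep their original relative order
    order = lengths.sort_values(ascending=ascending, kind="mergesort").index
    # reindex needs unique index labels; reset_index renumbers the rows from 0
    return frame.reindex(order).reset_index(drop=True)


# Same test data as the question, plus one made-up row ('Bo') to create a tie
df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg', 'Bo'],
                   'score': [2, 4, 2, 3, 1]})
result = sort_by_str_len(df, 'name')
print(result)
```

The original df is left unchanged; the helper returns a new DataFrame.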

Answered by moshfiqur

I find this solution more intuitive, especially if you want to do something that depends on the string length later on.

# Store each name's length in a helper column
df['length'] = df['name'].str.len()
# Sort the DataFrame in place, longest names first
df.sort_values('length', ascending=False, inplace=True)

Your dataframe now has a column named length that holds the string length of each value in the name column, and the whole dataframe is sorted by it in descending order.
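
If the helper column is only needed for sorting, one possible variation is to add it temporarily and drop it afterwards. This is a sketch rather than part of the original answer: it uses assign, sort_values and drop in a chain, avoids inplace=True, and leaves the original df untouched. The variable name df_sorted is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'],
                   'score': [2, 4, 2, 3]})

df_sorted = (
    df.assign(length=df['name'].str.len())    # temporary helper column
      .sort_values('length', ascending=False)  # longest names first
      .drop(columns='length')                  # remove the helper again
)
print(df_sorted)
```

Keeping the length column, as in the answer above, is the better choice when later steps need the lengths again.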

Answered by Thierry G.

@jezrael's answer is great and well explained. Here is the final result:

# Index labels ordered by ascending name length
index_sorted = df.name.str.len().sort_values(ascending=True).index
# Reorder the rows, then renumber them from 0
df_sorted = df.reindex(index_sorted)
df_sorted = df_sorted.reset_index(drop=True)
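
As a side note to the answers above: the question was asked when sort_values had no key parameter, but pandas 1.1.0 and later do accept one. Assuming such a version is installed, the whole operation can be sketched in a single call. The key function receives the column as a Series and must return a Series of the same length; ignore_index=True (available since pandas 1.0) renumbers the rows from 0.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Steve', 'Al', 'Markus', 'Greg'],
                   'score': [2, 4, 2, 3]})

# Requires pandas >= 1.1.0 for the `key` argument
df_sorted = df.sort_values(
    'name',
    key=lambda col: col.str.len(),  # sort by string length, not alphabetically
    ignore_index=True,              # same effect as reset_index(drop=True)
)
print(df_sorted)
```

On older pandas versions this call raises a TypeError for the unexpected key argument, in which case the reindex approach shown above still applies.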