Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/47914274/

Time: 2020-09-14 04:55:46  Source: igfitidea

Pandas sort_values does not sort numbers correctly

Tags: python, pandas, sorting, dataframe

Asked by Newkid

I'm new to pandas and to working with tabular data in a programming environment. I have sorted a dataframe by a specific column, but the result that pandas returns is not exactly correct.

Here is the code I have used:

league_dataframe.sort_values('overall_league_position')

In the result, the values in the column 'overall_league_position' are not sorted in ascending order, which is the default for the method.

What am I doing wrong? Thanks for your patience!

Answered by cs95

For whatever reason, you seem to be working with a column of strings, and sort_values is returning a lexicographically sorted result.

Here's an example.

import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df

  Col
0   1
1   2
2   3
3  10
4  20
5  19

df.sort_values('Col')

  Col
0   1
3  10
5  19
1   2
4  20
2   3
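
If you want to confirm that this is what is happening in your own data, one rough way is to check whether the column is numeric before sorting. The sketch below reuses the toy `Col` data from the example; the variable names `is_numeric` and `all_strings` are only illustrative, and your real column may of course hold something else.

```python
import pandas as pd

# Same toy data as in the example: numbers stored as strings.
df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# A column of strings is not a numeric dtype.
is_numeric = pd.api.types.is_numeric_dtype(df["Col"])

# Every value in the column is a Python str.
all_strings = df["Col"].map(lambda v: isinstance(v, str)).all()

print(is_numeric)    # False
print(all_strings)   # True

# Strings compare character by character, so '10' sorts before '2'.
print('10' < '2')    # True
```

If `is_numeric` comes back False for a column you expected to hold numbers, the sort is comparing text, not values.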

The remedy is to convert it to numeric, using either .astype or pd.to_numeric.

df.Col = df.Col.astype(float)

Or,

df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')

   Col
0    1
1    2
2    3
3   10
5   19
4   20

The main difference between astype and pd.to_numeric is that the latter is more robust at handling non-numeric strings (with errors='coerce' they are coerced to NaN), and it will attempt to preserve integers if a coercion to float is not necessary (as is seen in this case).
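
To make that difference concrete, here is a small sketch. The two Series, `clean` and `dirty`, and the bad value 'abc' are made up for illustration and are not from the question's data.

```python
import pandas as pd

clean = pd.Series(['1', '2', '10'])
dirty = pd.Series(['1', 'abc', '10'])   # 'abc' is a made-up bad value

# to_numeric keeps integers when no float is needed...
as_int = pd.to_numeric(clean)
# ...while astype(float) always produces floats.
as_float = clean.astype(float)

# With errors='coerce', unparseable strings become NaN (so the dtype is float).
coerced = pd.to_numeric(dirty, errors='coerce')

# astype raises on the same input instead of coercing.
try:
    dirty.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

print(as_int.dtype)             # int64
print(as_float.dtype)           # float64
print(coerced.isna().tolist())  # [False, True, False]
print(astype_failed)            # True
```

So if the column might contain stray text, pd.to_numeric with errors='coerce' is probably the safer choice, at the cost of silently turning bad values into NaN.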