Numerically sort a column containing numbers and strings (pandas/python)
Original question: http://stackoverflow.com/questions/33314175/
Source: StackOverflow. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by tafelplankje
I have to sort a data frame on columns 1 and 2. Column 1 contains both numbers and text, and the numbers should be sorted numerically first, with the text after them. In Excel this is the standard way to sort, but not in pandas. I couldn't find much information on how to do this in the pandas manual.
So this dataframe:
Z 762320 296 1
Z 861349 297 0
1 865545 20 20
1 865584 297 0
22 865625 297 0
2 865628 292 5
10 865662 297 0
1 865665 296 0
11 865694 293 1
1 865700 297 0
10 866429 297 0
11 866438 297 0
should be:
1 865545 20 20
1 865584 297 0
1 865665 296 0
1 865700 297 0
2 865628 292 5
10 865662 297 0
10 866429 297 0
11 865694 293 1
11 866438 297 0
22 865625 297 0
Z 762320 296 1
Z 861349 297 0
When I do df.sort([0,1]) I get:
0 1 2 3
1 1 865545 20 20
2 1 865584 297 0
3 1 865665 296 0
4 1 865700 297 0
6 10 865662 297 0
7 10 866429 297 0
8 11 865694 293 1
9 11 866438 297 0
5 2 865628 292 5
10 22 865625 297 0
0 Z 762320 296 1
11 Z 861349 297 0
Answered by Paulo Scardine
Do you mean column 0 and 1?
>>> df.sort([0, 1])
0 1 2 3
2 1 865545 20 20
3 1 865584 297 0
7 1 865665 296 0
9 1 865700 297 0
5 2 865628 292 5
6 10 865662 297 0
10 10 866429 297 0
8 11 865694 293 1
11 11 866438 297 0
4 22 865625 297 0
0 Z 762320 296 1
1 Z 861349 297 0
[update]
This happens if your data is not numeric (all elements are strings).
>>> df.values
array([['Z', '762320', '296', '1'],
['Z', '861349', '297', '0'],
['1', '865545', '20', '20'],
['1', '865584', '297', '0'],
['22', '865625', '297', '0'],
['2', '865628', '292', '5'],
['10', '865662', '297', '0'],
['1', '865665', '296', '0'],
['11', '865694', '293', '1'],
['1', '865700', '297', '0'],
['10', '866429', '297', '0'],
['11', '866438', '297', '0']], dtype=object)
String ordering is the expected result:
>>> df.sort([0, 1])
0 1 2 3
2 1 865545 20 20
3 1 865584 297 0
7 1 865665 296 0
9 1 865700 297 0
6 10 865662 297 0
10 10 866429 297 0
8 11 865694 293 1
11 11 866438 297 0
5 2 865628 292 5
4 22 865625 297 0
0 Z 762320 296 1
1 Z 861349 297 0
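To see why '10' lands before '2' here, it may help to reproduce the effect without pandas. The sketch below uses only the standard library; the helper name `numeric_first` is made up for this illustration and is not part of the original answer. It compares plain string sorting with a key that puts numbers first (by value) and text afterwards.

```python
values = ["Z", "1", "22", "2", "10", "11"]

# Strings are compared character by character, so "10" < "2"
# because "1" < "2" at the first position.
lexicographic = sorted(values)

def numeric_first(v):
    """Hypothetical sort key: numbers first (by value), then text."""
    try:
        return (0, int(v), "")
    except ValueError:
        return (1, 0, v)

numeric = sorted(values, key=numeric_first)

print(lexicographic)  # ['1', '10', '11', '2', '22', 'Z']
print(numeric)        # ['1', '2', '10', '11', '22', 'Z']
```

The tuple key never compares an int with a str directly, so it also works on Python 3, where such a comparison raises TypeError.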
Try to convert the values first:
>>> def convert(v):
...     try:
...         return int(v)
...     except ValueError:
...         return v
...
>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
...     .sort([0, 1])
0 1 2 3
2 1 865545 20 20
3 1 865584 297 0
7 1 865665 296 0
9 1 865700 297 0
5 2 865628 292 5
6 10 865662 297 0
10 10 866429 297 0
8 11 865694 293 1
11 11 866438 297 0
4 22 865625 297 0
0 Z 762320 296 1
1 Z 861349 297 0
What is the difference? The elements are numeric now:
>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
...     .sort([0, 1]).values
array([[1.0, 865545.0, 20.0, 20.0],
[1.0, 865584.0, 297.0, 0.0],
[1.0, 865665.0, 296.0, 0.0],
[1.0, 865700.0, 297.0, 0.0],
[2.0, 865628.0, 292.0, 5.0],
[10.0, 865662.0, 297.0, 0.0],
[10.0, 866429.0, 297.0, 0.0],
[11.0, 865694.0, 293.0, 1.0],
[11.0, 866438.0, 297.0, 0.0],
[22.0, 865625.0, 297.0, 0.0],
['Z', 762320.0, 296.0, 1.0],
['Z', 861349.0, 297.0, 0.0]], dtype=object)

