Numerically sorting a column that contains numbers and strings (pandas/python)

Question

Asked by tafelplankje

I have to sort a data frame on columns 1 and 2. Column 1 contains both numbers and text, and the numbers should be sorted numerically first. This is the default sort order in Excel, but not in pandas, and I couldn't find much in the pandas manual on how to do it.

So this dataframe:

Z   762320  296 1
Z   861349  297 0
1   865545  20  20
1   865584  297 0
22  865625  297 0
2   865628  292 5
10  865662  297 0
1   865665  296 0
11  865694  293 1
1   865700  297 0
10  866429  297 0
11  866438  297 0

should be:

1   865545  20  20
1   865584  297 0
1   865665  296 0
1   865700  297 0
2   865628  292 5
10  865662  297 0
10  866429  297 0
11  865694  293 1
11  866438  297 0
22  865625  297 0
Z   762320  296 1
Z   861349  297 0

When I run df.sort([0,1]), I get:

     0       1    2   3
1    1  865545   20  20
2    1  865584  297   0
3    1  865665  296   0
4    1  865700  297   0
6   10  865662  297   0
7   10  866429  297   0
8   11  865694  293   1
9   11  866438  297   0
5    2  865628  292   5
10  22  865625  297   0
0    Z  762320  296   1
11   Z  861349  297   0

Answer 1

Answered by Paulo Scardine

Do you mean columns 0 and 1?

>>> df.sort([0, 1])
     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
5    2  865628  292   5
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
4   22  865625  297   0 
0    Z  762320  296   1
1    Z  861349  297   0

[update]

This happens when the data is not numeric, i.e. every element is a string:

>>> df.values
array([['Z', '762320', '296', '1'],
       ['Z', '861349', '297', '0'],
       ['1', '865545', '20', '20'],
       ['1', '865584', '297', '0'],
       ['22', '865625', '297', '0'],
       ['2', '865628', '292', '5'],
       ['10', '865662', '297', '0'],
       ['1', '865665', '296', '0'],
       ['11', '865694', '293', '1'],
       ['1', '865700', '297', '0'],
       ['10', '866429', '297', '0'],
       ['11', '866438', '297', '0']], dtype=object)

With strings, lexicographic ordering is the expected result:

>>> df.sort([0, 1])    
     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
5    2  865628  292   5
4   22  865625  297   0
0    Z  762320  296   1
1    Z  861349  297   0
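
To see why the strings come out in this order, here is a minimal plain-Python sketch using the labels from column 0. The helper `numeric_first` is a hypothetical name introduced only for illustration; it is not part of pandas or of the original answer.

```python
# Labels from column 0 of the example, all stored as strings.
values = ["Z", "1", "22", "2", "10", "11"]

# Plain string sorting compares character by character,
# so "10" and "11" come before "2".
as_strings = sorted(values)

def numeric_first(v):
    # Hypothetical helper: digit-only labels sort by their integer value,
    # everything else sorts afterwards, alphabetically.
    return (0, int(v), "") if v.isdigit() else (1, 0, v)

as_numbers_first = sorted(values, key=numeric_first)

print(as_strings)
print(as_numbers_first)
```

The tuple key keeps integers and text in separate groups, so the two types are never compared with each other.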

Try converting the values first:

>>> def convert(v):
...     try:
...         return int(v)
...     except ValueError:
...         return v

>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
      .sort([0, 1])

     0       1    2   3
2    1  865545   20  20
3    1  865584  297   0
7    1  865665  296   0
9    1  865700  297   0
5    2  865628  292   5
6   10  865662  297   0
10  10  866429  297   0
8   11  865694  293   1
11  11  866438  297   0
4   22  865625  297   0
0    Z  762320  296   1
1    Z  861349  297   0

What is the difference? The elements are numeric now:

>>> pandas.DataFrame([convert(c) for c in l] for l in df.values)\
      .sort([0, 1]).values

array([[1.0, 865545.0, 20.0, 20.0],
      [1.0, 865584.0, 297.0, 0.0],
      [1.0, 865665.0, 296.0, 0.0],
      [1.0, 865700.0, 297.0, 0.0],
      [2.0, 865628.0, 292.0, 5.0],
      [10.0, 865662.0, 297.0, 0.0],
      [10.0, 866429.0, 297.0, 0.0],
      [11.0, 865694.0, 293.0, 1.0],
      [11.0, 866438.0, 297.0, 0.0],
      [22.0, 865625.0, 297.0, 0.0],
      ['Z', 762320.0, 296.0, 1.0],
      ['Z', 861349.0, 297.0, 0.0]], dtype=object)
