pandas: Fastest way to sort each row in a pandas dataframe

Question

Asked by Luke

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C   D
3   4   8   1
9   2   7   2

Needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?

Answer 1

Answered by Andy Hayden

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2

I had thought this might work, but it reorders the columns by label instead of sorting the values within each row:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

Ah, pandas raises:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
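A note for readers on current pandas: `DataFrame.sort`, used in the session above, was deprecated in pandas 0.17 and removed in 0.20. Its closest replacements are `sort_index` and `sort_values`. The sketch below rebuilds the question's frame (an assumption about the `df` used above) to show that both of them move whole columns around, so neither sorts each row independently.

```python
import pandas as pd

# Assumed to match the example frame from the question.
df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Reorders whole columns by their labels (D, C, B, A); the values inside a
# row are not sorted.
by_label = df.sort_index(axis=1, ascending=False)

# Reorders whole columns by the values found in the row labelled 0; every
# other row simply follows that column order.
by_row_0 = df.sort_values(by=0, axis=1, ascending=False)

print(by_label)
print(by_row_0)
```

Because each row needs its own ordering, the NumPy approach at the top of this answer remains the more direct route.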

Answer 2

Answered by SpmP

To add to the answer given by @Andy-Hayden: you can do this in place on the whole frame. I'm not really sure why this works, but it does. There seems to be no control over the sort order.

    In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]:
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]:
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]:
       one  two  three  four  five
    0   22   46     49    63    72
    1   25   30     33    43    69
    2   21   24     39    56    93
    3    3   11     52    57    74

    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]:
       five  four  three  two  one
    0    72    63     49   46   22
    1    69    43     33   30   25
    2    93    56     39   24   21
    3    74    57     52   11    3

I hope someone can explain the why of this, just happy that it works 8)
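A likely explanation, offered as a sketch rather than a guarantee: when every column shares one dtype, `A.values` can be a view onto the array that backs the frame, and `ndarray.sort()` sorts in place along the last axis (`axis=-1`) by default, which for a 2-D array means within each row. The example below shows the same effect with plain NumPy. The names `arr` and `view` are illustrative, and the slice stands in for what `.values` may return.

```python
import numpy as np

arr = np.array([[22, 63, 72, 46, 49],
                [43, 30, 69, 33, 25]])

# A basic slice is a view: it shares memory with arr, much as .values can
# share memory with a single-dtype DataFrame.
view = arr[:]
shares = np.shares_memory(arr, view)

# In-place sort with the default axis=-1, so each row is sorted separately.
view.sort()

# arr has changed as well, because view and arr use the same buffer.
print(arr)
print("shares memory:", shares)
```

This depends on pandas internals. With mixed dtypes `.values` is a copy, so the frame would not change, and with Copy-on-Write enabled the returned array is expected to be read-only. Building a new frame from `np.sort(...)` avoids relying on either behaviour, and it keeps the column labels in their original order, which `A.iloc[:, ::-1]` does not.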

Answer 3

Answered by Pradeep Vairamani

You could use DataFrame.apply.

E.g.:

import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
print(A)

   one  two  three  four  five
0    2   75     44    53    46
1   18   51     73    80    66
2   35   91     86    44    25
3   60   97     57    33    79

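# raw=True hands each row to np.sort as an ndarray, so recent pandas
# versions return a DataFrame rather than a Series of arrays
A = A.apply(np.sort, axis=1, raw=True)
print(A)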

   one  two  three  four  five
0    2   44     46    53    75
1   18   51     66    73    80
2   25   35     44    86    91
3   33   57     60    79    97

Since you want it in descending order, you can simply multiply the dataframe by -1, sort it, and multiply by -1 again.

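A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
A = A * -1                              # negate so ascending order becomes descending
A = A.apply(np.sort, axis=1, raw=True)  # sort each row in ascending order
A = A * -1                              # restore the original signs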
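Since the question is about speed, note that `apply` with `axis=1` calls `np.sort` once per row, whereas a single `np.sort` call on the whole array does the same work inside NumPy. The sketch below is one way to compare the two on your own machine. The frame size and the helper names `with_apply` and `with_numpy` are illustrative assumptions, and no timings are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
A = pd.DataFrame(rng.integers(0, 100, size=(2000, 20)))


def with_apply(frame):
    # One np.sort call per row, driven from Python.
    return (frame * -1).apply(np.sort, axis=1, raw=True) * -1


def with_numpy(frame):
    # One np.sort call for the whole array; negating before and after the
    # sort yields descending order.
    values = -np.sort(-frame.to_numpy(), axis=1)
    return pd.DataFrame(values, index=frame.index, columns=frame.columns)


# Both routes should produce identical values.
same = np.array_equal(with_apply(A).to_numpy(), with_numpy(A).to_numpy())
print("same result:", same)
print("apply:", timeit.timeit(lambda: with_apply(A), number=3))
print("numpy:", timeit.timeit(lambda: with_numpy(A), number=3))
```

The negation trick assumes signed numeric data. For unsigned integers, reversing the sorted array with `[:, ::-1]` is the safer choice.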

Answer 4

Answered by Erfan

Instead of using the pd.DataFrame constructor, an easier way to assign the sorted values back is to select the columns with double brackets:

Original dataframe:

A   B   C   D
3   4   8   1
9   2   7   2

df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]

   A  B  C  D
0  8  4  3  1
1  9  7  2  2

This way you can also sort a part of the columns:

df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]

   A  B  C  D
0  3  8  4  1
1  9  7  2  2
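The assignments above modify `df` directly. If the original frame should stay untouched, the same idea can be wrapped in a small helper that works on a copy. This is a minimal sketch, and the function name `sort_columns_within_rows` is hypothetical.

```python
import numpy as np
import pandas as pd


def sort_columns_within_rows(df, cols):
    """Return a copy of df with the values in `cols` sorted descending per row."""
    out = df.copy()
    # np.sort returns a sorted copy (ascending); reverse each row for descending.
    out[cols] = np.sort(out[cols].to_numpy(), axis=1)[:, ::-1]
    return out


df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})
partial = sort_columns_within_rows(df, ["B", "C"])
print(partial)
```

Passing `list(df.columns)` as `cols` sorts every column, as in the first example of this answer.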
