Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/25817930/

Time: 2020-09-13 22:28:08  Source: igfitidea

Fastest way to sort each row in a pandas dataframe

Tags: python, performance, pandas

Asked by Luke

I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.

So something like this:

A   B   C   D
3   4   8   1
9   2   7   2

Needs to become:

A   B   C   D
8   4   3   1
9   7   2   2

Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (which is why I'm using pandas and not numpy). Would it be quicker to create a list of lists and then build the new dataframe in one step? Or do I need to drop down to Cython?
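
The list-of-lists idea could look roughly like the sketch below: sort each row in plain Python, collect the results, and build the frame once at the end. The example frame is the one from the question and the variable names are illustrative; whether this beats the row-by-row version is not measured here.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Sort every row in descending order, collecting plain Python lists.
rows = [sorted(row, reverse=True) for row in df.itertuples(index=False)]

# Build the new frame in one step instead of appending row by row.
result = pd.DataFrame(rows, index=df.index, columns=df.columns)
print(result)
```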

Answered by Andy Hayden

I think I would do this in numpy:

In [11]: a = df.values

In [12]: a.sort(axis=1)  # no ascending argument

In [13]: a = a[:, ::-1]  # so reverse

In [14]: a
Out[14]:
array([[8, 4, 3, 1],
       [9, 7, 2, 2]])

In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
   A  B  C  D
0  8  4  3  1
1  9  7  2  2
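
The session above can be condensed into a standalone sketch. It assumes the question's example frame; `sort_rows_descending` is a hypothetical helper name, and `np.sort` is used instead of the in-place `ndarray.sort` so that the original frame is left untouched.

```python
import numpy as np
import pandas as pd

def sort_rows_descending(df):
    """Return a new frame with each row's values sorted high to low."""
    values = np.sort(df.to_numpy(), axis=1)  # np.sort returns a sorted copy
    # Reverse the column order of the sorted values to get descending rows.
    return pd.DataFrame(values[:, ::-1], index=df.index, columns=df.columns)

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})
result = sort_rows_descending(df)
print(result)
```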


I had thought this might work, but it sorts the column labels rather than the values within each row:

In [21]: df.sort(axis=1, ascending=False)
Out[21]:
   D  C  B  A
0  1  8  4  3
1  2  7  2  9

And passing the columns explicitly makes pandas raise:

In [22]: df.sort(df.columns, axis=1, ascending=False)

ValueError: When sorting by column, axis must be 0 (rows)
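
`DataFrame.sort` was removed in later pandas releases. Assuming a current pandas version, the label sort shown in `In [21]` would be written with `sort_index`, as in this small sketch; it still reorders the column labels only and does not sort the values within a row.

```python
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# sort_index(axis=1) reorders the column labels; row values are not sorted.
by_label = df.sort_index(axis=1, ascending=False)
print(by_label)
```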

Answered by SpmP

To add to the answer given by @Andy-Hayden: to do this in place on the whole frame, call sort on A.values directly. I'm not really sure why this works, but it does. There seems to be no control over the sort order.

    In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])

    In [98]: A
    Out[98]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [99]: A.values.sort
    Out[99]: <function ndarray.sort>

    In [100]: A
    Out[100]: 
       one  two  three  four  five
    0   22   63     72    46    49
    1   43   30     69    33    25
    2   93   24     21    56    39
    3    3   57     52    11    74

    In [101]: A.values.sort()

    In [102]: A
    Out[102]: 
       one  two  three  four  five
    0   22   46     49    63    72
    1   25   30     33    43    69
    2   21   24     39    56    93
    3    3   11     52    57    74

    In [103]: A = A.iloc[:,::-1]

    In [104]: A
    Out[104]: 
       five  four  three  two  one
    0    72    63     49   46   22
    1    69    43     33   30   25
    2    93    56     39   24   21
    3    74    57     52   11    3

I hope someone can explain the why of this, just happy that it works 8)
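
A likely explanation, offered as an assumption rather than a guarantee: when every column has the same dtype, `A.values` can hand back a view of the frame's underlying array, so `ndarray.sort()` (which sorts along the last axis, i.e. within each row, by default) rearranges the frame's own data. That behaviour depends on the pandas version and its copy settings, so the sketch below writes the sorted values back explicitly instead. It also keeps the column labels in place, whereas `A.iloc[:, ::-1]` reverses the labels along with the values. The data is copied from `Out[98]` above.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(
    [[22, 63, 72, 46, 49], [43, 30, 69, 33, 25],
     [93, 24, 21, 56, 39], [3, 57, 52, 11, 74]],
    columns=["one", "two", "three", "four", "five"],
)

# Sort a copy of the values within each row, then reverse for descending order.
sorted_desc = np.sort(A.to_numpy(), axis=1)[:, ::-1]

# Assign back explicitly rather than relying on A.values being a view.
A.iloc[:, :] = sorted_desc
print(A)
```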

Answered by Pradeep Vairamani

You could use DataFrame.apply.

For example:

import numpy as np
import pandas as pd

A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
print(A)

   one  two  three  four  five
0    2   75     44    53    46
1   18   51     73    80    66
2   35   91     86    44    25
3   60   97     57    33    79

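# result_type='broadcast' keeps the original index and columns;
# without it, recent pandas versions return a Series of arrays here
A = A.apply(np.sort, axis=1, result_type='broadcast')
print(A)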

   one  two  three  four  five
0    2   44     46    53    75
1   18   51     66    73    80
2   25   35     44    86    91
3   33   57     60    79    97

Since you want it in descending order, you can simply negate the dataframe (multiply it by -1), sort it, and negate it again.

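A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
A = A * -1                                              # negate
A = A.apply(np.sort, axis=1, result_type='broadcast')  # ascending sort of the negated values
A = A * -1                                              # negate back: rows are now descending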
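
The negate, sort, negate idea also works on the whole array at once, without a per-row `apply`. A minimal sketch, assuming signed numeric data (negation is not meaningful for strings and wraps around for unsigned integers); the seed and variable names are only for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
A = pd.DataFrame(rng.integers(0, 100, (4, 5)),
                 columns=["one", "two", "three", "four", "five"])

# Ascending sort of the negated values is a descending sort of the originals.
desc = -np.sort(-A.to_numpy(), axis=1)
result = pd.DataFrame(desc, index=A.index, columns=A.columns)
print(result)
```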

Answered by Erfan

Instead of using the pd.DataFrame constructor, an easier way to assign the sorted values back is to use double brackets:

Original dataframe:

A   B   C   D
3   4   8   1
9   2   7   2

df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]

   A  B  C  D
0  8  4  3  1
1  9  7  2  2

This way you can also sort a part of the columns:

df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]

   A  B  C  D
0  3  8  4  1
1  9  7  2  2
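
A self-contained version of the two assignments above, assuming the question's example frame. Each variant works on its own copy (`partial` and `full` are illustrative names) so that both start from the original values, and `to_numpy()` is added only to make the array conversion explicit.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [3, 9], "B": [4, 2], "C": [8, 7], "D": [1, 2]})

# Sort only columns B and C within each row, descending; A and D are untouched.
partial = df.copy()
partial[["B", "C"]] = np.sort(partial[["B", "C"]].to_numpy(), axis=1)[:, ::-1]

# Sort all four columns within each row, descending.
full = df.copy()
full[["A", "B", "C", "D"]] = np.sort(full.to_numpy(), axis=1)[:, ::-1]

print(partial)
print(full)
```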