pandas 对熊猫数据框中的每一行进行排序的最快方法
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/25817930/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Fastest way to sort each row in a pandas dataframe
提问by Luke
I need to find the quickest way to sort each row in a dataframe with millions of rows and around a hundred columns.
我需要找到对包含数百万行和大约一百列的数据框中的每一行进行排序的最快方法。
So something like this:
所以像这样:
A B C D
3 4 8 1
9 2 7 2
Needs to become:
需要变成:
A B C D
8 4 3 1
9 7 2 2
Right now I'm applying sort to each row and building up a new dataframe row by row. I'm also doing a couple of extra, less important things to each row (hence why I'm using pandas and not numpy). Could it be quicker to instead create a list of lists and then build the new dataframe at once? Or do I need to go cython?
现在我正在对每一行应用排序并逐行构建一个新的数据框。我还对每一行做一些额外的、不太重要的事情(因此我使用 Pandas 而不是 numpy)。是否可以更快地创建一个列表列表,然后立即构建新的数据框?还是我需要去cython?
回答by Andy Hayden
I think I would do this in numpy:
我想我会在 numpy 中做到这一点:
In [11]: a = df.values
In [12]: a.sort(axis=1) # no ascending argument
In [13]: a = a[:, ::-1] # so reverse
In [14]: a
Out[14]:
array([[8, 4, 3, 1],
[9, 7, 2, 2]])
In [15]: pd.DataFrame(a, df.index, df.columns)
Out[15]:
A B C D
0 8 4 3 1
1 9 7 2 2
I had thought this might work, but it sorts the columns:
我原以为这可能有效,但它对列进行了排序:
In [21]: df.sort(axis=1, ascending=False)
Out[21]:
D C B A
0 1 8 4 3
1 2 7 2 9
Ah, pandas raises:
啊,Pandas提出:
In [22]: df.sort(df.columns, axis=1, ascending=False)
ValueError: When sorting by column, axis must be 0 (rows)
ValueError:按列排序时,轴必须为 0(行)
回答by SpmP
To Add to the answer given by @Andy-Hayden, to do this inplace to the whole frame... not really sure why this works, but it does. There seems to be no control on the order.
添加到@Andy-Hayden 给出的答案中,对整个框架执行此操作......不太确定为什么会这样,但确实如此。似乎没有控制顺序。
In [97]: A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
In [98]: A
Out[98]:
one two three four five
0 22 63 72 46 49
1 43 30 69 33 25
2 93 24 21 56 39
3 3 57 52 11 74
In [99]: A.values.sort
Out[99]: <function ndarray.sort>
In [100]: A
Out[100]:
one two three four five
0 22 63 72 46 49
1 43 30 69 33 25
2 93 24 21 56 39
3 3 57 52 11 74
In [101]: A.values.sort()
In [102]: A
Out[102]:
one two three four five
0 22 46 49 63 72
1 25 30 33 43 69
2 21 24 39 56 93
3 3 11 52 57 74
In [103]: A = A.iloc[:,::-1]
In [104]: A
Out[104]:
five four three two one
0 72 63 49 46 22
1 69 43 33 30 25
2 93 56 39 24 21
3 74 57 52 11 3
I hope someone can explain the why of this, just happy that it works 8)
我希望有人能解释为什么会这样,很高兴它有效 8)
回答by Pradeep Vairamani
You could use pd.apply.
你可以使用 pd.apply。
Eg:
A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
print (A)
one two three four five
0 2 75 44 53 46
1 18 51 73 80 66
2 35 91 86 44 25
3 60 97 57 33 79
A = A.apply(np.sort, axis = 1)
print(A)
one two three four five
0 2 44 46 53 75
1 18 51 66 73 80
2 25 35 44 86 91
3 33 57 60 79 97
Since you want it in descending order, you can simply multiply the dataframe with -1 and sort it.
由于您希望按降序排列,您可以简单地将数据帧与 -1 相乘并对其进行排序。
A = pd.DataFrame(np.random.randint(0,100,(4,5)), columns=['one','two','three','four','five'])
A = A * -1
A = A.apply(np.sort, axis = 1)
A = A * -1
回答by Erfan
Instead of using pd.DataFrameconstructor, an easier way to assign the sorted values back is to use double brackets:
代替使用pd.DataFrame构造函数,将排序后的值分配回的更简单方法是使用双括号:
original dataframe:
原始数据框:
A B C D
3 4 8 1
9 2 7 2
df[['A', 'B', 'C', 'D']] = np.sort(df)[:, ::-1]
A B C D
0 8 4 3 1
1 9 7 2 2
This way you can also sort a part of the columns:
通过这种方式,您还可以对部分列进行排序:
df[['B', 'C']] = np.sort(df[['B', 'C']])[:, ::-1]
A B C D
0 3 8 4 1
1 9 7 2 2

