pandas sort lambda function
Source: http://stackoverflow.com/questions/39525928/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Tom B
Given a dataframe a with 3 columns, A, B, C, and 3 rows of numerical values, how does one sort all the rows with a comparison operator using only the product A[i]*B[i]? It seems that the pandas sort only takes columns and then a sort method.
I would like to use a comparison function like the one below.
f = lambda i,j: a['A'][i]*a['B'][i] < a['A'][j]*a['B'][j]
Answered by Ami Tavory
There are at least two ways:
Method 1
Say you start with
In [175]: df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})
You can add a column which is your sort key
In [176]: df['sort_val'] = df.A * df.B
Finally sort by it and drop it
In [190]: df.sort_values('sort_val').drop('sort_val', axis=1)
Out[190]:
   A  B  C
1  2 -1  1
0  1  1  1
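As a self-contained sketch of Method 1, the same steps can be chained so the helper column never lands on the original frame. The use of assign and the names sort_val and result are choices made here for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})

# assign() returns a copy with the temporary key column, so df itself
# is left untouched; we sort on the key and then drop it again.
result = (
    df.assign(sort_val=df['A'] * df['B'])
      .sort_values('sort_val')
      .drop(columns='sort_val')
)

print(result)
```

The row with the smaller product (2 * -1 = -2) comes first, matching the output shown above.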
Method 2
Use numpy.argsort and then use .iloc on the resulting positions (the .ix indexer has been removed from pandas):
In [197]: import numpy as np
In [198]: df.iloc[np.argsort(df.A * df.B).values]
Out[198]:
   A  B  C
1  2 -1  1
0  1  1  1
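A minimal sketch of Method 2 on a frame with a non-numeric index, to show that argsort yields positions rather than labels and therefore pairs naturally with iloc. The sample data, the string index and the variable names are assumptions made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'A': [1, 2, 3], 'B': [1, -1, -2], 'C': [1, 1, 1]},
    index=['x', 'y', 'z'],
)

# Products are [1, -2, -6]; argsort returns the positions that would
# sort them in ascending order. kind='stable' keeps ties in input order.
key = (df['A'] * df['B']).to_numpy()
order = np.argsort(key, kind='stable')

# iloc selects by position, so the index labels do not matter here.
result = df.iloc[order]
print(result)
```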
Answered by srs
Another way, adding it here because this is the first result on Google:
df.loc[(df.A * df.B).sort_values().index]
This works well for me and is pretty straightforward. @Ami Tavory's answer gave strange results for me with a categorical index; not sure it's because of that though.
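A small sketch of the loc approach on a frame whose index is not 0..N-1, assuming the index labels are unique. The sample values and labels are made up for illustration. With duplicate labels, loc would return every matching row for each label, so the result could contain repeated rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, -1, 2]}, index=[10, 20, 30])

# The product is a Series that carries the same index labels as df.
key = df['A'] * df['B']          # 10 -> 1, 20 -> -2, 30 -> 6

# Sorting the Series reorders its labels; loc then looks rows up by label.
result = df.loc[key.sort_values().index]
print(result)
```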
Answered by mork
Just adding to @srs's super elegant answer an iloc option, with some timing comparisons against loc and the naive solution.
(iloc is preferred when your index is position-based, vs. label-based for loc; passing the sorted index to iloc only selects the right rows when the index is the default 0..N-1 range.)
import numpy as np
import pandas as pd
N = 10000
df = pd.DataFrame({
'A': np.random.randint(low=1, high=N, size=N),
'B': np.random.randint(low=1, high=N, size=N)
})
%%timeit -n 100
df['C'] = df['A'] * df['B']
df.sort_values(by='C')
naive: 100 loops, best of 3: 1.85 ms per loop
%%timeit -n 100
df.loc[(df.A * df.B).sort_values().index]
loc: 100 loops, best of 3: 2.69 ms per loop
%%timeit -n 100
df.iloc[(df.A * df.B).sort_values().index]
iloc: 100 loops, best of 3: 2.02 ms per loop
df['C'] = df['A'] * df['B']
df1 = df.sort_values(by='C')
df2 = df.loc[(df.A * df.B).sort_values().index]
df3 = df.iloc[(df.A * df.B).sort_values().index]
print(np.array_equal(df1.index, df2.index))
print(np.array_equal(df2.index, df3.index))
Test results (comparing the entire index order) between all options:
True
True
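One caveat on the comparison above: the three results agree because df uses the default 0..N-1 index, where labels and positions coincide. A minimal sketch with a made-up shuffled integer index (the frame and the names by_label and by_position are illustrative, not from the answer) shows how the loc and iloc variants can diverge otherwise.

```python
import pandas as pd

# Integer labels that do not match row positions.
df = pd.DataFrame({'A': [3, 1, 2], 'B': [1, 1, 1]}, index=[2, 0, 1])

key = df['A'] * df['B']                  # label 2 -> 3, label 0 -> 1, label 1 -> 2
sorted_labels = key.sort_values().index  # labels in ascending key order: 0, 1, 2

# loc treats the values as labels, so the rows come back sorted by the key.
by_label = df.loc[sorted_labels]

# iloc treats the same values as positions 0, 1, 2, which simply
# reproduces the original (unsorted) row order.
by_position = df.iloc[sorted_labels]

print(by_label)
print(by_position)
```

If positional selection is wanted on such a frame, numpy.argsort on the key values gives true positions that are safe to pass to iloc.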