pandas sort lambda function

Question

Asked by Tom B

Given a dataframe a with 3 columns, A, B, C, and 3 rows of numerical values, how does one sort all the rows with a comparison operator using only the product A[i]*B[i]? It seems that the pandas sort only takes columns and then a sort method.
I would like to use a comparison function like the one below.

f = lambda i,j: a['A'][i]*a['B'][i] < a['A'][j]*a['B'][j]

Answer 1

Answered by Ami Tavory

There are at least two ways:

Method 1

Say you start with

In [175]: df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})

You can add a column which is your sort key

In [176]: df['sort_val'] = df.A * df.B

Finally sort by it and drop it

In [190]: df.sort_values('sort_val').drop('sort_val', axis=1)
Out[190]: 
   A  B  C
1  2 -1  1
0  1  1  1
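
Method 1 adds a helper column and drops it again. As a minimal sketch of the same idea that leaves df untouched, assign() can carry the temporary key; the second variant assumes pandas 1.1 or later, where sort_values accepts a key callable. The names by_assign and by_key are illustrative only and do not come from the answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [1, -1], 'C': [1, 1]})

# assign() returns a new frame, so the temporary key never touches df itself.
by_assign = (
    df.assign(sort_val=df['A'] * df['B'])
      .sort_values('sort_val')
      .drop(columns='sort_val')
)

# Assumes pandas >= 1.1: key receives column 'A' as a Series and must
# return a same-length Series whose values are used for ordering.
by_key = df.sort_values(by='A', key=lambda col: col * df['B'])

print(by_assign)
print(by_key)
```

Both variants should order the rows by A*B without leaving an extra column behind.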

Method 2

Use numpy.argsort and then use .iloc on the resulting positions (.ix, which older pandas versions offered, was removed in pandas 1.0):

In [197]: import numpy as np

In [198]: df.iloc[np.argsort((df.A * df.B).values)]
Out[198]: 
   A  B  C
1  2 -1  1
0  1  1  1
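
numpy.argsort returns integer positions, not index labels, which matters once the index is not the default 0..n-1. The following is a minimal sketch under the assumption of a hypothetical string-labelled frame (the labels 'x', 'y', 'z' and the extra row are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with string labels, so labels and positions differ.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, -1, 0], 'C': [1, 1, 1]},
                  index=['x', 'y', 'z'])

# argsort on the raw values gives positions in ascending order of A*B.
positions = np.argsort((df['A'] * df['B']).to_numpy())

# Positions must be consumed by .iloc (position-based), not .loc (label-based).
result = df.iloc[positions]
print(result)
```

Because the lookup is purely positional, this form does not depend on what kind of index the frame has.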

Answer 2

Answered by srs

Another way, adding it here because this is the first result on Google:

df.loc[(df.A * df.B).sort_values().index]

This works well for me and is pretty straightforward. @Ami Tavory's answer gave strange results for me with a categorical index; not sure it's because of that though.
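
The one-liner above sorts the product Series and then looks the rows up by label. A small sketch of that behaviour, assuming a hypothetical frame with unique string labels (the data and the names ascending and descending are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, -1, 0], 'C': [1, 1, 1]},
                  index=['x', 'y', 'z'])

# The product keeps df's index, so sorting it yields labels in sorted order.
product = df['A'] * df['B']

# .loc reorders the rows by those labels.
ascending = df.loc[product.sort_values().index]
descending = df.loc[product.sort_values(ascending=False).index]

print(ascending)
print(descending)
```

This relies on the index labels being unique; with duplicated labels, .loc would return every matching row for each occurrence of a label.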

Answer 3

Answered by mork

Just adding to @srs's super elegant answer: an iloc option, with some timing comparisons against loc and the naive solution.

(iloc is preferred when your index is position-based, i.e. the labels coincide with row positions, as in the default integer index used here; loc is label-based.)

import numpy as np
import pandas as pd

N = 10000
df = pd.DataFrame({
                   'A': np.random.randint(low=1, high=N, size=N), 
                   'B': np.random.randint(low=1, high=N, size=N)
                  })

%%timeit -n 100
df['C'] = df['A'] * df['B']
df.sort_values(by='C')

naive: 100 loops, best of 3: 1.85 ms per loop

%%timeit -n 100
df.loc[(df.A * df.B).sort_values().index]

loc: 100 loops, best of 3: 2.69 ms per loop

%%timeit -n 100
df.iloc[(df.A * df.B).sort_values().index]

iloc: 100 loops, best of 3: 2.02 ms per loop
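
The %%timeit cells above only run inside IPython or Jupyter. As a rough sketch, the same comparison can be approximated in a plain script with the standard timeit module. Here the naive variant uses assign() rather than mutating df, which is an assumption of this sketch and may shift its timing slightly; the names candidates and timings are illustrative.

```python
import timeit

import numpy as np
import pandas as pd

N = 10000
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.integers(1, N, size=N),
                   'B': rng.integers(1, N, size=N)})

# Each candidate builds the sorted frame from scratch on every call.
candidates = {
    'naive': lambda: df.assign(C=df['A'] * df['B']).sort_values(by='C'),
    'loc': lambda: df.loc[(df['A'] * df['B']).sort_values().index],
    'iloc': lambda: df.iloc[(df['A'] * df['B']).sort_values().index],
}

timings = {}
for name, fn in candidates.items():
    # Best of 3 repeats, 20 calls each, reported per call.
    best = min(timeit.repeat(fn, number=20, repeat=3)) / 20
    timings[name] = best
    print(f'{name}: {best * 1e3:.2f} ms per loop')
```

Absolute numbers depend on the machine and the pandas version, so they should not be expected to match the figures quoted in the answer.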

df['C'] = df['A'] * df['B']

df1 = df.sort_values(by='C')
df2 = df.loc[(df.A * df.B).sort_values().index]
df3 = df.iloc[(df.A * df.B).sort_values().index]

print(np.array_equal(df1.index, df2.index))
print(np.array_equal(df2.index, df3.index))

Testing results (comparing the entire index order) between all options:

True

True
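
The equality checks agree because the benchmark frame has the default integer index, where labels and positions coincide. The sketch below, using a hypothetical frame whose integer labels are a permutation of 0..2, illustrates how passing the sorted labels to iloc can silently give a different order, and one position-safe alternative based on argsort:

```python
import pandas as pd

# Hypothetical frame: integer labels that do not match row positions.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, -1, 0]}, index=[2, 0, 1])

product = df['A'] * df['B']            # labels 2, 0, 1 carry values 1, -2, 0
labels = product.sort_values().index   # labels in sorted order

by_loc = df.loc[labels]     # label lookup: rows come out sorted by A*B
by_iloc = df.iloc[labels]   # labels misread as positions: not sorted here

# Position-safe variant: argsort on the raw values yields true positions.
by_pos = df.iloc[product.to_numpy().argsort()]

print(by_loc)
print(by_iloc)
print(by_pos)
```

With labels outside the range 0..n-1, the iloc form would raise an IndexError instead of returning a wrong order.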
