Run a function exactly once for each row in a Pandas dataframe

Question

Asked by David Nehme

If I have a function

def do_irreversible_thing(a, b):
    print(a, b)

And a dataframe, say

import pandas as pd
df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

What's the best way to run the function exactly once for each row in a pandas dataframe? As pointed out in other questions, something like df.apply will call the function twice for the first row. Even using numpy

np.vectorize(do_irreversible_thing)(df.a, df.b)

causes the function to be called twice on the first row, as will df.T.apply() or df.apply(..., axis=1).

Is there a faster or cleaner way to call the function with every row than this explicit loop?

for idx, a, b in df.itertuples():
    do_irreversible_thing(a, b)
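
For the np.vectorize case mentioned above, the extra call comes from NumPy probing the function on the first element to work out the output type when otypes is not given. A minimal sketch, assuming the side effect can be stood in for by appending to a hypothetical `calls` list (not part of the original question), declares otypes up front so that no trial call is needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

calls = []  # hypothetical call log, used here only to count invocations

def do_irreversible_thing(a, b):
    # Stand-in for the real side effect: record the arguments of each call.
    calls.append((int(a), int(b)))

# With otypes declared, np.vectorize does not need to call the function
# on the first element just to infer the output type.
run_once = np.vectorize(do_irreversible_thing, otypes=[object])
run_once(df['a'].to_numpy(), df['b'].to_numpy())

print(calls)
```

This is still a Python-level loop under the hood, so it is unlikely to be faster than the explicit loop; it only removes the duplicate first call.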

Answer 1

Answered by Rosa Alejandra

The way I do it (because I also don't like the idea of looping with df.itertuples) is:

df.apply(do_irreversible_thing, axis=1)

and then your function should be like:

def do_irreversible_thing(x):
    print(x.a, x.b)

This way you should be able to run your function over each row.

OR

If you can't modify your function, you could apply it like this:

df.apply(lambda x: do_irreversible_thing(x['a'], x['b']), axis=1)
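
Whether apply calls the function an extra time on the first row depends on the pandas version, so it may be worth checking on your own installation. A rough sketch, assuming a hypothetical `calls` list stands in for the irreversible side effect, records every invocation so the count for the first row can be inspected:

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

calls = []  # hypothetical call log, for illustration only

def do_irreversible_thing(a, b):
    # Stand-in for the real side effect: record the arguments of each call.
    calls.append((int(a), int(b)))

# Row-wise apply with a lambda wrapper, as in the answer above.
df.apply(lambda row: do_irreversible_thing(row['a'], row['b']), axis=1)

# How many times was the first row (0, 1) seen?
first_row_calls = calls.count((0, 1))
print(len(calls), first_row_calls)
```

If first_row_calls comes out greater than 1 on your version, the explicit itertuples loop from the question is the safer choice for a function with side effects.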

Answer 2

Answered by EdChum

It's unclear what your function is doing, but to apply a function to each row you can pass axis=1 to apply, which calls your function row-wise, and pass in the column elements of interest:

In [155]:
def foo(a, b):
    return a * b

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
df.apply(lambda x: foo(x['a'], x['b']), axis=1)

Out[155]:
0     0
1     6
2    20
dtype: int64

However, so long as your function does not depend on the df mutating on each row, then you can just use a vectorised method to operate on the entire column:

In [156]:
df['a'] * df['b']

Out[156]:
0     0
1     6
2    20
dtype: int64

The reason is that vectorised operations scale better, whereas apply is just syntactic sugar for iterating over your df, so it is essentially a for loop.
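
To make the comparison concrete, here is a small sketch that runs both forms on the same frame and checks that they agree. The names `row_wise` and `vectorised` are illustrative only. It applies to pure functions like foo; a function with side effects cannot be replaced by a column expression in this way.

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

def foo(a, b):
    return a * b

# Row-wise: foo is called in a Python-level loop, once per row.
row_wise = df.apply(lambda x: foo(x['a'], x['b']), axis=1)

# Vectorised: a single operation over the whole columns.
vectorised = df['a'] * df['b']

# Both approaches should produce the same values.
print((row_wise == vectorised).all())
```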
