Run function exactly once for each row in a Pandas dataframe
Original URL: http://stackoverflow.com/questions/36609457/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by David Nehme
If I have a function
def do_irreversible_thing(a, b):
    print(a, b)
And a dataframe, say
import pandas as pd
df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
What's the best way to run the function exactly once for each row in a pandas dataframe? As pointed out in other questions, with something like df.apply, pandas will call the function twice for the first row. Even using numpy
np.vectorize(do_irreversible_thing)(df.a, df.b)
causes the function to be called twice on the first row, as will df.T.apply() or df.apply(..., axis=1).
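The extra call from np.vectorize comes from dtype inference: per the NumPy documentation, np.vectorize calls the function on the first element to determine the output type unless the `otypes` argument is given. A minimal sketch, assuming the return value can be ignored (the function returns None, hence `otypes=[object]`); `calls` is a hypothetical log standing in for the irreversible side effect:

```python
import numpy as np
import pandas as pd

calls = []  # hypothetical log replacing the print, so calls can be counted

def do_irreversible_thing(a, b):
    # Record the call instead of doing anything irreversible
    calls.append((a, b))

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

# Declaring the output type up front means np.vectorize does not need to
# call the function on the first element just to infer the dtype
vec = np.vectorize(do_irreversible_thing, otypes=[object])
vec(df.a, df.b)

print(calls)
```

With `otypes` set, each row should produce exactly one call.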
Is there a faster or cleaner way to call the function with every row than this explicit loop?
for idx, a, b in df.itertuples():
    do_irreversible_thing(a, b)
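An explicit loop makes exactly one call per row by construction. A minimal sketch of two slightly tidier variants, assuming only columns `a` and `b` matter; `calls` is a hypothetical log used instead of printing:

```python
import pandas as pd

calls = []  # hypothetical log of every call made

def do_irreversible_thing(a, b):
    # Log instead of printing so the number of calls can be checked
    calls.append((a, b))

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

# itertuples(index=False) yields one namedtuple per row, without the index
for row in df.itertuples(index=False):
    do_irreversible_thing(row.a, row.b)

# Equivalent loop that zips the two columns directly
for a, b in zip(df['a'], df['b']):
    do_irreversible_thing(a, b)
```

Each loop visits every row once, so `calls` ends up holding the three rows twice, once per loop.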
Answer by Rosa Alejandra
The way I do it (because I also don't like the idea of looping with df.itertuples) is:
df.apply(do_irreversible_thing, axis=1)
and then your function should look like this:
def do_irreversible_thing(x):
    print(x.a, x.b)
This way you should be able to run your function over each row.
OR
If you can't modify your function, you could apply it like this:
df.apply(lambda x: do_irreversible_thing(x.iloc[0], x.iloc[1]), axis=1)
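Whether df.apply still calls the function twice on the first row depends on the installed pandas version; the pandas 1.1.0 release notes state that DataFrame.apply now evaluates the first row/column only once. A minimal sketch for checking your own install, assuming a hypothetical `seen` list that records each row label (`x.name`) instead of doing anything irreversible:

```python
import pandas as pd

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])

seen = []  # hypothetical log of which row labels the function received

def do_irreversible_thing(x):
    # With axis=1, x is a row Series and x.name is its index label
    seen.append(x.name)

df.apply(do_irreversible_thing, axis=1)

print(pd.__version__, seen)
```

On a version with the double evaluation, `seen` would start with the first label twice.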
Answer by EdChum
It's unclear what your function is doing, but to apply a function to each row you can pass axis=1 to apply, which calls your function row-wise, and pass in the column elements of interest:
In [155]:
def foo(a, b):
    return a * b

df = pd.DataFrame([(0, 1), (2, 3), (4, 5)], columns=['a', 'b'])
df.apply(lambda x: foo(x['a'], x['b']), axis=1)
Out[155]:
0 0
1 6
2 20
dtype: int64
However, as long as your function does not depend on the df mutating on each row, you can just use a vectorised method to operate on the entire column:
In [156]:
df['a'] * df['b']
Out[156]:
0 0
1 6
2 20
dtype: int64
The reason is that vectorised functions scale better, whereas apply is just syntactic sugar for iterating over your df, so it is essentially a for loop.
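To see the scaling difference on your own machine, a rough timing sketch; the frame size and repeat count are arbitrary assumptions, and no particular numbers are implied. Because `foo` only uses `*`, it can be passed whole columns directly:

```python
import timeit

import numpy as np
import pandas as pd

def foo(a, b):
    return a * b

# Hypothetical, arbitrarily sized frame so the difference is measurable
n = 20_000
df = pd.DataFrame({'a': np.arange(n), 'b': np.arange(n)})

# Row-wise: apply calls foo once per row from a Python-level loop
row_wise = df.apply(lambda x: foo(x['a'], x['b']), axis=1)

# Vectorised: foo receives whole columns and multiplies them in one operation
vectorised = foo(df['a'], df['b'])

# Both give the same values; only the speed differs
assert (row_wise == vectorised).all()

t_apply = timeit.timeit(lambda: df.apply(lambda x: foo(x['a'], x['b']), axis=1), number=3)
t_vec = timeit.timeit(lambda: foo(df['a'], df['b']), number=3)
print(f"apply: {t_apply:.4f}s  vectorised: {t_vec:.4f}s")
```

Absolute timings depend on hardware and pandas version.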

