Why does pandas apply calculate twice?
Source: http://stackoverflow.com/questions/21635915/
Note: this page reproduces a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the source address, and attribute it to the original authors (not me): Stack Overflow
Asked by piRSquared
I'm using the apply method on a pandas DataFrame object. When my DataFrame has a single column, it appears that the applied function is being called twice. Why does this happen, and can I stop that behavior?
Code:
import pandas as pd

def mul2(x):
    # The print is a side effect that reveals how often the function is called.
    print('hello')
    return 2*x

df = pd.DataFrame({'a': [1,2,0.67,1.34]})
print(df.apply(mul2))
Output:
hello
hello
      a
0  2.00
1  4.00
2  1.34
3  2.68
I'm printing 'hello' from within the function being applied. I know it's being applied twice because 'hello' is printed twice. What's more, if I had two columns, 'hello' prints 3 times. And when I call apply on just the column, 'hello' prints 4 times.
Code:
print(df.a.apply(mul2))
Output:
hello
hello
hello
hello
0    2.00
1    4.00
2    1.34
3    2.68
Name: a, dtype: float64
Accepted answer by BrenBarn
Probably related to this issue. With groupby, the applied function is called one extra time to see if certain optimizations can be done. I'd guess something similar is going on here. It doesn't look like there's any way around it at the moment (although I could be wrong about the source of the behavior you're seeing). Is there a reason you need it to not do that extra call?
Also, calling it four times when you apply on the column is normal. When you get one column you get a Series, not a DataFrame. apply on a Series applies the function to each element. Since your column has four elements in it, the function is called four times.
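To make the difference between the two apply methods concrete, here is a minimal sketch that records what the function receives in each case. The helper names record_frame and record_series are made up for this illustration and are not part of pandas. How many times DataFrame.apply calls the function can depend on the pandas version, so the sketch only looks at the type of the argument there.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 0.67, 1.34]})

frame_args = []   # type names of what DataFrame.apply passes in
series_args = []  # type names of what Series.apply passes in

def record_frame(x):
    # Hypothetical helper: remember the argument type, then double it.
    frame_args.append(type(x).__name__)
    return 2 * x

def record_series(x):
    # Hypothetical helper: same idea, used on a single column.
    series_args.append(type(x).__name__)
    return 2 * x

df.apply(record_frame)         # the function receives whole columns
df['a'].apply(record_series)   # the function receives individual values

print(frame_args)
print(len(series_args))
```

On a DataFrame the function sees a Series per column, while on a Series it sees one value per call, which is why the column with four elements produces four calls.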
Answer by MERose
This behavior is intended, as an optimization.
See the docs:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
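If the extra call matters only because of side effects, one possible approach (a sketch, not something the answers above prescribe) is to keep the applied function free of side effects and run them in an explicit loop instead. The second column 'b' and the list named logged are assumptions added for this illustration.

```python
import pandas as pd

# Column 'b' is added here only to show more than one column.
df = pd.DataFrame({'a': [1, 2, 0.67, 1.34], 'b': [10, 20, 30, 40]})

def mul2(x):
    # Pure function: no printing or mutation, so an extra call is harmless.
    return 2 * x

result = df.apply(mul2)

# Side effects live in an explicit loop, which runs exactly once per column
# no matter how many times apply decided to call mul2.
logged = []
for name in df.columns:
    logged.append(name)
    print('processed column', name)

print(result)
```

With this split, the result is the same whether apply evaluates the first column once or twice, and the logging happens a predictable number of times.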

