Python: is it possible to use a lambda with pd.DataFrame apply instead of a nested loop?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/19178762/

Date: 2020-08-19 13:03:45  Source: igfitidea

Tags: python, performance, nested, pandas

Asked by JPC

I'm trying to avoid a nested loop in Python by using apply with a lambda to create a new column, as in the code below:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(100, 4) * 100, columns=list('ABCD'))
# This is the line that raises the TypeError shown below
df['C'] = df.apply(lambda A, B: A + B)

TypeError: ('() takes exactly 2 arguments (1 given)', u'occurred at index A')

Obviously this doesn't work. Any recommendations?

Accepted answer by miku

Do you want to add column A and column B and store the result in C? Then you can do it more simply:

df.C = df.A + df.B
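
As a minimal sketch of the same idea with bracket indexing (the seed, the extra column name E, and the variable columns_after are assumptions made up for this illustration, not part of the original answer): attribute-style assignment such as df.C only updates a column that already exists, while bracket assignment also creates a missing one.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.random((100, 4)) * 100, columns=list('ABCD'))

# Vectorized: adds the two columns element-wise, aligned on the index.
# No lambda and no Python-level loop is involved.
df['C'] = df['A'] + df['B']

# Bracket assignment also creates a column that does not exist yet
# ('E' is a hypothetical name used only for this example).
df['E'] = df['A'] + df['B']

columns_after = list(df.columns)
print(columns_after)
```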


As @EdChum points out in the comments, the function passed to apply receives a Series. By default (axis=0) it is called once per column, so each Series holds one column's values indexed by the row labels; with axis=1 it is called once per row:

>>> df.apply(lambda s: s)[:3]
           A          B          C          D
0  57.890858  72.344298  16.348960  84.109071
1  85.534617  53.067682  95.212719  36.677814
2  23.202907   3.788458  66.717430   1.466331

Here, we add the first and second rows, which gives one sum per column:

>>> df.apply(lambda s: s[0] + s[1])
A    143.425475
B    125.411981
C    111.561680
D    120.786886
dtype: float64
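
To see what apply hands to the function under each axis, here is a small sketch on a tiny integer frame (the frame and the variable names are assumptions chosen so the values are easy to check by hand):

```python
import numpy as np
import pandas as pd

# Rows are [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# axis=0 (default): the function is called once per column.
# Each Series is named after its column and indexed by the row labels.
names_axis0 = df.apply(lambda s: s.name)

# Labels 0 and 1 are row labels here, so this adds the first two rows.
first_plus_second = df.apply(lambda s: s[0] + s[1])

# axis=1: the function is called once per row, and the Series index
# holds the column names instead.
index_axis1 = df.apply(lambda s: ''.join(s.index), axis=1)

print(names_axis0.tolist())
print(first_plus_second.tolist())
print(index_axis1.tolist())
```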

To apply the function to each row instead, so that it can combine values from different columns, use the axis=1 keyword argument:

>>> df.apply(lambda s: s.iloc[0] + s.iloc[1], axis=1)
0     130.235156
1     138.602299
2      26.991364
3     143.229523
...
98    152.640811
99     90.266934
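
This also answers the two-argument lambda from the question: the function still takes a single argument (the row) and picks the values out of it. A minimal sketch, assuming a small integer frame invented for the example:

```python
import numpy as np
import pandas as pd

# Rows are [0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=list('ABCD'))

# One argument, the row; columns are read from it by label.
by_label = df.apply(lambda row: row['A'] + row['B'], axis=1)

# Positional access is spelled .iloc. Plain row[0] on a Series with
# string labels relies on a positional fallback that newer pandas
# versions deprecate.
by_position = df.apply(lambda row: row.iloc[0] + row.iloc[1], axis=1)

print(by_label.tolist())
```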

This yields the same result as referring to the columns by name:

>>> (df.apply(lambda s: s.iloc[0] + s.iloc[1], axis=1) ==
...  df.apply(lambda s: s['A'] + s['B'], axis=1))
0     True
1     True
2     True
3     True
...
98    True
99    True
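
Since the question is tagged performance, here is a rough sketch that checks the row-wise apply against the plain column addition and times both. The seed, the repetition count, and the variable names are assumptions for illustration; timings depend on the machine and the pandas version, so no numbers are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
df = pd.DataFrame(rng.random((100, 4)) * 100, columns=list('ABCD'))

row_wise = df.apply(lambda s: s['A'] + s['B'], axis=1)
vectorized = df['A'] + df['B']

# Reduce the element-wise comparison to a single boolean
all_equal = bool(np.allclose(row_wise, vectorized))

# Rough timings in seconds for 20 repetitions each (machine dependent)
t_apply = timeit.timeit(
    lambda: df.apply(lambda s: s['A'] + s['B'], axis=1), number=20)
t_vectorized = timeit.timeit(lambda: df['A'] + df['B'], number=20)

print(all_equal)
print(f"apply: {t_apply:.4f}s  vectorized: {t_vectorized:.4f}s")
```

The row-wise version calls the lambda once per row in Python, whereas the column addition is a single vectorized operation, which is why the simpler form is usually preferred when it is available.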