Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/27845145/

Time: 2020-09-13 22:49:19  Source: igfitidea

Using an if statement in a dataframe with lambda functions

python pandas lambda conditional-statements

Asked by IcemanBerlin

I am trying to add a new column to a dataframe based on an if statement that depends on the values of two columns, i.e. if column x == None then column y, else column x.

Below is the script I have written, but it doesn't work. Any ideas?

dfCurrentReportResults['Retention'] =  dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x)

Also I got this error message: AttributeError: ("'Series' object has no attribute 'Retention_x'", u'occurred at index BUSINESSUNIT_NAME')

FYI: BUSINESSUNIT_NAME is the first column name.

Additional Info:

My data printed out looks like this, and I want to add a third column that takes a value if there is one and otherwise keeps NaN.

   Retention_x  Retention_y
0            1          NaN
1          NaN     0.672183
2          NaN     1.035613
3          NaN     0.771469
4          NaN     0.916667
5          NaN          NaN
6          NaN          NaN
7          NaN          NaN
8          NaN          NaN
9          NaN          NaN

UPDATE: In the end I was having issues testing for Null (is Null) in my dataframe. The final line of code I used, which also includes axis = 1, answered my question.

 dfCurrentReportResults['RetentionLambda'] = dfCurrentReportResults.apply(lambda x : x['Retention_y'] if pd.isnull(x['Retention_x']) else x['Retention_x'], axis = 1)

Thanks @EdChum, @strim099 and @aus_lacy for all your input. As my data set gets larger I may switch to the np.where option if I notice performance issues.
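
The switch from `== None` to `pd.isnull` in the update above matters because missing numeric values in a dataframe are stored as NaN, which is a float and not `None`. A minimal sketch of the difference, using made-up scalar values rather than the original data:

```python
import numpy as np
import pandas as pd

# NaN is a float, so comparing it with None by equality is False.
nan_equals_none = (np.nan == None)

# NaN is not even equal to itself, so equality tests cannot detect it.
nan_equals_nan = (np.nan == np.nan)

# pd.isnull treats both None and NaN as missing.
missing_flags = [bool(pd.isnull(v)) for v in (None, np.nan, 1.0)]
print(nan_equals_none, nan_equals_nan, missing_flags)
```

This is why the row-wise lambda only behaves as intended once the test is written with `pd.isnull`.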

Answered by Jason Strimpel

Your lambda is operating on axis 0, which is column-wise. Simply add axis=1 to the apply argument list. This is clearly documented.

In [1]: import pandas

In [2]: dfCurrentReportResults = pandas.DataFrame([['a','b'],['c','d'],['e','f'],['g','h'],['i','j']], columns=['Retention_y', 'Retention_x'])

In [3]: dfCurrentReportResults.loc[1, 'Retention_x'] = None

In [4]: dfCurrentReportResults.loc[3, 'Retention_x'] = None

In [5]: dfCurrentReportResults
Out[5]:
  Retention_y Retention_x
0           a           b
1           c        None
2           e           f
3           g        None
4           i           j

In [6]: dfCurrentReportResults['Retention'] =  dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x, axis=1)

In [7]: dfCurrentReportResults
Out[7]:
  Retention_y Retention_x Retention
0           a           b         b
1           c        None         c
2           e           f         f
3           g        None         g
4           i           j         j
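
To see why the original call raised the AttributeError, it may help to look at what `apply` hands to the lambda on each axis. The sketch below uses a small hypothetical frame (not the asker's data) and simply records the index labels of the Series the function receives:

```python
import pandas as pd

# Hypothetical two-row frame with the same column names as the question.
df = pd.DataFrame({"Retention_x": [1.0, None], "Retention_y": [None, 0.5]})

def describe(s):
    # Join the labels of whatever Series apply passes in.
    return ", ".join(map(str, s.index))

# Default axis=0: each call receives one column, indexed by row labels.
labels_axis0 = df.apply(describe)

# axis=1: each call receives one row, indexed by column names.
labels_axis1 = df.apply(describe, axis=1)

print(labels_axis0)
print(labels_axis1)
```

With axis=0 the Series has labels 0 and 1, so there is no `Retention_x` entry to look up; with axis=1 the column names are the labels, which is what the lambda expects.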

Answered by EdChum

Just use np.where:

dfCurrentReportResults['Retention'] = np.where(dfCurrentReportResults.Retention_x == None, dfCurrentReportResults.Retention_y, dfCurrentReportResults.Retention_x)

This takes the test condition as the first parameter and sets the value to Retention_y where the condition holds, else Retention_x.

Also avoid using apply where possible, as it just loops over the values; np.where is a vectorised method and will scale much better.
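
As a rough illustration of that point, the sketch below computes the new column both ways on a few rows shaped like the question's data and checks that the results agree. The frame `df` and the variable names are assumptions for the example, and the missing-value test uses isnull(), as in the answer's final update:

```python
import numpy as np
import pandas as pd

# A few rows modelled on the data printed in the question.
df = pd.DataFrame({
    "Retention_x": [1.0, np.nan, np.nan, np.nan],
    "Retention_y": [np.nan, 0.672183, 1.035613, np.nan],
})

# Row-wise: the lambda runs once per row in Python.
row_wise = df.apply(
    lambda row: row["Retention_y"] if pd.isnull(row["Retention_x"]) else row["Retention_x"],
    axis=1,
)

# Vectorised: one call over whole columns.
vectorised = pd.Series(
    np.where(df["Retention_x"].isnull(), df["Retention_y"], df["Retention_x"]),
    index=df.index,
)

# Series.equals treats NaNs in matching positions as equal.
same = row_wise.equals(vectorised)
print(same)
```

Both approaches give the same values here; the difference is that np.where avoids calling a Python function for every row.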

UPDATE

OK, no need to use np.where; just use the following simpler syntax:

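dfCurrentReportResults['Retention'] = dfCurrentReportResults.Retention_y.where(dfCurrentReportResults.Retention_x == None, dfCurrentReportResults.Retention_x)

Further update: test for missing values with isnull(), because an elementwise == None comparison does not match NaN:

dfCurrentReportResults['Retention'] = dfCurrentReportResults.Retention_y.where(dfCurrentReportResults.Retention_x.isnull(), dfCurrentReportResults.Retention_x)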
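
A small sketch of how the last two variants differ, on a hypothetical three-row frame `df`. `Series.where(cond, other)` keeps the caller's value where `cond` is True and takes `other` elsewhere, so the result depends entirely on the mask:

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the shape of the question's data.
df = pd.DataFrame({
    "Retention_x": [1.0, np.nan, np.nan],
    "Retention_y": [np.nan, 0.672183, np.nan],
})

# Elementwise comparison with None: False for every row, NaN included.
eq_none = df["Retention_x"] == None

# isnull() flags the NaN rows.
is_null = df["Retention_x"].isnull()

# Keep Retention_y where Retention_x is missing, otherwise take Retention_x.
df["Retention"] = df["Retention_y"].where(is_null, df["Retention_x"])
print(df)
```

With the `== None` mask every row would fall through to Retention_x, which is why the isnull() form is the one that fills the gaps.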