pandas: using an if statement in a dataframe with a lambda function

Question

Asked by IcemanBerlin

I am trying to add a new column to a dataframe based on an if statement that depends on the values of two columns, i.e. if column x == None then column y, else column x.

Below is the script I have written, but it doesn't work. Any ideas?

dfCurrentReportResults['Retention'] =  dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x)

I also got this error message: AttributeError: ("'Series' object has no attribute 'Retention_x'", u'occurred at index BUSINESSUNIT_NAME')

FYI: BUSINESSUNIT_NAME is the first column name.

Additional info:

My data printed out looks like this. I want to add a third column that takes whichever value is present, or stays NaN if neither column has one.

   Retention_x  Retention_y
0            1          NaN
1          NaN     0.672183
2          NaN     1.035613
3          NaN     0.771469
4          NaN     0.916667
5          NaN          NaN
6          NaN          NaN
7          NaN          NaN
8          NaN          NaN
9          NaN          NaN

UPDATE: In the end, my problem was with how I tested for null values in my dataframe. The final line of code I used, which also includes axis = 1, answered my question.

 dfCurrentReportResults['RetentionLambda'] = dfCurrentReportResults.apply(lambda x : x['Retention_y'] if pd.isnull(x['Retention_x']) else x['Retention_x'], axis = 1)

Thanks @EdChum, @strim099 and @aus_lacy for all your input. As my data set gets larger I may switch to the np.where option if I notice performance issues.

Answer 1

Answered by Jason Strimpel

Your lambda is operating along axis 0, which is column-wise, so each call receives a whole column rather than a row. Simply add axis=1 to the apply argument list. This is clearly documented.

In [1]: import pandas

In [2]: dfCurrentReportResults = pandas.DataFrame([['a','b'],['c','d'],['e','f'],['g','h'],['i','j']], columns=['Retention_y', 'Retention_x'])

In [3]: dfCurrentReportResults.loc[1, 'Retention_x'] = None

In [4]: dfCurrentReportResults.loc[3, 'Retention_x'] = None

In [5]: dfCurrentReportResults
Out[5]:
  Retention_y Retention_x
0           a           b
1           c        None
2           e           f
3           g        None
4           i           j

In [6]: dfCurrentReportResults['Retention'] =  dfCurrentReportResults.apply(lambda x : x.Retention_y if x.Retention_x == None else x.Retention_x, axis=1)

In [7]: dfCurrentReportResults
Out[7]:
  Retention_y Retention_x Retention
0           a           b         b
1           c        None         c
2           e           f         f
3           g        None         g
4           i           j         j
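
To see why the original call failed, it can help to look at what apply hands to the lambda on each axis. The sketch below is only an illustration: the small frame `df` is made up, reusing the column names from the question.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the asker's data (values are assumptions).
df = pd.DataFrame({
    "Retention_x": [1.0, np.nan, np.nan],
    "Retention_y": [np.nan, 0.672183, np.nan],
})

# axis=0 (the default): the function is called once per column,
# so x.name is a column label and x is indexed by row labels.
per_column = df.apply(lambda x: x.name)

# axis=1: the function is called once per row,
# so x.name is a row label and x is indexed by column labels.
per_row = df.apply(lambda x: x.name, axis=1)

print(per_column.tolist())  # ['Retention_x', 'Retention_y']
print(per_row.tolist())     # [0, 1, 2]

# With the default axis, a column has no 'Retention_x' entry to look up,
# which is the AttributeError reported in the question.
try:
    df.apply(lambda x: x.Retention_x)
    raised = False
except AttributeError:
    raised = True

# With axis=1 the same lookup succeeds because x is a row.
looked_up = df.apply(lambda x: x.Retention_x, axis=1)
```

The "occurred at index BUSINESSUNIT_NAME" part of the error fits this picture: the first column was the first thing passed to the lambda.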

Answer 2

Answered by EdChum

Just use np.where:

dfCurrentReportResults['Retention'] = np.where(dfCurrentReportResults.Retention_x.isnull(), dfCurrentReportResults.Retention_y, dfCurrentReportResults.Retention_x)

np.where takes the test condition as its first parameter. It returns the value from Retention_y where the condition is true and the value from Retention_x otherwise.

Also avoid using apply where possible, as it just loops over the values. np.where is a vectorised method and will scale much better.
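
As a rough check that the two approaches agree, the sketch below runs both on a small made-up frame. The frame `df` and its values are assumptions for illustration, and the missing-value test uses isnull() in both cases.

```python
import numpy as np
import pandas as pd

# Illustrative data in the shape shown in the question.
df = pd.DataFrame({
    "Retention_x": [1.0, np.nan, np.nan, np.nan],
    "Retention_y": [np.nan, 0.672183, 1.035613, np.nan],
})

# Row-wise version: the lambda is called once per row in Python.
row_wise = df.apply(
    lambda x: x["Retention_y"] if pd.isnull(x["Retention_x"]) else x["Retention_x"],
    axis=1,
)

# Vectorised version: one call that works on whole columns at once.
vectorised = pd.Series(
    np.where(df["Retention_x"].isnull(), df["Retention_y"], df["Retention_x"]),
    index=df.index,
)

# Series.equals treats NaNs in the same position as equal.
same = row_wise.equals(vectorised)
print(same)
```

On a frame this small the speed difference is not noticeable; the point is only that the vectorised call gives the same column without a Python-level loop over rows.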

UPDATE

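OK, there is no need to use np.where; just use the following simpler syntax:

dfCurrentReportResults['Retention'] = dfCurrentReportResults.Retention_y.where(dfCurrentReportResults.Retention_x == None, dfCurrentReportResults.Retention_x)

Further update

Comparing with == None does not detect NaN values, so test with isnull() instead:

dfCurrentReportResults['Retention'] = dfCurrentReportResults.Retention_y.where(dfCurrentReportResults.Retention_x.isnull(), dfCurrentReportResults.Retention_x)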
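
The argument order of Series.where is easy to misread, so here is a minimal sketch on made-up data (the frame `df` and its values are assumptions). It also shows why the comparison against None was replaced by isnull().

```python
import numpy as np
import pandas as pd

# Illustrative data using the column names from the question.
df = pd.DataFrame({
    "Retention_x": [1.0, np.nan, np.nan],
    "Retention_y": [np.nan, 0.672183, np.nan],
})

# An equality test against None is False for every element, NaN included.
eq_none = df["Retention_x"] == None  # noqa: E711 (deliberate, for illustration)

# isnull() flags the missing entries.
is_missing = df["Retention_x"].isnull()

# Series.where keeps the caller's value where the condition is True
# and takes the value from the second argument where it is False.
df["Retention"] = df["Retention_y"].where(is_missing, df["Retention_x"])

print(df)
```

Read aloud, the last assignment says: keep Retention_y where Retention_x is missing, otherwise use Retention_x. Rows where both are missing stay NaN.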
