Pandas error when creating a new column with if-else: the truth value of a Series is ambiguous

Question

Asked by stackoverflowuser2010

I'm using Pandas and am trying to create a new column using a Python if-else statement (aka ternary condition operator) in order to avoid division by zero.

For example below, I want to create a new column C by dividing A/B. I want to use the if-else statement to avoid dividing by 0.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 5, size=(100, 2)), columns=list('AB'))
df.head()
#    A  B
# 0  1  3
# 1  1  2
# 2  0  0
# 3  2  1
# 4  4  2

df['C'] = (df.A / df.B) if df.B > 0.0 else 0.0

However, I am getting an error from the last line:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I searched on StackOverflow and found other posts about this error, but none of them involved this type of if-else statement. Some posts include:

Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

The truth value of a Series is ambiguous in dataframe

Error: The truth value of a Series is ambiguous - Python pandas

Any help would be appreciated.

Answer 1

Answered by keepAlive

What about doing

>>> df['C'] = np.where(df.B>0., df.A/df.B, 0.)

which reads as:

where df.B is strictly positive, return df.A/df.B, otherwise return 0.
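
As a minimal sketch of how np.where behaves here (the three-row frame and the name `ratio` are made up for illustration): both branches are computed for every row first, and the condition only chooses between them afterwards. Dividing two pandas Series by zero yields inf or NaN rather than raising, so the rows where B is 0 are simply overwritten with 0.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame, chosen so that two rows have B == 0
df = pd.DataFrame({"A": [1, 0, 4], "B": [2, 0, 0]})

# The division runs on every row, including those where B is 0:
# 1/2 -> 0.5, 0/0 -> NaN, 4/0 -> inf (no exception is raised)
ratio = df["A"] / df["B"]

# np.where picks ratio where B > 0 and the fallback 0.0 everywhere else
df["C"] = np.where(df["B"] > 0, ratio, 0.0)
```

The practical consequence is that np.where does not "skip" the division the way a scalar if-else would; it only hides the unwanted results.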

Answer 2

Answered by Alexander

df.B > 0 results in a Series, e.g.:

0      True  # 4 > 0 => True
1      True  # 2 > 0 => True
2      True  # ...
3      True
4      True
5      True
6      True
7      True
8     False  # 0 is not > 0 => False
9     False  # 0 is not > 0 => False
...

Multiple values are returned, which results in ambiguity (some are True while others are False). An if-else expression needs a single True or False, and pandas refuses to guess how a whole Series should be reduced to one.
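
The ambiguity can be reproduced in isolation. The following is a small sketch with a made-up Series `s`; it shows that converting a multi-element boolean Series to a single bool raises the error from the question, while an explicit reduction with any() or all() does not.

```python
import pandas as pd

s = pd.Series([3, 0, 2])   # hypothetical data for illustration
mask = s > 0               # element-wise comparison gives a boolean Series

# `if mask:` calls bool(mask) under the hood, which pandas rejects
try:
    bool(mask)
    message = None
except ValueError as exc:
    message = str(exc)

# Explicit reductions state which single answer is wanted
any_positive = bool(mask.any())   # is at least one value > 0?
all_positive = bool(mask.all())   # is every value > 0?
```

Neither any() nor all() is what the question needs, though: the goal is a per-row choice, which is why the answers use element-wise tools instead.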

One solution is to use np.where:

sentinel = np.nan  # Or 0 if you must...
df = df.assign(C=np.where(df['B'] != 0, df['A'] / df['B'], sentinel))
>>> df
   A  B    C
0  2  4  0.5
1  0  2  0.0
2  1  2  0.5
3  4  4  1.0
4  1  1  1.0
5  4  4  1.0
6  2  4  0.5
7  1  2  0.5
8  4  0  NaN  # NaN is assigned in cases where the value in Column `B` is zero.
9  1  0  NaN
...

Answer 3

Answered by YOBEN_S

# Keep B only where it is > 0; other rows become NaN, so the quotient is NaN and is filled with 0
df['C'] = df.A.div(df.B.where(df.B.gt(0))).fillna(0)
df
Out[89]: 
   A  B         C
0  1  3  0.333333
1  1  2  0.500000
2  0  0  0.000000
3  2  1  2.000000
4  4  2  2.000000

With apply and a lambda:

df['C'] = df.apply(lambda x: x['A'] / x['B'] if x['B'] > 0 else 0, axis=1)
df
Out[93]: 
   A  B         C
0  1  3  0.333333
1  1  2  0.500000
2  0  0  0.000000
3  2  1  2.000000
4  4  2  2.000000

Answer 4

Answered by HeyMan

Based on @vaishnav's proposal to iterate over the dataframe, here is a working version:

for index, row in df.iterrows():
    if row.B > 0:
        df.loc[index, 'C'] = row.A / row.B
    else:
        df.loc[index, 'C'] = 0

Output:

   A  B         C
0  3  4  0.750000
1  0  4  0.000000
2  4  3  1.333333
3  2  1  2.000000
4  1  0  0.000000
5  0  2  0.000000
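
As a quick sanity check, the row-by-row loop and the vectorized np.where approach should agree. The sketch below uses a small made-up frame and hypothetical names (`looped`, `vectorized`); on large frames the vectorized form is generally much faster, since iterrows builds a Series for every row.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the last row has B == 0
df = pd.DataFrame({"A": [3, 0, 4, 1], "B": [4, 4, 3, 0]})

# Row by row: the division only runs when B > 0
looped = []
for _, row in df.iterrows():
    looped.append(row.A / row.B if row.B > 0 else 0.0)

# Vectorized: divide everything, then keep the quotient only where B > 0
vectorized = np.where(df["B"] > 0, df["A"] / df["B"], 0.0)

same = bool(np.allclose(looped, vectorized))
```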

Answer 5

Answered by vaishnav krishnan

Or you could just use a for loop.

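# Walk columns A and B in parallel and collect one result per row
results = []
for a, b in zip(df['A'], df['B']):
    if b > 0:
        results.append(a / b)
    else:
        results.append(0.0)
df['C'] = results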
