Pandas error when using if-else to create new column: The truth value of a Series is ambiguous
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original source: http://stackoverflow.com/questions/48123368/
Asked by stackoverflowuser2010
I'm using Pandas and trying to create a new column with a Python if-else expression (the ternary conditional operator) in order to avoid division by zero.
In the example below, I want to create a new column C by dividing A by B, using the if-else expression to avoid dividing by 0.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0, 5, size=(100, 2)), columns=list('AB'))
df.head()
# A B
# 0 1 3
# 1 1 2
# 2 0 0
# 3 2 1
# 4 4 2
df['C'] = (df.A / df.B) if df.B > 0.0 else 0.0
However, I am getting an error from the last line:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I searched StackOverflow and found other posts about this error, but none of them involved this type of if-else statement. Some of those posts:
Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
The truth value of a Series is ambiguous in dataframe
Error: The truth value of a Series is ambiguous - Python pandas
Any help would be appreciated.
Answered by keepAlive
What about doing:
>>> df['C'] = np.where(df.B>0., df.A/df.B, 0.)
which reads as: where df.B is strictly positive, return df.A/df.B; otherwise return 0.
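A minimal sketch of that reading on a small hand-made frame (the values are illustrative and not from the question). It assumes the fallback of 0.0 that the question asks for:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({'A': [1, 4, 0, 2], 'B': [2, 0, 0, 4]})

# np.where picks element-wise: the ratio where B > 0, the fallback 0.0 elsewhere.
# Both branches are computed for every row; the inf/NaN values produced where
# B == 0 are simply never selected.
df['C'] = np.where(df.B > 0, df.A / df.B, 0.0)
```

Note that np.where does not skip the division for rows where B is 0; it only discards those results afterwards.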
Answered by Alexander
df.B > 0 results in a Series, e.g.:
0 True # 4 > 0 => True
1 True # 2 > 0 => True
2 True # ...
3 True
4 True
5 True
6 True
7 True
8 False # 0 is not > 0 => False
9 False # 0 is not > 0 => False
...
The comparison returns multiple values, which is ambiguous (some are True while others are False), whereas an if-else expression needs a single True or False.
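The ambiguity can be reproduced in a few lines. This is a sketch on illustrative data (not the question's random frame); the variable names are chosen here for demonstration only:

```python
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({'A': [1, 4, 2], 'B': [3, 0, 1]})

mask = df.B > 0          # element-wise comparison: one boolean per row

try:
    bool(mask)           # this is what an if-else on the Series asks for
    raised = False
except ValueError:
    raised = True        # pandas refuses to collapse several values into one

# Explicit reductions do yield a single boolean
any_positive = bool(mask.any())   # at least one row has B > 0
all_positive = bool(mask.all())   # every row has B > 0
```

Neither reduction is what the question needs, though: the goal is a per-row choice, which calls for an element-wise selection rather than a single True or False.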
One solution is to use np.where:
sentinel = np.nan # Or 0 if you must...
df = df.assign(C=np.where(df['B'] != 0, df['A'] / df['B'], sentinel))
>>> df
A B C
0 2 4 0.5
1 0 2 0.0
2 1 2 0.5
3 4 4 1.0
4 1 1 1.0
5 4 4 1.0
6 2 4 0.5
7 1 2 0.5
8 4 0 NaN # NaN is assigned in cases where the value in Column `B` is zero.
9 1 0 NaN
...
Answered by YOBEN_S
# Replace non-positive B with NaN so the division yields NaN (not inf), then fill with 0
df['C'] = df.A.div(df.B.where(df.B.gt(0))).fillna(0)
df
Out[89]:
A B C
0 1 3 0.333333
1 1 2 0.500000
2 0 0 0.000000
3 2 1 2.000000
4 4 2 2.000000
With apply and a lambda:
df['C']=df.apply(lambda x : x['A']/x['B'] if x['B']>0 else 0,1)
df
Out[93]:
A B C
0 1 3 0.333333
1 1 2 0.500000
2 0 0 0.000000
3 2 1 2.000000
4 4 2 2.000000
Answered by HeyMan
Based on @vaishnav's proposal to iterate over the DataFrame, here is a working version:
for index, row in df.iterrows():
    if row.B > 0:
        df.loc[index, 'C'] = row.A / row.B
    else:
        df.loc[index, 'C'] = 0
Output:
A B C
0 3 4 0.750000
1 0 4 0.000000
2 4 3 1.333333
3 2 1 2.000000
4 1 0 0.000000
5 0 2 0.000000
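For comparison, the same per-row rule can be written without an explicit loop by using a boolean mask with .loc. This is a different technique from the iterrows loop above, sketched on illustrative data with a default of 0.0 assumed:

```python
import pandas as pd

# Illustrative data, not taken from the question
df = pd.DataFrame({'A': [3, 0, 1], 'B': [4, 4, 0]})

positive = df['B'] > 0      # rows where the division is safe

df['C'] = 0.0               # default for every row
# Divide only the selected rows; the rest keep the default
df.loc[positive, 'C'] = df.loc[positive, 'A'] / df.loc[positive, 'B']
```

Unlike np.where, this form never performs the division on rows where B is 0.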
Answered by vaishnav krishnan
Or you could just use a for loop.
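# zip pairs the values row by row; columns are named 'A' and 'B' in the question
for idx, a, b in zip(df.index, df['A'], df['B']):
    if b > 0:
        df.loc[idx, 'C'] = a / b   # assign a single row, not the whole column
    else:
        df.loc[idx, 'C'] = 0.0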