The truth value of a Series is ambiguous in dataframe
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/45811610/
Asked by DYZ
I have the following code. I'm trying to create a new field in a pandas dataframe with a simple condition:
if df_reader['email1_b']=='NaN':
    df_reader['email1_fin']=df_reader['email1_a']
else:
    df_reader['email1_fin']=df_reader['email1_b']
But I get this strange error:
ValueError Traceback (most recent call last)
<ipython-input-92-46d604271768> in <module>()
----> 1 if df_reader['email1_b']=='NaN':
2 df_reader['email1_fin']=df_reader['email1_a']
3 else:
4 df_reader['email1_fin']=df_reader['email1_b']
/home/user/GL-env_py-gcc4.8.5/lib/python2.7/site-packages/pandas/core/generic.pyc in __nonzero__(self)
953 raise ValueError("The truth value of a {0} is ambiguous. "
954 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 955 .format(self.__class__.__name__))
956
957 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Can anybody explain what I need to do about this?
Answer by DYZ
df_reader['email1_b']=='NaN' is a vector of Boolean values (one per row), but the if statement needs a single Boolean value to work. Use this instead:
df_reader['email1_fin'] = np.where(df_reader['email1_b']=='NaN',
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
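As a minimal sketch of the point above, the snippet below uses a small made-up frame (the values are invented for illustration; only the column names come from the question) to show that the comparison produces one Boolean per row and that np.where then chooses row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df_reader = pd.DataFrame({
    'email1_a': ['a1@example.com', 'a2@example.com', 'a3@example.com'],
    'email1_b': ['NaN', 'b2@example.com', 'NaN'],  # the literal string 'NaN'
})

# The comparison yields a Boolean Series, one value per row.
mask = df_reader['email1_b'] == 'NaN'
print(mask.tolist())

# np.where takes email1_a where mask is True, otherwise email1_b.
df_reader['email1_fin'] = np.where(mask,
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
print(df_reader['email1_fin'].tolist())
```

Rows one and three fall back to email1_a here, while row two keeps its email1_b value.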
As a side note, are you sure about 'NaN'? Is it not NaN? In the latter case, your expression should be:
df_reader['email1_fin'] = np.where(df_reader['email1_b'].isnull(),
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])
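To make the difference between the string 'NaN' and a real missing value concrete, here is a short sketch with invented data. It assumes the column holds np.nan, which may or may not match the asker's actual data. The fillna line at the end is an alternative added for comparison, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with a real missing value (np.nan), not the string 'NaN'.
df_reader = pd.DataFrame({
    'email1_a': ['a1@example.com', 'a2@example.com'],
    'email1_b': [np.nan, 'b2@example.com'],
})

# Comparing against the string 'NaN' does not match a real missing value.
string_match = df_reader['email1_b'] == 'NaN'

# isnull() flags the missing entries.
missing = df_reader['email1_b'].isnull()

df_reader['email1_fin'] = np.where(missing,
                                   df_reader['email1_a'],
                                   df_reader['email1_b'])

# Alternative (not from the answer): fill gaps in email1_b from email1_a.
alternative = df_reader['email1_b'].fillna(df_reader['email1_a'])
```

Under this assumption, the string comparison finds nothing to replace, whereas the isnull() version fills the first row from email1_a.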
Answer by EdChum
if expects a scalar value; it doesn't understand an array of booleans, which is what your condition returns. If you think about it, what should it do if a single value in this array is False/True?
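The ambiguity can be seen in a small sketch with a made-up Boolean Series: using it directly in a condition raises the error from the question, while any() and all() (two of the methods the error message suggests) each reduce it to one value but answer different questions.

```python
import pandas as pd

# Hypothetical Boolean Series with mixed values.
mask = pd.Series([True, False, True])

# Using the Series directly as a condition raises the ValueError from the question.
error_message = None
try:
    if mask:
        pass
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# any() and all() each collapse the Series to a single Boolean.
at_least_one = mask.any()   # is at least one row True?
every_row = mask.all()      # are all rows True?
print(at_least_one, every_row)
```

Neither reduction is what the question needs, since the goal there is a separate choice for every row rather than one decision for the whole column.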
To do this properly, you can do the following:
df_reader['email1_fin'] = np.where(df_reader['email1_b'] == 'NaN', df_reader['email1_a'], df_reader['email1_b'])
Also, you seem to be comparing against the str 'NaN' rather than the numerical NaN. Is this intended?