Error: The truth value of a Series is ambiguous - Python pandas
Source: http://stackoverflow.com/questions/45493948/ (StackOverflow, provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must attribute it to the original authors)
Asked by i.n.n.m
I know this question has been asked before; however, I am getting an error when I try to use an if statement. I looked at this link, but it did not help much in my case. My dfs is a list of DataFrames.
I am trying the following:
for i in dfs:
    if (i['var1'] < 3.000):
        print(i)
This gives the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I also tried the following and got the same error:
for i, j in enumerate(dfs):
    if (j['var1'] < 3.000):
        print(i)
My var1 data type is float32. I am not using any other logical operators such as & or |. In the link above, the error seemed to be caused by using logical operators. Why do I get a ValueError?
Accepted answer by MaxU
Here is a small demo that shows why this is happening:
In [131]: df = pd.DataFrame(np.random.randint(0,20,(5,2)), columns=list('AB'))
In [132]: df
Out[132]:
A B
0 3 11
1 0 16
2 16 1
3 2 11
4 18 15
In [133]: res = df['A'] > 10
In [134]: res
Out[134]:
0 False
1 False
2 True
3 False
4 True
Name: A, dtype: bool
When we try to check whether such a Series is True, pandas doesn't know what to do:
In [135]: if res:
     ...:     print(df)
     ...:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
...
skipped
...
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
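The same failure can be reproduced outside IPython. The following is a minimal sketch, assuming a hand-written column in place of the random integers above so that the result is deterministic; the names raised and message are illustrative only.

```python
import pandas as pd

# Fixed data instead of random integers, so the outcome is reproducible.
df = pd.DataFrame({"A": [3, 0, 16, 2, 18], "B": [11, 16, 1, 11, 15]})
res = df["A"] > 10  # element-wise comparison: a boolean Series, not a single bool

raised = False
message = ""
try:
    if res:  # asks for bool(res), which pandas refuses for a multi-element Series
        print(df)
except ValueError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)
```

The comparison itself succeeds; the exception only appears when the if statement asks for a single truth value.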
Workarounds:
We can decide how to treat a Series of boolean values. For example, the if check should pass only if all values are True:
In [136]: res.all()
Out[136]: False
or when at least one value is True:
In [137]: res.any()
Out[137]: True
In [138]: if res.any():
     ...:     print(df)
     ...:
A B
0 3 11
1 0 16
2 16 1
3 2 11
4 18 15
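Applied to the loop from the question, the same idea might look like the sketch below. The three small DataFrames are a hypothetical stand-in for the asker's dfs list (the real data is not shown in the question), and any_below and all_below are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's list of DataFrames (var1 is float32).
dfs = [
    pd.DataFrame({"var1": np.array([1.0, 2.5], dtype="float32")}),  # all below 3
    pd.DataFrame({"var1": np.array([2.0, 7.0], dtype="float32")}),  # some below 3
    pd.DataFrame({"var1": np.array([4.0, 9.0], dtype="float32")}),  # none below 3
]

# Positions of the DataFrames where at least one var1 value is below 3.
any_below = [i for i, d in enumerate(dfs) if (d["var1"] < 3.0).any()]

# Positions of the DataFrames where every var1 value is below 3.
all_below = [i for i, d in enumerate(dfs) if (d["var1"] < 3.0).all()]

print(any_below)  # [0, 1]
print(all_below)  # [0]
```

Which reduction is right depends on what the condition is meant to express; neither is a default.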
Answer by Gasvom
Currently, you're selecting the entire Series for comparison. To get an individual value from the Series, you'll want to use something along the lines of:
for i in dfs:
    if (i['var1'].iloc[0] < 3.000):
        print(i)
To compare each of the individual elements you can use Series.items() (named iteritems() in older pandas; that name was removed in pandas 2.0), like so:
for i in dfs:
    # items() yields (index label, value) pairs; the label is not needed here
    for _, v in i['var1'].items():
        if v < 3.000:
            print(v)
In most cases, the better solution is to select the subset of the DataFrame that you need, like so:
for i in dfs:
    subset = i[i['var1'] < 3.000]
    # do something with the subset
On large DataFrames, pandas is much faster when you use Series operations instead of iterating over individual values. For more detail, you can check out the pandas documentation on selection.
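To make the comparison concrete, here is a small sketch that filters one DataFrame both ways: with a boolean mask and with an element-wise loop over Series.items(). The data and the names subset and looped are made up for illustration.

```python
import pandas as pd

# Illustrative data; not taken from the question.
df = pd.DataFrame({"var1": [1.5, 4.0, 2.0, 8.0], "other": list("abcd")})

# Series operation: the boolean mask selects the matching rows in one step.
subset = df[df["var1"] < 3.0]

# Element-wise loop over (index label, value) pairs, for comparison.
looped = [v for _, v in df["var1"].items() if v < 3.0]

print(subset["var1"].tolist())  # [1.5, 2.0]
print(looped)                   # [1.5, 2.0]
```

Both approaches pick the same values, but the mask keeps the whole rows (including the other column) and avoids a Python-level loop.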
Answer by Shaina Raza
The comparison returns a Series of boolean values; you need to reduce it to a single value with either any() or all(), for example:
# Fragment from inside a function; temp is the string (or other value) to look for in df[col].
if (df[col] == temp).any():
    return df.loc[df[col] == temp].index.values.astype(int)[0]
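A self-contained version of this pattern could be wrapped in a function, as in the sketch below. The function name first_match_index and the sample data are hypothetical, and returning None when nothing matches is an assumption about the desired behavior.

```python
import pandas as pd

def first_match_index(df, col, value):
    """Return the index label of the first row where df[col] == value, else None."""
    mask = df[col] == value      # boolean Series, one entry per row
    if mask.any():               # reduce the Series to a single bool
        return df.index[mask][0]
    return None

# Illustrative data; not taken from the original answer.
people = pd.DataFrame({"name": ["a", "b", "c", "b"]})
found = first_match_index(people, "name", "b")
missing = first_match_index(people, "name", "z")

print(found)    # 1
print(missing)  # None
```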