Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/53830081/

Python-pandas: the truth value of a series is ambiguous

python pandas

Asked by Viktor.w

I am trying to compare values from a JSON file (which I can already work with) to values from a CSV file (which might be the issue). My current code looks like this:

for data in trades['timestamp']:
    data = pd.to_datetime(data)
    print(data)
    if data == ask_minute['lastUpdated']:
        pass  # ... do something

Which gives:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My current print(data) output looks like this:

2018-10-03 18:03:38.067000
2018-10-03 18:03:38.109000
2018-10-03 18:04:28
2018-10-03 18:04:28.685000

However, I am still unable to compare these timestamps from my CSV file to those of my JSON file. Does anyone have an idea?

Answered by yatu

Let's reduce it to a simpler example. Take, for instance, the following comparison:

3 == pd.Series([3,2,4,1])

0     True
1    False
2    False
3    False
dtype: bool

The result is a Series of booleans, equal in size to the pd.Series on the right-hand side of the expression. What really happens here is that the integer is broadcast across the Series and then compared element-wise. So when you do:

if 3 == pd.Series([3,2,4,1]):
    pass

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

You get an error. The problem is that you are comparing a pd.Series with a single value, so the result holds multiple True and False values, as in the case above. This is ambiguous, since the condition as a whole is neither True nor False.

So you need to aggregate the result further so that the operation yields a single boolean value. For that, use either any or all, depending on whether you want at least one value (any) or all values (all) to satisfy the condition.

(3 == pd.Series([3,2,4,1])).all()
# False

or

(3 == pd.Series([3,2,4,1])).any()
# True
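
Applied to the question, the same idea might look like the sketch below. The contents of trades and ask_minute are made up for illustration, since the original data is not shown; only the column names come from the question. It assumes "at least one match" is the intended condition, hence any.

```python
import pandas as pd

# Hypothetical stand-ins for the question's data; only the column names are real.
trades = pd.DataFrame(
    {"timestamp": ["2018-10-03 18:03:38.067000", "2018-10-03 18:04:28.000000"]}
)
ask_minute = pd.DataFrame(
    {"lastUpdated": pd.to_datetime(["2018-10-03 18:04:28", "2018-10-03 18:05:00"])}
)

matched = []
for data in trades["timestamp"]:
    data = pd.to_datetime(data)
    # The comparison yields a boolean Series; .any() reduces it to a single bool.
    if (data == ask_minute["lastUpdated"]).any():
        matched.append(data)

# Loop-free alternative: one boolean per trade, True where the timestamp
# also appears in ask_minute["lastUpdated"].
mask = pd.to_datetime(trades["timestamp"]).isin(ask_minute["lastUpdated"])
```

Here matched collects the trade timestamps that occur in ask_minute, and mask carries the same information as a boolean Series aligned with trades.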

Answered by Charles Rogers

The problem I see is that even if you are evaluating a single row of a DataFrame, the code knows that a DataFrame can have many rows. It does not assume that you want the only row that exists; you have to tell it explicitly. I solved it like this:

if data.iloc[0] == ask_minute['lastUpdated']:

Then the code knows you are selecting the one row that exists.
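
A minimal sketch of this idea, under assumptions: in the question's loop, data is already a scalar Timestamp and the Series is ask_minute['lastUpdated'], so here .iloc[0] is applied on that side instead. The one-row frame and its value are hypothetical.

```python
import pandas as pd

# Hypothetical one-row frame; only the column name is taken from the question.
ask_minute = pd.DataFrame({"lastUpdated": pd.to_datetime(["2018-10-03 18:04:28"])})
data = pd.Timestamp("2018-10-03 18:04:28")

# .iloc[0] pulls out the single Timestamp, so == compares two scalars
# and the result can be used directly in an if statement.
last_updated = ask_minute["lastUpdated"].iloc[0]
is_match = data == last_updated
if is_match:
    print("timestamps match")
```

This only makes sense when exactly one row is expected; with several rows, .iloc[0] silently ignores all but the first.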
