Python: evaluating pandas Series values with logical expressions and if statements

Question

Asked by neanderslob

I'm having trouble evaluating values from a dictionary using if statements.

Given the following dictionary, which I imported from a dataframe (in case it matters):

>>> pnl[company]
29:   Active Credit       Date   Debit Strike Type
0      1      0 2013-01-08  2.3265  21.15  Put
1      0      0 2012-11-26      40     80  Put
2      0      0 2012-11-26     400     80  Put

I tried to evaluate the following statement to establish the last value of Active:

if pnl[company].tail(1)['Active']==1:
    print 'yay'

However, I was confronted by the following error message:

Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    if pnl[company].tail(1)['Active']==1:
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This surprised me, given that I could display the value I wanted using the above command without the if statement:

>>> pnl[company].tail(1)['Active']
30: 2    0
Name: Active, dtype: object

Given that the value is clearly zero and the index is 2, I tried the following for a brief sanity check and found that things weren't happening as I might have expected:

>>> if pnl[company]['Active'][2]==0:
...     print 'woo-hoo'
... else:
...     print 'doh'


doh

My questions are:

1) What might be going on here? I suspect I'm misunderstanding dictionaries on some fundamental level.

2) I noticed that as I bring up any given value of this dictionary, the number on the left increases by 1. What does this represent? For example:

>>> pnl[company].tail(1)['Active']
31: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
32: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
33: 2    0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
34: 2    0
Name: Active, dtype: object

Thanks in advance for any help.

Answer 1

Accepted answer by EdChum

What you get back is a pandas Series object, and it cannot be evaluated in the manner you are attempting, even though it holds just a single value. You need to change your line to:

if pnl[company].tail(1)['Active'].any()==1:
  print 'yay'

With respect to your second question see my comment.

EDIT

From the comments and the link to your output: calling any() fixed the error message, but your data is actually strings, so the comparison still failed. You could either do:

if pnl[company].tail(1)['Active'].any()=='1':
  print 'yay'

This does a string comparison. Alternatively, fix the data at the point where it was read or generated.

Or do:

pnl[company]['Active'] = pnl[company]['Active'].astype(int)

This converts the dtype of the column so that the comparison against an integer behaves as expected.
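
As a minimal sketch of the dtype point above, assuming a made-up frame `df` whose 'Active' column was read in as strings (the frame and its values are hypothetical, not the asker's data). It would also be consistent with the 'doh' result from the sanity check in the question, since a string '0' never equals the integer 0:

```python
import pandas as pd

# Hypothetical frame whose 'Active' column was read in as strings.
df = pd.DataFrame({"Active": ["1", "0", "0"]})

# '0' (str) is not equal to 0 (int), so this comparison is False.
as_read = df["Active"][2] == 0

# After casting the column to integers the same comparison succeeds.
df["Active"] = df["Active"].astype(int)
after_cast = df["Active"][2] == 0

print(as_read, after_cast)
```

The cast only works if every string in the column is a valid integer literal; otherwise astype(int) raises a ValueError.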

Answer 2

Answered by unutbu

A Series is a subclass of NDFrame. The NDFrame.__bool__ method (__nonzero__ in Python 2, as seen in the traceback above) always raises a ValueError. Thus, trying to evaluate a Series in a boolean context raises a ValueError, even if the Series has but a single value.

NDFrames have no boolean value (err, that is, they always raise a ValueError) because there is more than one criterion that one might reasonably expect to make an NDFrame True. It could mean:

every item in the NDFrame is True (if so, use .all()), or
any item in the NDFrame is True (if so, use .any()), or
the NDFrame is not empty (if so, use not .empty; it is a property, so no parentheses)
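
The three readings listed above can be sketched as follows. This is only an illustration on a made-up Series (the name `s` and its values are assumptions, not data from the question), chosen so that the criteria disagree:

```python
import pandas as pd

# Hypothetical data, chosen so that the three criteria give different answers.
s = pd.Series([True, False, True])

every_item_true = bool(s.all())   # False: one element is False
any_item_true = bool(s.any())     # True: at least one element is True
not_empty = not s.empty           # True: the Series has elements

# Using the Series itself as a condition raises the error from the question.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

print(every_item_true, any_item_true, not_empty, raised)
```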

Since any of these is possible, and since different users have different expectations, the developers refuse to guess and instead require the user of the NDFrame to state explicitly which criterion they wish to use.

The error message lists the most likely choices:

Use a.empty, a.bool(), a.item(), a.any() or a.all()

Since in your case you know the Series will contain just one value, you could use item:

if pnl[company].tail(1)['Active'].item() == 1:
    print 'yay'
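
For a version that can be run without the asker's data, here is a small sketch using a stand-in frame. The name `df` and its contents are assumptions loosely modeled on the question, and 'Active' is numeric here, unlike the string data the asker turned out to have:

```python
import pandas as pd

# Stand-in for pnl[company]; hypothetical values with a numeric 'Active' column.
df = pd.DataFrame({
    "Active": [1, 0, 0],
    "Strike": [21.15, 80.0, 80.0],
    "Type": ["Put", "Put", "Put"],
})

last_active = df.tail(1)["Active"]   # still a one-element Series
value = last_active.item()           # a plain scalar, safe to use in an if

if value == 1:
    message = "yay"
else:
    message = "last row is not active"

print(value, message)
```

item() raises a ValueError when the Series does not contain exactly one element, which makes a wrong assumption about the length fail loudly.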

Regarding your second question: The numbers on the left seem to be line numbering produced by your Python interpreter (PyShell?) -- but that's just my guess.

WARNING: Presumably,

if pnl[company].tail(1)['Active']==1:

means you would like the condition to be True when the single value in the Series equals 1. The code

if pnl[company].tail(1)['Active'].any()==1:
    print 'yay'

will be True if the dtype of the Series is numeric and the value in the Series is any number other than 0. For example, if we take pnl[company].tail(1)['Active'] to be equal to

In [128]: s = pd.Series([2], index=[2])

then

In [129]: s.any()
Out[129]: True

and therefore,

In [130]: s.any()==1
Out[130]: True

I think s.item() == 1 more faithfully preserves your intended meaning:

In [132]: s.item()==1
Out[132]: False

(s == 1).any() would also work, but using any does not express your intention very plainly, since you know the Series will contain only one value.
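
The options discussed in this answer can be compared side by side in one short script. The last line, `s.iloc[-1]`, is not part of the answer; it is shown as one further possibility, on the assumption that what is wanted is the last element by position, and it also works when the Series has more than one element:

```python
import pandas as pd

s = pd.Series([2], index=[2])     # same Series as in the example above

via_any = s.any() == 1            # True, although the value is 2
via_item = s.item() == 1          # False
via_mask = (s == 1).any()         # False
via_iloc = s.iloc[-1] == 1        # False; iloc[-1] picks the last element by position

print(via_any, via_item, via_mask, via_iloc)
```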

Answer 3

Answered by smci

Your question has nothing to do with Python dictionaries, or native Python at all. It's about pandas Series, and the other answers gave you the correct syntax.

Interpreting your question in the wider sense, it's about how pandas Series was shoehorned onto NumPy, and NumPy historically, until recently, had notoriously poor support for logical values and operators. pandas does the best job it can with what NumPy provides. Having to sometimes manually invoke NumPy logical functions instead of just writing code with arbitrary (Python) operators is annoying and clunky, and sometimes bloats pandas code. Also, you often have to do this for performance (NumPy is faster than thunking to and from native Python). But that's the price we pay.

There are many limitations, quirks and gotchas (examples below) - the best advice is to be distrustful of boolean as a first-class-citizen in pandas due to numpy's limitations:

pandas Caveats and Gotchas - Using If/Truth Statements with Pandas
a performance example: Python ~ can be used instead of np.invert() - more legible but 3x slower or worse
some gotchas and limitations: in the code below, note that recent numpy now allows boolean values (internally represented as int) and allows NAs, but that e.g. value_counts() ignores NAs (compare to R's table, which has option 'useNA').

import numpy as np
import pandas as pd
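s = pd.Series([True, True, False, True, np.nan])
s2 = pd.Series([True, True, False, True, np.nan])
dir(s) # look at .all, .any, .bool, .eq, .equals, .invert, .isnull, .value_counts() ...

# Gotcha: astype(bool) silently converts the missing value to True
s.astype(bool)
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True  # <--- should be NA!!
# dtype: bool

# s.bool is a method, not an attribute; without parentheses you only get the
# bound-method repr, which shows the original values with the NaN intact.
# (Calling s.bool() is only valid for a single-element boolean Series, and the
# method is deprecated in newer pandas releases.)
s.bool
# <bound method Series.bool of
# 0     True
# 1     True
# 2    False
# 3     True
# 4      NaN
# dtype: object>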

# Limitation: value_counts() currently excludes NAs
s.value_counts()
# True     3
# False    1
# dtype: int64
help(s.value_counts) # "... Excludes NA values(!)"

# Equality comparison - vector - fails on NAs (again there's no NA-handling option):
s == s2 # or equivalently, s.eq(s2)
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False  # BUG/LIMITATION: we should be able to choose NA==NA
# dtype: bool

# ...but the scalar equality comparison says they are equal!!
s.equals(s2)
# True
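
The limitations shown above have workarounds in later pandas versions. The sketch below is an addition, not part of the answer: it assumes a pandas release where value_counts() accepts dropna and where the nullable "boolean" extension dtype exists (pandas 1.0 or newer), and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])

# dropna=False keeps the missing value in the tally.
total_without_na = int(s.value_counts().sum())
total_with_na = int(s.value_counts(dropna=False).sum())

# isna() counts missing values without relying on ==, which is False for NaN.
n_missing = int(s.isna().sum())

# The nullable "boolean" dtype stores the gap as <NA> instead of
# converting it to True the way astype(bool) does.
nullable = pd.Series([True, True, False, True, None], dtype="boolean")
n_missing_nullable = int(nullable.isna().sum())

print(total_without_na, total_with_na, n_missing, n_missing_nullable)
```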
