Evaluating pandas series values with logical expressions and if-statements
Original question: http://stackoverflow.com/questions/23461502/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must attribute it to the original authors (not to this page): Stack Overflow
Asked by neanderslob
I'm having trouble evaluating values from a dictionary using if statements.
Given the following dictionary, which I imported from a dataframe (in case it matters):
>>> pnl[company]
29:   Active Credit        Date   Debit Strike Type
0          1      0  2013-01-08  2.3265  21.15  Put
1          0      0  2012-11-26      40     80  Put
2          0      0  2012-11-26     400     80  Put
I tried to evaluate the following statement to check the last value of Active:
if pnl[company].tail(1)['Active']==1:
    print 'yay'
However, I was confronted by the following error message:
Traceback (most recent call last):
  File "<pyshell#69>", line 1, in <module>
    if pnl[company].tail(1)['Active']==1:
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 676, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This surprised me, given that I could display the value I wanted using the above command without the if statement:
>>> pnl[company].tail(1)['Active']
30: 2 0
Name: Active, dtype: object
Given that the value is clearly zero and the index is 2, I tried the following for a brief sanity check and found that things weren't happening as I might have expected:
>>> if pnl[company]['Active'][2]==0:
...     print 'woo-hoo'
... else:
...     print 'doh'
doh
My questions are:
1) What might be going on here? I suspect I'm misunderstanding dictionaries on some fundamental level.
2) I noticed that as I bring up any given value of this dictionary, the number on the left increases by 1. What does this represent? For example:
>>> pnl[company].tail(1)['Active']
31: 2 0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
32: 2 0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
33: 2 0
Name: Active, dtype: object
>>> pnl[company].tail(1)['Active']
34: 2 0
Name: Active, dtype: object
Thanks in advance for any help.
Accepted answer by EdChum
What you get back is a pandas Series object, and a Series cannot be evaluated in the way you are attempting, even though it holds just a single value. You need to change your line to:
if pnl[company].tail(1)['Active'].any()==1:
    print('yay')
With respect to your second question see my comment.
EDIT
From the comments and the link to your output: calling any() fixed the error message, but your data is actually strings, so the comparison still failed. You could either do:
if pnl[company].tail(1)['Active'].any()=='1':
    print('yay')
to do a string comparison, or fix the data at the point where it is read or generated.
Or do:
pnl[company]['Active'] = pnl[company]['Active'].astype(int)
to convert the dtype of the column, so that comparing against an integer works as intended.
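A minimal sketch of why the asker's sanity check printed 'doh', assuming the Active column holds strings such as '0' and '1'. The small DataFrame below is made up for illustration and only mimics pnl[company]; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]; 'Active' holds strings, not ints.
df = pd.DataFrame({"Active": ["1", "0", "0"], "Type": ["Put", "Put", "Put"]})

# Comparing a string with an integer is False, even when it "looks" like 0.
equal_to_int_before = bool(df["Active"][2] == 0)
equal_to_str_before = bool(df["Active"][2] == "0")
print(equal_to_int_before)  # False
print(equal_to_str_before)  # True

# Convert the column once, then compare against integers.
df["Active"] = df["Active"].astype(int)
equal_to_int_after = bool(df["Active"][2] == 0)
print(equal_to_int_after)   # True
```

Converting once at load time is usually less error-prone than remembering to compare against strings everywhere.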
Answer by unutbu
A Series is a subclass of NDFrame. The NDFrame.__bool__ method (__nonzero__ in Python 2, which is the name shown in your traceback) always raises a ValueError. Thus, trying to evaluate a Series in a boolean context raises a ValueError, even if the Series has but a single value.
The reason why NDFrames have no boolean value (err, that is, always raise a ValueError), is because there is more than one possible criterion that one might reasonably expect for an NDFrame to be True. It could mean
- every item in the NDFrame is True (if so, use .all()), or
- any item in the NDFrame is True (if so, use .any()), or
- the NDFrame is not empty (if so, use .empty, which is a property rather than a method, as in "not s.empty")
Since any of these is possible, and since different users have different expectations, the developers refuse to guess and instead require the user of the NDFrame to make explicit which criterion they wish to use.
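The three criteria can be sketched on a small example. The Series below is made up for illustration; only all(), any() and the empty property come from the text above.

```python
import pandas as pd

s = pd.Series([True, False, True])

# Using the Series directly as a condition raises ValueError.
try:
    if s:
        pass
    raised = False
except ValueError:
    raised = True

print(raised)       # True
print(s.all())      # False: not every item is True
print(s.any())      # True: at least one item is True
print(not s.empty)  # True: the Series has at least one element
```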
The error message lists the most likely choices:
Use a.empty, a.bool(), a.item(), a.any() or a.all()
Since in your case you know the Series will contain just one value, you could use item:
if pnl[company].tail(1)['Active'].item() == 1:
    print('yay')
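As a rough illustration of how item() behaves, here is a sketch on a made-up integer Series standing in for pnl[company]['Active']. The use of .iloc[-1] is an alternative not mentioned in the answers; it fetches the last element by position as a scalar, so no Series is left to evaluate.

```python
import pandas as pd

# Hypothetical stand-in for pnl[company]['Active'] after conversion to int.
active = pd.Series([1, 0, 0], name="Active")

last_as_series = active.tail(1)  # still a Series, with one element
last_value = active.iloc[-1]     # a scalar: the last element by position

print(last_as_series.item() == 0)  # True
print(last_value == 0)             # True

# item() refuses to guess when there is more than one element.
try:
    active.item()
    item_raised = False
except ValueError:
    item_raised = True
print(item_raised)  # True
```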
Regarding your second question: The numbers on the left seem to be line numbering produced by your Python interpreter (PyShell?) -- but that's just my guess.
WARNING: Presumably,
if pnl[company].tail(1)['Active']==1:
means you would like the condition to be True when the single value in the Series equals 1. The code
if pnl[company].tail(1)['Active'].any()==1:
    print('yay')
will be True if the dtype of the Series is numeric and the value in the Series is any number other than 0. For example, if we take pnl[company].tail(1)['Active'] to be equal to
In [128]: s = pd.Series([2], index=[2])
then
In [129]: s.any()
Out[129]: True
and therefore,
In [130]: s.any()==1
Out[130]: True
I think s.item() == 1 more faithfully preserves your intended meaning:
In [132]: s.item()==1
Out[132]: False
(s == 1).any() would also work, but using any does not express your intention very plainly, since you know the Series will contain only one value.
Answer by smci
Your question has nothing to do with Python dictionaries, or native Python at all. It's about pandas Series, and the other answers gave you the correct syntax.
Interpreting your question in the wider sense, it's about how the pandas Series was shoehorned onto NumPy, and NumPy historically, until recently, had notoriously poor support for logical values and operators. pandas does the best job it can with what NumPy provides. Having to sometimes manually invoke NumPy logical functions instead of just writing code with arbitrary (Python) operators is annoying and clunky, and sometimes bloats pandas code. Also, you often have to do this for performance (NumPy is faster than thunking to and from native Python). But that's the price we pay.
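A small sketch of the point about operators, using two made-up boolean Series: Python's `and` needs a single truth value from each operand and therefore raises, while the element-wise operators and the NumPy function work on whole Series.

```python
import numpy as np
import pandas as pd

a = pd.Series([True, True, False])
b = pd.Series([True, False, False])

# Python's `and` asks the first Series for a single truth value, which raises.
try:
    a and b
    and_raised = False
except ValueError:
    and_raised = True

both = a & b                    # element-wise AND
either = a | b                  # element-wise OR
negated = ~a                    # element-wise NOT
both_np = np.logical_and(a, b)  # NumPy spelling of a & b

print(and_raised)        # True
print(both.tolist())     # [True, False, False]
print(either.tolist())   # [True, True, False]
print(negated.tolist())  # [False, False, True]
```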
There are many limitations, quirks and gotchas (examples below) - the best advice is to be distrustful of boolean as a first-class-citizen in pandas due to numpy's limitations:
- pandas Caveats and Gotchas - Using If/Truth Statements with Pandas
- a performance example: Python ~ can be used instead of np.invert() - more legible but 3x slower or worse
- some gotchas and limitations: in the code below, note that recent numpy now allows boolean values (internally represented as int) and allows NAs, but that e.g. value_counts() ignores NAs by default (compare to R's table, which has the option 'useNA').
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])
s2 = pd.Series([True, True, False, True, np.nan])
dir(s)  # look at .all, .any, .bool, .eq, .equals, .isnull, .value_counts ...

# GOTCHA: astype(bool) silently converts NaN to True
s.astype(bool)
# 0     True
# 1     True
# 2    False
# 3     True
# 4     True   # <--- should be NA!!
# dtype: bool

# s.bool is a method, not a conversion: Series.bool() only accepts a
# single-element boolean Series (and is deprecated since pandas 2.1), so it
# cannot help here. Without parentheses you just get the bound method back:
s.bool
# <bound method Series.bool of
# 0     True
# 1     True
# 2    False
# 3     True
# 4      NaN
# dtype: object>

# Limitation: value_counts() excludes NAs by default
s.value_counts()
# True     3
# False    1
# dtype: int64
help(s.value_counts)  # "... Excludes NA values(!)"

# Equality comparison (vector) fails on NAs; again there's no NA-handling option:
s == s2  # or equivalently, s.eq(s2)
# 0     True
# 1     True
# 2     True
# 3     True
# 4    False   # BUG/LIMITATION: we should be able to choose NA==NA
# dtype: bool

# ...but the whole-object equality comparison says they are equal!!
s.equals(s2)
# True
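For readers on a current pandas, the two limitations above can be worked around. This is a sketch under the assumption that value_counts accepts dropna=False (it does in current pandas releases); the NA-aware comparison is one possible hand-written idiom, not a built-in option.

```python
import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True, np.nan])
s2 = pd.Series([True, True, False, True, np.nan])

# Count missing values too instead of silently dropping them.
counts_with_na = s.value_counts(dropna=False)
print(int(counts_with_na.sum()))  # 5: three True, one False, one NaN

# Element-wise equality that treats NaN in the same position as equal.
na_aware_equal = (s == s2) | (s.isna() & s2.isna())
print(na_aware_equal.tolist())  # [True, True, True, True, True]
```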