Original question: http://stackoverflow.com/questions/22591174/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
pandas: multiple conditions while indexing data frame - unexpected behavior
Asked by Wojciech Walczak
I am filtering rows in a dataframe by values in two columns.
For some reason the OR operator behaves the way I would expect the AND operator to behave, and vice versa.
My test code:
import pandas as pd
df = pd.DataFrame({'a': range(5), 'b': range(5) })
# let's insert some -1 values
df['a'][1] = -1
df['b'][1] = -1
df['a'][3] = -1
df['b'][4] = -1
df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a != -1) | (df.b != -1)]
print(pd.concat([df, df1, df2], axis=1,
                keys=['original df', 'using AND (&)', 'using OR (|)']))
And the result:
  original df      using AND (&)      using OR (|)
            a   b              a   b             a   b
0           0   0              0   0             0   0
1          -1  -1            NaN NaN           NaN NaN
2           2   2              2   2             2   2
3          -1   3            NaN NaN            -1   3
4           4  -1            NaN NaN             4  -1

[5 rows x 6 columns]
As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them. I would expect exactly the opposite result. Could anyone explain this behavior, please?
I am using pandas 0.13.1.
Accepted answer by DSM
As you can see, the AND operator drops every row in which at least one value equals -1. On the other hand, the OR operator requires both values to be equal to -1 to drop them.
That's right. Remember that you're writing the condition in terms of what you want to keep, not in terms of what you want to drop. For df1:
df1 = df[(df.a != -1) & (df.b != -1)]
You're saying "keep the rows in which df.a isn't -1 and df.b isn't -1", which is the same as dropping every row in which at least one value is -1.
For df2:
df2 = df[(df.a != -1) | (df.b != -1)]
You're saying "keep the rows in which either df.a or df.b is not -1", which is the same as dropping only the rows where both values are -1.
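To make the "keep, not drop" reading concrete, here is a minimal sketch that rebuilds the question's data and lists which row labels each mask keeps. The names keep_and, keep_or, kept_by_and and kept_by_or are illustrative and do not appear in the original post.

```python
import pandas as pd

# Same values as the question's frame, built directly for brevity.
df = pd.DataFrame({'a': [0, -1, 2, -1, 4], 'b': [0, -1, 2, 3, -1]})

# Each mask is a boolean Series: True means "keep this row".
keep_and = (df.a != -1) & (df.b != -1)  # neither value is -1
keep_or = (df.a != -1) | (df.b != -1)   # at least one value is not -1

kept_by_and = df[keep_and].index.tolist()
kept_by_or = df[keep_or].index.tolist()

print(kept_by_and)  # [0, 2]
print(kept_by_or)   # [0, 2, 3, 4]
```

Row 1 is the only row where both values are -1, so it is the only row the OR mask drops.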
PS: chained access like df['a'][1] = -1 can get you into trouble. It's better to get into the habit of using .loc and .iloc.
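As a sketch of what that habit might look like for the question's setup code, the four chained assignments can be written as label-based .loc assignments, each a single indexing call. This assumes the default integer index, so the row labels are 0 to 4.

```python
import pandas as pd

df = pd.DataFrame({'a': list(range(5)), 'b': list(range(5))})

# .loc[row_labels, column_label] selects and assigns in one step,
# so there is no intermediate object that might be a copy.
df.loc[[1, 3], 'a'] = -1
df.loc[[1, 4], 'b'] = -1

print(df)
```

The resulting frame has the same -1 positions as the one in the question.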
Answer by Jake
A little mathematical logic theory here:
"NOT a AND NOT b" is the same as "NOT (a OR b)", so:
"a is not -1 AND b is not -1" is equivalent to "NOT (a is -1 OR b is -1)", which is the opposite (complement) of "(a is -1 OR b is -1)".
So if you want exactly opposite results, df1 and df2 should be as below:
df1 = df[(df.a != -1) & (df.b != -1)]
df2 = df[(df.a == -1) | (df.b == -1)]
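The equivalence used here is De Morgan's law, and it can be checked by brute force. The sketch below uses plain Python booleans rather than pandas; the helper names neither_is_minus_one and either_is_minus_one and the sample values are illustrative only. In pandas the elementwise counterparts of and, or and not are &, | and ~.

```python
from itertools import product

def neither_is_minus_one(a, b):
    # Scalar version of the df1 condition: (a != -1) & (b != -1)
    return a != -1 and b != -1

def either_is_minus_one(a, b):
    # Scalar version of the df2 condition: (a == -1) | (b == -1)
    return a == -1 or b == -1

# Try every pairing of a few sample values, including -1 itself.
values = [-1, 0, 3]
de_morgan_holds = all(
    neither_is_minus_one(a, b) == (not either_is_minus_one(a, b))
    for a, b in product(values, repeat=2)
)

print(de_morgan_holds)  # True
```

Because the two conditions are exact complements, every row lands in exactly one of df1 and df2.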

