Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/39388950/

Date: 2020-09-14 01:58:31

Logical OR/bitwise OR in a pandas DataFrame

Tags: python, pandas, bitwise-operators, logical-operators

Asked by BernardL

I am trying to use a Boolean mask to get a match from 2 different dataframes. U

Using the logical OR operator:

x = df[(df['A'].isin(df2['B']))
      or df['A'].isin(df2['C'])]

Output:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

However, when I use the bitwise OR operator, the results are returned successfully.

x = df[(df['A'].isin(df2['B']))
      | df['A'].isin(df2['C'])]

Output: x

Is there a difference between the two, and is bitwise OR the best option here? Why doesn't the logical OR work?

Answered by nathan_lesage

As far as I understand this issue (I come from a C++ background and am currently learning Python for data science), I have stumbled upon several posts suggesting that the bitwise operators (&, |) can be overloaded in classes, just as in C++.

So basically, when you use these bitwise operators on integers, they combine the bits of the two operands and give you the result. For instance, if you have the following:

1 | 2 # will result in 3

What Python will actually do is compare the bits of these numbers:

00000001 | 00000010

The result will be:

00000011 (because 0 | 0 is False, ergo 0; and 0 | 1 is True, ergo 1)

As an integer: 3

It combines each pair of bits and spits out the result of these eight bit-by-bit operations (eight bits are shown here only for illustration; Python integers are not limited to eight bits). This is the normal behaviour of these operators.
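The integer example above can be checked directly. This is only a small sketch; the variable names are chosen here for illustration, and the eight-digit padding is a display choice, not something Python does internally.

```python
# Bitwise OR on plain integers: a bit of the result is 1
# if that bit is 1 in either operand.
a = 1  # binary 00000001
b = 2  # binary 00000010

result = a | b

# format(..., "08b") pads the binary form to eight digits for display only.
print(format(a, "08b"), "|", format(b, "08b"), "=", format(result, "08b"))
print(result)
```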

Enter pandas. Because these operators can be overloaded, pandas has made use of this. What the bitwise operators do when applied to pandas DataFrames and Series is the following:

(dataframe1['column'] == "expression") & (dataframe1['column'] != "another expression")

In this case, pandas first creates a Series of True/False values from the results of the == and != operations. (Be careful: you have to put parentheses around each comparison, because Python's bitwise operators bind more tightly than its comparison operators, so without parentheses the & would be evaluated first!) Each comparison checks every value in the column against the expression and outputs either True or False.
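Why the parentheses matter can be seen in a short sketch. The DataFrame and its column names are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({"A": [1, 2, 3], "B": [3, 2, 1]})

# With parentheses: two boolean Series are built first, then combined.
mask = (df["A"] == 1) | (df["B"] == 2)

# Without parentheses, | binds more tightly than ==, so Python reads this as
# df["A"] == (1 | df["B"]) == 2, a chained comparison that needs a single
# truth value from a Series and therefore raises ValueError.
try:
    df["A"] == 1 | df["B"] == 2
    raised = False
except ValueError:
    raised = True

print(mask.tolist())
print(raised)
```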

Then you have two same-length Series of True and False values. What pandas THEN does is take these two Series and combine them element by element with either "and" (&) or "or" (|), and finally spit out one single Series whose values say, row by row, whether the combined condition is fulfilled.
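A minimal sketch of this element-by-element combination, using two hand-written boolean Series (the values are arbitrary and only meant to cover all four True/False pairings):

```python
import pandas as pd

left = pd.Series([True, True, False, False])
right = pd.Series([True, False, True, False])

# & and | work position by position and return a new boolean Series.
both = left & right
either = left | right

print(both.tolist())
print(either.tolist())
```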

To go even further, what I think is happening under the hood is that the & operator actually calls a method of the pandas object (Python translates a & b into a call to a.__and__(b)) and hands it the two previously evaluated operands (the two Series to the left and right of the operator). pandas then combines the values pairwise, returning True or False for each pair.
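The overloading mechanism itself can be sketched with a toy class. BoolColumn is a hypothetical name invented for this example; it only illustrates how Python dispatches & and | to the __and__ and __or__ methods, and it is not how pandas is implemented internally.

```python
class BoolColumn:
    """Hypothetical toy class; not the pandas implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __and__(self, other):
        # Python calls this method for `self & other`.
        return BoolColumn(x and y for x, y in zip(self.values, other.values))

    def __or__(self, other):
        # Python calls this method for `self | other`.
        return BoolColumn(x or y for x, y in zip(self.values, other.values))


left = BoolColumn([True, True, False])
right = BoolColumn([True, False, False])

print((left & right).values)
print((left | right).values)
```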

This is basically the same principle they've used for all other operators as well (>, <, >=, <=, ==, !=).

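Why go to the trouble of using a separate & expression when you have the nice and neat "and"? Well, that is because "and" and "or" are hard-coded in the language and cannot be overloaded. They ask each operand for a single truth value, and a Series refuses to give one, which is exactly the ValueError shown in the question.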
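The behaviour from the question can be reproduced with a small sketch. The contents of df and df2 are hypothetical stand-ins, since the question does not show its data.

```python
import pandas as pd

# Hypothetical frames standing in for the df and df2 of the question.
df = pd.DataFrame({"A": [1, 2, 3, 4]})
df2 = pd.DataFrame({"B": [1, 9], "C": [3, 9]})

in_b = df["A"].isin(df2["B"])
in_c = df["A"].isin(df2["C"])

# `or` needs one truth value for the whole Series, so this raises ValueError.
try:
    df[in_b or in_c]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# `|` is dispatched to pandas and combines the two masks row by row.
x = df[in_b | in_c]

print(error_message)
print(x["A"].tolist())
```

If a single yes/no answer really is wanted, the methods named in the error message, such as a.any() or a.all(), reduce the Series to one boolean explicitly.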

Hope that helps!
