Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/15998188/

Time: 2020-08-18 21:33:01  Source: igfitidea

How can I obtain the element-wise logical NOT of a pandas Series?

Tags: python, pandas, boolean-logic

Asked by blz

I have a pandas Series object containing boolean values. How can I get a series containing the logical NOT of each value?

For example, consider a series containing:

True
True
True
False

The series I'd like to get would contain:

False
False
False
True

This seems like it should be reasonably simple, but apparently I've misplaced my mojo =(

Accepted answer by unutbu

To invert a boolean Series, use ~s:

In [7]: s = pd.Series([True, True, False, True])

In [8]: ~s
Out[8]: 
0    False
1    False
2     True
3    False
dtype: bool
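
Outside IPython, the same idea can be sketched as a small script, using the series from the question. The names `inverted`, `values`, and `selected` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

s = pd.Series([True, True, True, False])

# ~ flips every element of a bool-dtype Series
inverted = ~s

# Typical use: keep only the rows where the mask is False
values = pd.Series([10, 20, 30, 40])
selected = values[~s]

print(inverted)
print(selected)
```

This relies on `s` having dtype bool; other dtypes can behave differently.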

Using Python 2.7, NumPy 1.8.0, Pandas 0.13.1:

In [119]: s = pd.Series([True, True, False, True]*10000)

In [10]:  %timeit np.invert(s)
10000 loops, best of 3: 91.8 μs per loop

In [11]: %timeit ~s
10000 loops, best of 3: 73.5 μs per loop

In [12]: %timeit (-s)
10000 loops, best of 3: 73.5 μs per loop

As of Pandas 0.13.0, Series are no longer subclasses of numpy.ndarray; they are now subclasses of pd.NDFrame. This might have something to do with why np.invert(s) is no longer as fast as ~s or -s.

Caveat: timeit results may vary depending on many factors including hardware, compiler, OS, Python, NumPy and Pandas versions.
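
To repeat the comparison in your own environment without IPython's %timeit, a rough sketch using the standard library's timeit module might look like the following. The repetition count of 1000 is an arbitrary choice, and the numbers printed will differ from those quoted above.

```python
import timeit

import numpy as np
import pandas as pd

s = pd.Series([True, True, False, True] * 10000)

# Both spellings should agree element-wise on a bool-dtype Series
same = (~s).equals(np.invert(s))

# Time each spelling; absolute numbers depend on the environment
t_tilde = timeit.timeit(lambda: ~s, number=1000)
t_invert = timeit.timeit(lambda: np.invert(s), number=1000)

print(f"results equal: {same}")
print(f"~s: {t_tilde:.4f}s  np.invert(s): {t_invert:.4f}s  (1000 runs each)")
```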

Answer by herrfz

I just gave it a shot:

In [9]: s = Series([True, True, True, False])

In [10]: s
Out[10]: 
0     True
1     True
2     True
3    False

In [11]: -s
Out[11]: 
0    False
1    False
2    False
3     True

Answer by root

You can also use numpy.invert:

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: s = pd.Series([True, True, False, True])

In [4]: np.invert(s)
Out[4]: 
0    False
1    False
2     True
3    False

EDIT: The difference in performance appears on Ubuntu 12.04, Python 2.7, NumPy 1.7.0 - doesn't seem to exist using NumPy 1.6.2 though:

In [5]: %timeit (-s)
10000 loops, best of 3: 26.8 us per loop

In [6]: %timeit np.invert(s)
100000 loops, best of 3: 7.85 us per loop

In [7]: %timeit ~s
10000 loops, best of 3: 27.3 us per loop

Answer by JSharm

@unutbu's answer is spot on; I just wanted to add a warning that your mask needs to be dtype bool, not 'object'. That is, your mask can't ever have had any NaNs. See here: even if your mask is NaN-free now, it will remain 'object' type.

Inverting an 'object' series won't throw an error. Instead, you'll get a garbage mask of ints that won't work as you expect.

In [1]: df = pd.DataFrame({'A': [True, False, np.nan], 'B': [True, False, True]})
In [2]: df.dropna(inplace=True)
In [3]: df['A']
Out[3]:
0     True
1    False
Name: A, dtype: object
In [4]: ~df['A']
Out[4]:
0    -2
1    -1
Name: A, dtype: object

After speaking with colleagues about this one I have an explanation: It looks like pandas is reverting to the bitwise operator:

In [1]: ~True
Out[1]: -2

As @geher says, you can convert it to bool with astype before you invert it with ~:

~df['A'].astype(bool)
0    False
1     True
Name: A, dtype: bool
(~df['A']).astype(bool)
0    True
1    True
Name: A, dtype: bool
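
One way to guard against this pitfall is to check the dtype before inverting. The helper below, `invert_mask`, is a hypothetical name introduced here for illustration; it is not part of pandas. It is a minimal sketch that refuses anything that is not plain bool dtype.

```python
import numpy as np
import pandas as pd


def invert_mask(mask: pd.Series) -> pd.Series:
    """Hypothetical helper: invert a mask, refusing non-bool dtypes."""
    if mask.dtype != bool:
        raise TypeError(
            f"expected bool dtype, got {mask.dtype}; call .astype(bool) first"
        )
    return ~mask


df = pd.DataFrame({'A': [True, False, np.nan], 'B': [True, False, True]})
df = df.dropna()

a = df['A']                          # still dtype object after dropna
fixed = invert_mask(a.astype(bool))  # convert first, then invert
print(fixed)
```

Calling `invert_mask(a)` directly on the object-dtype column raises a TypeError instead of silently returning ints.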

Answer by grofte

NumPy is slower because it casts the input to boolean values (so None and 0 become False and everything else becomes True).

import pandas as pd
import numpy as np
s = pd.Series([True, None, False, True])
np.logical_not(s)

gives you

0    False
1     True
2     True
3    False
dtype: object

whereas ~s would raise a TypeError, because None does not support the ~ operator. In most cases tilde would be a safer choice than NumPy.

Pandas 0.25, NumPy 1.17
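
If the series may contain missing values, one option is pandas' nullable "boolean" dtype. Note that this is an assumption beyond the versions quoted in this answer: the dtype was added in pandas 1.0, so it is not available in 0.25. The sketch below converts the series, inverts it, and then decides explicitly how missing entries should count; treating them as False is an arbitrary choice for illustration.

```python
import pandas as pd

s = pd.Series([True, None, False, True])   # dtype: object

# Convert to the nullable boolean dtype; None becomes <NA>
nullable = s.astype("boolean")

# ~ now works and propagates the missing value
inverted = ~nullable

# Decide explicitly how missing values should count, here as False
as_mask = inverted.fillna(False).astype(bool)

print(inverted)
print(as_mask)
```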
