Python FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

Question

Asked by Eric Leschinski

I am using Pandas 0.19.1 on Python 3. I am getting a warning on these lines of code. I'm trying to get a list that contains all the row numbers where the string Peter is present in column Unnamed: 5.

import pandas as pd

df = pd.read_excel(xls_path)  # xls_path: path to the Excel file
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

It produces a warning:

"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise 
comparison failed; returning scalar, but in the future will perform 
elementwise comparison 
result = getattr(x, name)(y)"

What is this FutureWarning, and should I ignore it since the code seems to work?

Answer 1

Answered by Eric Leschinski

This FutureWarning isn't from Pandas; it comes from numpy, and the bug also affects matplotlib and others. Here's how to reproduce the warning nearer to the source of the trouble:

import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       #Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the 
future will perform elementwise comparison
False

Another way to reproduce this bug, using the double equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    #FutureWarning thrown here

An example of Matplotlib being affected by this FutureWarning, in their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What's going on here?

There is a disagreement between Numpy and native Python on what should happen when you compare a string to numpy's numeric types. Notice that the left operand is Python's turf, a primitive string, and the middle operation is Python's turf, but the right operand is numpy's turf. Should you return a Python-style scalar or a Numpy-style ndarray of booleans? Numpy says ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be an elementwise comparison, or a scalar that says whether the item exists in the array?

If your code or library uses the in or == operators to compare a Python string to numpy ndarrays, they aren't compatible, so if you try it, it returns a scalar, but only for now. The warning indicates that this behavior might change in the future, so your code pukes all over the carpet if python/numpy decide to adopt Numpy style.
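
To see which of the two behaviors a given installation actually has, the comparison can be run with warnings recorded rather than printed. This is only a sketch for inspection: on the versions discussed here the result is a scalar False with a FutureWarning, while later NumPy releases are reported to return one boolean per element instead, so the code handles both cases.

```python
import warnings

import numpy as np

# Record warnings instead of printing them, so the outcome can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = np.arange(5) == "x"

if np.ndim(result) == 0:
    # Scalar behavior described above: a single False for the whole array.
    print("scalar result:", result)
else:
    # Elementwise behavior: one boolean per element of the array.
    print("elementwise result:", result)

saw_future_warning = any(issubclass(w.category, FutureWarning) for w in caught)
print("FutureWarning raised:", saw_future_warning)
```

Either way no element equals the string, so code that only asks "is there any match?" gets the same answer; code that indexes with the result does not.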

Submitted Bug reports:

Numpy and Python are in a standoff; for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784

https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your versions of Python and numpy, ignore the warnings, and expect the behavior not to change, or convert both the left and right operands of == and in to be of a numpy type or a primitive Python numeric type.
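
The second option, making both operands the same kind of value, might look like the following minimal sketch. The variable names are made up for illustration; the point is only that string-to-string and number-to-number comparisons are unambiguous.

```python
import numpy as np

numbers = np.arange(5)

# Option 1: turn the numbers into strings, then compare string to string.
as_text = numbers.astype(str)
mask_text = as_text == "3"      # one boolean per element
found_text = "3" in as_text     # plain membership test

# Option 2: turn the string into a number, then compare number to number.
mask_numeric = numbers == int("3")

print(mask_text.tolist())
print(found_text)
print(mask_numeric.tolist())
```

Which direction to convert depends on the data: converting the string to a number fails on non-numeric text, while converting the array to strings costs time on large arrays.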

Suppress the warning globally:

import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   #returns False, without Warning

Suppress the warning on a line-by-line basis:

import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   #returns False, warning is suppressed

print('x' in np.arange(10))   #returns False, Throws FutureWarning

Just suppress the warning by name, then put a loud comment next to it mentioning the current versions of Python and numpy, saying that this code is brittle and requires these versions, and put a link to here. Kick the can down the road.
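
Suppressing "by name" can be made narrower than ignoring every FutureWarning by also filtering on the message text. The sketch below assumes the warning message starts with "elementwise comparison failed", as in the output quoted above; the helper name ignore_elementwise_futurewarning is hypothetical, not part of numpy or the warnings module.

```python
import warnings
from contextlib import contextmanager

import numpy as np


@contextmanager
def ignore_elementwise_futurewarning():
    """Silence only the 'elementwise comparison failed' FutureWarning."""
    with warnings.catch_warnings():
        # The message argument is a regex matched against the start of the text.
        warnings.filterwarnings(
            "ignore",
            message="elementwise comparison failed",
            category=FutureWarning,
        )
        yield


# BRITTLE: written against the Python/numpy versions in use when this was added.
with ignore_elementwise_futurewarning():
    found = "x" in np.arange(5)

print(found)
```

Other FutureWarnings raised inside the with block still get through, which keeps unrelated deprecations visible.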

TLDR: pandas are Jedi; numpy are the Hutts; and python is the galactic empire. https://youtu.be/OZczsiCfQQk?t=3

Answer 2

Answered by Dataman

I get the same error when I try to set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

I have never encountered such an error before. I am still trying to figure out the reason behind it (using @Eric Leschinski's explanation and others).

Anyhow, the following approach solves the problem for now, until I figure out the reason:

df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)

I will update this as soon as I figure out the reason for such behavior.

Answer 3

Answered by yhd.leung

In my experience, the same warning message came together with a TypeError:

TypeError: invalid type comparison

So, you may want to check the data type of the values in the Unnamed: 5 column:

for x in df['Unnamed: 5']:
  print(type(x))  # are they 'str' ?
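
For a long column, printing every element's type gets noisy. A shorter check, sketched here on a made-up DataFrame that stands in for the one read from Excel, is to look at the column dtype and count how many values of each Python type it holds.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a column with mixed value types.
df = pd.DataFrame({"Unnamed: 5": ["Peter", 5, None, "Peter"]})

# dtype 'object' usually means the column holds arbitrary Python objects.
print(df["Unnamed: 5"].dtype)

# Count the values by the name of their Python type.
type_counts = df["Unnamed: 5"].map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

If anything other than str shows up in the counts, a comparison against a string is mixing types for at least some rows.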

Here is how I can replicate the warning message:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error

Hope it helps.

Answer 4

Answered by Jeet23

A quick workaround for this is to use numpy.core.defchararray. I also faced the same warning message and was able to resolve it using the above module.

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)

Answer 5

Answered by Nathan

Eric's answer helpfully explains that the trouble comes from comparing a Pandas Series (containing a NumPy array) to a Python string. Unfortunately, his two workarounds both just suppress the warning.

To write code that doesn't cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.

myRows = df[df['Unnamed: 5'].map( lambda x: x == 'Peter' )].index.tolist()
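
As a self-contained illustration of the same idea, here is the map approach run on a small made-up DataFrame whose column mixes strings, a number, and a missing value. The data is an assumption for demonstration only.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the column mixes strings, an int, and NaN.
df = pd.DataFrame({"Unnamed: 5": ["Peter", 3, np.nan, "Peter"]})

# Each element is compared with plain Python ==, one value at a time,
# so numpy never has to compare a whole numeric array against a string.
mask = df["Unnamed: 5"].map(lambda x: x == "Peter")
myRows = df[mask].index.tolist()

print(myRows)  # [0, 3]
```

The trade-off is speed: the lambda runs once per row in Python rather than in vectorized code.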

Answer 6

Answered by EL_DON

If your arrays aren't too big or you don't have too many of them, you might be able to get away with forcing the left-hand side of == to be a string. Note that calling str() on a whole Series produces one single string, so for a DataFrame column the conversion has to be elementwise:

myRows = df[df['Unnamed: 5'].astype(str) == 'Peter'].index.tolist()

But string conversion is slow. Timing a plain str() call, it is ~1.5 times slower if the operand is already a string, 25-30 times slower if it is a small numpy array (length = 10), and 150-160 times slower if it's a numpy array with length 100 (times averaged over 500 trials).

import time
from numpy import linspace, zeros, mean

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178

Answer 7

Answered by intotecho

I got this warning because I thought my column contained null strings, but on checking, it contained np.nan!

if df['column'] == '':

Changing my column to empty strings helped :)
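
One way to do that, sketched on a made-up column: detect the missing values with isna() and replace them with empty strings using fillna('') before comparing. The DataFrame here is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical column that mixes real strings, NaN, and an empty string.
df = pd.DataFrame({"column": ["a", np.nan, ""]})

# Which rows are missing (NaN) rather than empty strings?
missing = df["column"].isna()

# Replace NaN with '' so the column holds only strings, then compare.
is_empty = df["column"].fillna("") == ""

print(missing.tolist())
print(is_empty.tolist())
```

The comparison yields one boolean per row, so it is used as a mask (or reduced with .any() / .all()) rather than placed directly in an if statement.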

Answer 8

Answered by ahagen

I've compared a few of the methods possible for doing this, including pandas, several numpy methods, and a list comprehension method.

First, let's start with a baseline:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 μs, sys: 0 ns, total: 52 μs
Wall time: 56 μs
Count 2 using numpy equal with ints

So, our baseline is that the correct count is 2, and the operation should take about 50 μs.

Now, we try the naive method:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 μs, sys: 24 μs, total: 169 μs
Wall time: 158 μs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

And here, we get the wrong answer (NotImplemented != 2), it takes a long time, and it throws the warning.

So we'll try another naive method:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 μs, sys: 1 μs, total: 47 μs
Wall time: 50.1 μs
Count 0 using ==

Again, the wrong answer (0 != 2). This is even more insidious because there are no subsequent warnings (0 can be passed around just like 2).

Now, let's try a list comprehension:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 μs, sys: 1 μs, total: 56 μs
Wall time: 60.3 μs
Count 2 using list comprehension

We get the right answer here, and it's pretty fast!

Another possibility, pandas:

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 μs, sys: 31 μs, total: 484 μs
Wall time: 463 μs
Count 2 using pandas ==

Slow, but correct!

And finally, the option I'm going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 μs, sys: 1 μs, total: 51 μs
Wall time: 55.1 μs
Count 2 using numpy equal

Fast and correct!
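
The object-dtype idea can be wrapped in a small helper. This is a sketch, not a benchmarked replacement for the timings above: the name count_equal is hypothetical, and it uses the == operator on the object array rather than np.equal, on the assumption that object arrays compare element by element with ordinary Python equality.

```python
import numpy as np


def count_equal(values, target):
    """Count the elements of `values` equal to `target`, compared one by one."""
    # dtype=object keeps every element as the original Python object.
    arr = np.asarray(values, dtype=object)
    return int(np.sum(arr == target))


print(count_equal(["s", "b", "s", "b"], "s"))  # 2
print(count_equal([1, "s", None, "s"], "s"))   # 2
```

Because each element is compared as a Python object, mixed-type input does not change the count.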

Answer 9

Answered by ewizard

I had this code which was causing the error:

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

I changed it to this:

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

to avoid the comparison that was throwing the warning, as stated above. I only had to catch the exception because of the dfObj.loc assignment inside the for loop; maybe there is a way to tell it not to check the rows it has already changed.
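
One possible way to avoid both the comparison and the exception handling is to convert each value on its own with map, so rows are never matched against the column again. This is a sketch under assumptions: the helper to_epoch and the sample data are made up, and the sample string carries an explicit UTC offset so the resulting timestamp does not depend on the local time zone.

```python
import dateutil.parser
import pandas as pd


def to_epoch(value):
    """Convert a date string to integer epoch seconds; leave other values as-is."""
    if isinstance(value, str):
        return int(dateutil.parser.parse(value).timestamp())
    return value


# Hypothetical data: one date string and one value that is already an integer.
dfObj = pd.DataFrame({"time": ["2020-01-01T00:00:00+00:00", 1577836801]})

# Each element is converted independently; no string is compared to the column.
dfObj["time"] = dfObj["time"].map(to_epoch)

print(dfObj["time"].tolist())  # [1577836800, 1577836801]
```

Values that are already numeric pass through unchanged, which covers the rows an earlier pass has converted.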
