What are the gotchas where Numpy differs from straight Python?

Question

Asked by denis

Folks,

Is there a collection of gotchas where Numpy differs from Python, points that have puzzled people and cost them time?

"The horror of that moment I shall never never forget !"
"You will, though," the Queen said, "if you don't make a memorandum of it."

For example, NaNs are always trouble, anywhere. If you can explain this without running it, give yourself a point --

from numpy import array, NaN, isnan

pynan = float("nan")
print pynan is pynan, pynan is NaN, NaN is NaN
a = (0, pynan)
print a, a[1] is pynan, any([aa is pynan for aa in a])

a = array(( 0, NaN ))
print a, a[1] is NaN, isnan( a[1] )

(I'm not knocking numpy, lots of good work there, just think a FAQ or Wiki of gotchas would be useful.)

Edit: I was hoping to collect half a dozen gotchas (surprises for people learning Numpy).
Then, if there are common gotchas or, better, common explanations, we could talk about adding them to a community Wiki (where?). It doesn't look like we have enough so far.

Answer 1

Accepted answer by Christian Oudard

The biggest gotcha for me was that almost every standard operator is overloaded to distribute across the array.

Define a list and an array

>>> l = range(10)
>>> l
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> import numpy
>>> a = numpy.array(l)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

Multiplication duplicates the python list, but distributes over the numpy array

>>> l * 2
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> a * 2
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Adding a scalar to a Python list, or dividing a list by one, is not defined, but both work on a numpy array

>>> l + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "int") to list
>>> a + 2
array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11])
>>> l / 2.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for /: 'list' and 'float'
>>> a / 2.0
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5])

Numpy's overloaded operators sometimes treat lists like arrays

>>> a + a
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
>>> a + l
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Answer 2

Answered by Nathan Wilcox

Because __eq__ does not return a bool, putting numpy arrays in any kind of container prevents equality testing unless you add a container-specific workaround.

Example:

>>> import numpy
>>> a = numpy.array(range(3))
>>> b = numpy.array(range(3))
>>> a == b
array([ True,  True,  True], dtype=bool)
>>> x = (a, 'banana')
>>> y = (b, 'banana')
>>> x == y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is a horrible problem. For example, you cannot write unittests that use TestCase.assertEqual() on containers holding arrays, and must write custom comparison functions instead. Suppose we write a work-around function special_eq_for_numpy_and_tuples. Now we can do this in a unittest:

x = (array1, 'deserialized')
y = (array2, 'deserialized')
self.failUnless( special_eq_for_numpy_and_tuples(x, y) )

Now we must do this for every container type we might use to store numpy arrays. Furthermore, __eq__ might return a bool rather than an array of bools:

>>> a = numpy.array(range(3))
>>> b = numpy.array(range(5))
>>> a == b
False

Now each of our container-specific equality comparison functions must also handle that special case.
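One possible shape for such a work-around is sketched below. The helper name `containers_equal` is hypothetical and it only handles flat tuples or lists; it relies on `numpy.array_equal`, which returns a single bool and reports False when the shapes differ. For test code, `numpy.testing.assert_equal` may be a simpler option, since it compares tuples, lists and dicts element by element.

```python
import numpy as np

a = np.array(range(3))
b = np.array(range(3))
x = (a, 'banana')
y = (b, 'banana')

# Plain tuple comparison needs bool(a == b), which is ambiguous.
try:
    x == y
    tuple_compare_failed = False
except ValueError:
    tuple_compare_failed = True

def containers_equal(left, right):
    """Hypothetical helper: compare two flat tuples or lists that may hold arrays."""
    if len(left) != len(right):
        return False
    for p, q in zip(left, right):
        if isinstance(p, np.ndarray) or isinstance(q, np.ndarray):
            # array_equal yields one bool and is False on a shape mismatch.
            if not np.array_equal(p, q):
                return False
        elif p != q:
            return False
    return True

print(tuple_compare_failed)
print(containers_equal(x, y))
print(containers_equal(x, (np.array(range(5)), 'banana')))

# In tests, this raises AssertionError on a mismatch and passes silently otherwise.
np.testing.assert_equal(x, y)
```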

Maybe we can patch over this wart with a subclass?

>>> class SaneEqualityArray (numpy.ndarray):
...   def __eq__(self, other):
...     return isinstance(other, SaneEqualityArray) and self.shape == other.shape and (numpy.ndarray.__eq__(self, other)).all()
... 
>>> a = SaneEqualityArray( (2, 3) )
>>> a.fill(7)
>>> b = SaneEqualityArray( (2, 3) )
>>> b.fill(7)
>>> a == b
True
>>> x = (a, 'banana')
>>> y = (b, 'banana')
>>> x == y
True
>>> c = SaneEqualityArray( (7, 7) )
>>> c.fill(7)
>>> a == c
False

That seems to do the right thing. The class should also explicitly export elementwise comparison, since that is often useful.

Answer 3

Answered by nikow

I think this one is funny:

>>> import numpy as n
>>> a = n.array([[1,2],[3,4]])
>>> a[1], a[0] = a[0], a[1]
>>> a
array([[1, 2],
       [1, 2]])

For Python lists on the other hand this works as intended:

>>> b = [[1,2],[3,4]]
>>> b[1], b[0] = b[0], b[1]
>>> b
[[3, 4], [1, 2]]

Funny side note: numpy itself had a bug in the shuffle function, because it used that notation :-) (see here).

The reason is that in the first case we are dealing with views of the array, so the values are overwritten in-place.
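A minimal sketch of the failure and two ways around it, written for Python 3. The choice of fancy indexing and of an explicit `.copy()` is an illustration, not something the answer itself prescribes.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Tuple swap: both right-hand sides are views into a, so the first
# assignment overwrites row 1 before row 0 is assigned from it.
a[1], a[0] = a[0], a[1]
print(a)

b = np.array([[1, 2], [3, 4]])

# Fancy indexing on the right-hand side builds a copy first, so the swap is safe.
b[[0, 1]] = b[[1, 0]]
print(b)

c = np.array([[1, 2], [3, 4]])

# Copying one row explicitly also works.
tmp = c[0].copy()
c[0] = c[1]
c[1] = tmp
print(c)
```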

Answer 4

Answered by Ants Aasma

NaN is not a singleton like None, so you can't really use the is check on it. What makes it a bit tricky is that NaN == NaN is False, as IEEE-754 requires. That's why you need to use the numpy.isnan() function to check whether a float is not a number, or the standard library's math.isnan() if you're using Python 2.6+.
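The points above can be checked with a few lines. This sketch uses Python 3 syntax and the lowercase `np.nan` spelling instead of the `NaN` import from the question; the variable names are illustrative.

```python
import math
import numpy as np

pynan = float("nan")

# NaN never compares equal to anything, including itself.
print(pynan == pynan)
print(math.isnan(pynan))

a = np.array([0.0, np.nan])

# Indexing returns a fresh numpy.float64 scalar, so an identity test fails.
print(a[1] is np.nan)
print(type(a[1]).__name__)

# isnan works on scalars and element by element on arrays.
mask = np.isnan(a)
print(mask)
```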

Answer 5

Answered by Roberto Bonvallet

Slicing creates views, not copies.

>>> l = [1, 2, 3, 4]
>>> s = l[2:3]
>>> s[0] = 5
>>> l
[1, 2, 3, 4]

>>> from numpy import array
>>> a = array([1, 2, 3, 4])
>>> s = a[2:3]
>>> s[0] = 5
>>> a
array([1, 2, 5, 4])
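When list-like behaviour is wanted, an explicit copy restores it. The sketch below assumes `numpy.shares_memory` is available to confirm which object aliases the original buffer; the names `view` and `copy` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

view = a[2:3]            # basic slice: shares memory with a
copy = a[2:3].copy()     # explicit copy: independent buffer

copy[0] = 99             # does not touch a
view[0] = 5              # writes through to a

print(a)
print(np.shares_memory(a, view))
print(np.shares_memory(a, copy))
```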

Answer 6

Answered by endolith

In [1]: bool([])
Out[1]: False

In [2]: bool(array([]))
Out[2]: False

In [3]: bool([0])
Out[3]: True

In [4]: bool(array([0]))
Out[4]: False

So don't test whether an array is empty by checking its truth value. Use size(array()) instead.

And don't use len(array()), either:

In [1]: size(array([]))
Out[1]: 0

In [2]: len(array([]))
Out[2]: 0

In [3]: size(array([0]))
Out[3]: 1

In [4]: len(array([0]))
Out[4]: 1

In [5]: size(array(0))
Out[5]: 1

In [6]: len(array(0))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-5b2872696128> in <module>()
----> 1 len(array(0))

TypeError: len() of unsized object
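A small helper makes the recommended check explicit. `is_empty` is a hypothetical name, and the sketch deliberately avoids calling `bool()` on an array at all.

```python
import numpy as np

def is_empty(arr):
    """Hypothetical helper: True when the array holds no elements."""
    return np.asarray(arr).size == 0

print(is_empty(np.array([])))     # no elements
print(is_empty(np.array([0])))    # one element, even though it is zero
print(is_empty(np.array(0)))      # a 0-d array still holds one element

# len() only reports the first axis, so it can disagree with size.
wide = np.zeros((3, 0))
print(len(wide), wide.size)

# len() is not defined at all for a 0-d array.
try:
    len(np.array(0))
    len_failed = False
except TypeError:
    len_failed = True
print(len_failed)
```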

Answer 7

Answered by nmb

The truth value of a Numpy array differs from that of a python sequence type, where any non-empty sequence is true.

>>> import numpy as np
>>> l = [0,1,2,3]
>>> a = np.arange(4)
>>> if l: print "Im true"
... 
Im true
>>> if a: print "Im true"
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>>

Numerical types are true when they are non-zero and, as a collection of numbers, the numpy array inherits this definition. But for a collection of numbers, truth could reasonably mean "all elements are non-zero" or "at least one element is non-zero". Numpy refuses to guess which definition is meant and raises the above exception. The .any() and .all() methods let you specify which meaning of true is intended.

>>> if a.any(): print "Im true"
... 
Im true
>>> if a.all(): print "Im true"
... 
>>>
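The same checks as a Python 3 script, with the exception caught so the three possible meanings of "true" can be printed side by side. The flag name `ambiguous` is illustrative.

```python
import numpy as np

a = np.arange(4)    # array([0, 1, 2, 3])

# Using the array directly as a condition raises ValueError.
try:
    if a:
        pass
    ambiguous = False
except ValueError:
    ambiguous = True

print(ambiguous)
print(a.any())        # at least one element is non-zero
print(a.all())        # every element is non-zero (the leading 0 breaks this)
print(a.size > 0)     # the array is not empty
```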

Answer 8

Answered by Radim

(Related, but a NumPy vs. SciPy gotcha, rather than NumPy vs Python)

Slicing beyond an array's real size works differently:

>>> import numpy, scipy.sparse

>>> m = numpy.random.rand(2, 5) # create a 2x5 dense matrix
>>> print m[:3, :] # works like list slicing in Python: clips to real size
[[ 0.12245393  0.20642799  0.98128601  0.06102106  0.74091038]
[ 0.0527411   0.9131837   0.6475907   0.27900378  0.22396443]]

>>> s = scipy.sparse.lil_matrix(m) # same for csr_matrix and other sparse formats
>>> print s[:3, :] # doesn't clip!
IndexError: row index out of bounds

So when slicing scipy.sparse arrays, you must manually make sure your slice bounds are within range. This differs from how both NumPy and plain Python work.

Answer 9

Answered by DSM

No one seems to have mentioned this so far:

>>> all(False for i in range(3))
False
>>> from numpy import all
>>> all(False for i in range(3))
True
>>> any(False for i in range(3))
False
>>> from numpy import any
>>> any(False for i in range(3))
True

numpy's any and all don't play nicely with generators, and don't raise any error warning you that they don't.

Answer 10

Answered by Lennart Regebro

print pynan is pynan, pynan is NaN, NaN is NaN

This tests identity, that is if it is the same object. The result should therefore obviously be True, False, True, because when you do float(whatever) you are creating a new float object.

a = (0, pynan)
print a, a[1] is pynan, any([aa is pynan for aa in a])

I don't know what it is that you find surprising with this.

a = array(( 0, NaN ))
print a, a[1] is NaN, isnan( a[1] )

This I did have to run. :-) When you stick NaN into an array it's converted into a numpy.float64 object, which is why a[1] is NaN fails.

This all seems fairly unsurprising to me. But then I don't really know anything much about NumPy. :-)
