What is the difference between the native Python int type and the numpy.int types?

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/38155039/

Date: 2020-08-19 20:26:22  Source: igfitidea

Tags: python, numpy

Asked by Aguy

Can you please help me understand the main differences (if any) between the native int type and the numpy.int32 or numpy.int64 types?

Accepted answer by hpaulj

Another way to look at the differences is to ask what methods the two kinds of objects have.

In IPython I can use tab completion to look at the methods:

In [1277]: x=123; y=np.int32(123)

int methods and attributes:

In [1278]: x.<tab>
x.bit_length   x.denominator  x.imag         x.numerator    x.to_bytes
x.conjugate    x.from_bytes   x.real         

int 'operators':

In [1278]: x.__<tab>
x.__abs__           x.__init__          x.__rlshift__
x.__add__           x.__int__           x.__rmod__
x.__and__           x.__invert__        x.__rmul__
x.__bool__          x.__le__            x.__ror__
...
x.__gt__            x.__reduce_ex__     x.__xor__
x.__hash__          x.__repr__          
x.__index__         x.__rfloordiv__     

np.int32 methods and attributes (or properties). Some are the same, but there are a lot more, basically all the ndarray ones:

In [1278]: y.<tab>
y.T             y.denominator   y.ndim          y.size
y.all           y.diagonal      y.newbyteorder  y.sort
y.any           y.dtype         y.nonzero       y.squeeze   
...
y.cumsum        y.min           y.setflags      
y.data          y.nbytes        y.shape   

The y.__ methods look a lot like the int ones. They can do the same math.

In [1278]: y.__<tab>
y.__abs__              y.__getitem__          y.__reduce_ex__
y.__add__              y.__gt__               y.__repr__
...
y.__format__           y.__rand__             y.__subclasshook__
y.__ge__               y.__rdivmod__          y.__truediv__
y.__getattribute__     y.__reduce__           y.__xor__
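
The tab-completion comparison above can also be done programmatically. The following is a minimal sketch, assuming a reasonably recent NumPy; the variable names (int_names, np_names, shared, numpy_only) are only illustrative, and the exact attribute lists vary between NumPy versions.

```python
import numpy as np

x = 123
y = np.int32(123)

# Public (non-underscore) attribute names of each object.
int_names = {name for name in dir(x) if not name.startswith("_")}
np_names = {name for name in dir(y) if not name.startswith("_")}

shared = sorted(int_names & np_names)      # available on both objects
numpy_only = sorted(np_names - int_names)  # the ndarray-style extras

print("shared:", shared)
print("numpy-only count:", len(numpy_only))
```

The numpy-only set is where names such as dtype and shape show up, which matches the observation that the scalar carries most of the ndarray interface.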

y is in many ways the same as a 0d array. Not identical, but close.

In [1281]: z=np.array(123,dtype=np.int32)
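
To make "not identical, but close" a little more concrete, here is a small sketch comparing the numpy scalar with the 0d array. The name scalar_from_z is hypothetical and only used for illustration.

```python
import numpy as np

y = np.int32(123)                  # numpy scalar
z = np.array(123, dtype=np.int32)  # 0-d array

print(y.shape, z.shape)            # both report an empty shape
print(y == z)                      # they compare equal
print(type(y).__name__, type(z).__name__)

# Indexing the 0-d array with an empty tuple hands back a numpy scalar.
scalar_from_z = z[()]
print(type(scalar_from_z).__name__)
```

The two objects share shape, dtype and arithmetic, but only z is an ndarray instance; y is a numpy scalar type.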

np.int32 is what I get when I index an array of that type:

In [1300]: A=np.array([0,123,3])

In [1301]: A[1]
Out[1301]: 123

In [1302]: type(A[1])
Out[1302]: numpy.int32

I have to use item to remove all of the numpy wrapping.

In [1303]: type(A[1].item())
Out[1303]: int
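
As a self-contained version of the steps above, the sketch below passes the dtype explicitly (an assumption made here so the result does not depend on the platform's default integer size) and also shows tolist, which unwraps every element at once.

```python
import numpy as np

A = np.array([0, 123, 3], dtype=np.int32)  # dtype given explicitly

elem = A[1]           # indexing returns a numpy scalar
native = elem.item()  # .item() unwraps it to a plain Python int
as_list = A.tolist()  # .tolist() unwraps every element

print(type(elem).__name__, type(native).__name__)
print([type(v).__name__ for v in as_list])
```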

As a numpy user, I think of an np.int32 as an int with a numpy wrapper, or conversely as a single element of an ndarray. Usually I don't pay attention to whether A[0] is giving me the 'native' int or the numpy equivalent. In contrast to some new users, I rarely use np.int32(123); I would use np.array(123) instead.

A = np.array([1,123,0], np.int32)

does not contain 3 np.int32 objects. Rather, its data buffer is 3*4=12 bytes long. It's the array overhead that interprets it as 3 ints in a 1d array. And view shows me the same data buffer with different interpretations:

In [1307]: A.view(np.int16)
Out[1307]: array([  1,   0, 123,   0,   0,   0], dtype=int16)

In [1310]: A.view('S4')
Out[1310]: array([b'\x01', b'{', b''],   dtype='|S4')
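
The buffer arithmetic can be checked directly. This is a rough sketch that avoids depending on byte order: instead of inspecting the reinterpreted values, it writes through the view and confirms that the original array changes.

```python
import numpy as np

A = np.array([1, 123, 0], np.int32)

print(A.itemsize, A.nbytes)    # bytes per element, bytes in the whole buffer

v = A.view(np.int16)           # same buffer, read as six 16-bit ints
print(v.shape)
print(np.shares_memory(A, v))  # no copy was made

v[:] = 0                       # writing through the view ...
print(A)                       # ... changes what A sees
```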

It's only when I index a single element that I get an np.int32 object.

The list L=[1, 123, 0] is different; it's a list of pointers, each pointing to an int object elsewhere in memory. The same is true of a dtype=object array.
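
A short sketch of the dtype=object case, assuming CPython; the name B is only illustrative. Indexing such an array returns the stored Python object itself rather than a numpy scalar.

```python
import numpy as np

L = [1, 123, 0]
B = np.array(L, dtype=object)  # array of references to Python objects

print(B.dtype)
print(type(B[0]).__name__)     # a plain Python int, not np.int32

# Each slot refers to the very same object that the list refers to.
same_objects = all(b is item for b, item in zip(B, L))
print(same_objects)
```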

Answer by TheBlackCat

There are several major differences. The first is that Python integers are flexible-sized (at least in Python 3.x). This means they can grow to accommodate a number of any size (within memory constraints, of course). The numpy integers, on the other hand, are fixed-sized, so there is a maximum value they can hold. That maximum is determined by the number of bytes in the integer (int32 vs. int64), with more bytes holding larger numbers, and by whether the number is signed or unsigned (int32 vs. uint32), with unsigned types able to hold larger numbers but unable to hold negative numbers.
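
The limits described above can be read off with np.iinfo. A minimal sketch follows; the value big is an arbitrary illustration of a Python integer well beyond any fixed-size range.

```python
import numpy as np

# Range of a few fixed-size integer types.
for dtype in (np.int32, np.uint32, np.int64):
    info = np.iinfo(dtype)
    print(dtype.__name__, info.min, info.max)

# A Python int simply grows to whatever size is needed.
big = 2**100 + 1
print(big.bit_length())
```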

So, you might ask, why use the fixed-sized integers? The reason is that modern processors have built-in instructions for doing math on fixed-size integers, so calculations on those are much, much faster. In fact, Python uses fixed-sized integers behind the scenes when the number is small enough, only switching to the slower, flexible-sized integers when the number gets too large.

Another advantage of fixed-sized values is that they can be placed into consistently sized adjacent memory blocks of the same type. This is the format that numpy arrays use to store data. The libraries that numpy relies on can do extremely fast computations on data in this format; in fact, modern CPUs have built-in features for accelerating this sort of computation. With variable-sized Python integers this sort of computation is impossible, because there is no way to say how big the blocks should be and no consistency in the data format.
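
The "consistently sized adjacent memory blocks" layout is visible through a few array attributes. This is only an illustrative sketch; it shows the layout and a whole-array operation, not a benchmark.

```python
import numpy as np

a = np.arange(10, dtype=np.int64)

print(a.itemsize)               # bytes per element
print(a.strides)                # step in bytes from one element to the next
print(a.flags["C_CONTIGUOUS"])  # elements sit back to back in memory

# Whole-array operations run over that buffer without a Python-level loop.
total = a.sum()
squares = a * a
print(total, squares)
```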

That being said, numpy is actually able to make arrays of Python integers. But rather than containing the values themselves, these arrays contain references to other pieces of memory holding the actual Python integers. This cannot be accelerated in the same way, so even if all the Python integers fit within the fixed integer size, it still won't be accelerated.
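
A small sketch of such an array of Python integers, using dtype=object; the names big and doubled are hypothetical. Arithmetic is delegated to the Python objects element by element, so values beyond any fixed-size range survive.

```python
import numpy as np

big = np.array([2**100, 1], dtype=object)  # holds references to Python ints
doubled = big * 2                          # multiplication done by each Python int

print(doubled.dtype)
print(doubled[0] == 2**101)
print(type(doubled[0]).__name__)
```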

None of this is the case with Python 2. In Python 2, Python integers are fixed-size integers and thus can be directly translated into numpy integers. For variable-length integers, Python 2 had the long type. But this was confusing, and it was decided that the performance gains were not worth the confusion, especially when people who need performance would be using numpy or something like it anyway.

Answer by mgilson

I think that the biggest difference is that the numpy types are compatible with their C counterparts. For one thing, this means that numpy ints can overflow:

>>> np.int32(2**32)
0
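
Depending on the NumPy version, the scalar constructor call above may raise an error for an out-of-range Python integer instead of wrapping. Overflow in array arithmetic is a more portable way to see the fixed-size behaviour; the sketch below assumes ordinary int32 arrays, and the names top, one, wrapped and python_sum are illustrative.

```python
import numpy as np

top = np.iinfo(np.int32).max         # largest value an int32 can hold
a = np.array([top], dtype=np.int32)
one = np.array([1], dtype=np.int32)

wrapped = a + one                    # fixed width: wraps to the most negative value
python_sum = int(top) + 1            # a Python int simply grows

print(wrapped)
print(python_sum)
```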

This C compatibility is why you can create an array of integers and specify the datatype as np.int32, for example. Numpy will then allocate an array that is large enough to hold the specified number of 32-bit integers, and when you need the values, it'll convert the C integers to np.int32 (which is very quick). The benefits of being able to convert back and forth between np.int32 and a C int also include huge memory savings. Python objects are generally pretty big:

>>> sys.getsizeof(1)
24

An np.int32 isn't any smaller:

>>> sys.getsizeof(np.int32(1))
28

But remember, most of the time when we're working with numpy arrays, we're only working with the C integers, which take only 4 bytes each (instead of 24). We only need to work with np.int32 objects when dealing with scalar values from an array.
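
To put rough numbers on the memory savings, the sketch below compares a list of Python ints with an int32 array holding the same values. It assumes CPython, where sys.getsizeof reports per-object sizes; the exact figures depend on the Python version and platform, and the names n, values, arr, list_bytes and array_data_bytes are illustrative.

```python
import sys
import numpy as np

n = 1000
values = list(range(1000, 1000 + n))
arr = np.array(values, dtype=np.int32)

# The list stores pointers; each int object is counted separately.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
# The array's data buffer holds the raw 4-byte integers.
array_data_bytes = arr.nbytes

print(list_bytes, array_data_bytes)
```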