Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29877508/

Date: 2020-08-19 05:07:22  Source: igfitidea

What does dtype=object mean while creating a numpy array?

Tags: python, arrays, numpy, types

Asked by Avinash Pandey

I was experimenting with numpy arrays and created a numpy array of strings:

import numpy as np
ar1 = np.array(['avinash', 'jay'])

As I read in the official guide, operations on NumPy arrays are applied to each individual element. So I did this:

ar1 * 2

But then I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-22-aaac6331c572> in <module>()
----> 1 ar1 * 2

TypeError: unsupported operand type(s) for *: 'numpy.ndarray' and 'int'

But when I used dtype=object while creating the array:

ar1 = np.array(['avinash', 'jay'], dtype=object)

I am able to do all of these operations, such as ar1 * 2.

Can anyone tell me why this is happening?

Accepted answer by Alex Riley

NumPy arrays are stored as contiguous blocks of memory. They usually have a single datatype (e.g. integers, floats or fixed-length strings) and then the bits in memory are interpreted as values with that datatype.
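A quick way to see this fixed-width layout is to inspect the array from the question. The sketch below assumes Python 3 with a recent NumPy, where a list of str becomes a Unicode array (dtype kind 'U') that stores 4 bytes per character; the silent truncation at the end is a side effect of every slot having the same fixed width.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

# Every element gets the same fixed width: that of the longest string (7 chars).
print(ar1.dtype)           # e.g. <U7 (the byte-order prefix depends on the platform)
print(ar1.dtype.itemsize)  # 28: 7 characters * 4 bytes each
print(ar1.nbytes)          # 56: two 28-byte slots stored back to back

# Because the width is fixed, assigning a longer string silently truncates it.
ar1[1] = 'jayjayjay'
print(ar1[1])              # jayjayj
```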

Creating an array with dtype=object is different. The memory taken by the array is now filled with pointers to Python objects which are stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
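The pointer layout of an object array can be checked directly. This is a small sketch with illustrative variable names (name, inner, box are not from the original answer); it assumes a standard CPython build, where the item size of an object array equals the platform pointer size reported by struct.calcsize('P').

```python
import struct
import numpy as np

name = 'avinash'
obj_arr = np.array([name, 'jay'], dtype=object)

# Each slot holds a pointer, so the item size is the pointer size
# (8 bytes on a typical 64-bit build), not the length of the string.
print(obj_arr.dtype.itemsize == struct.calcsize('P'))  # True

# Indexing returns the very same Python object that was stored, not a copy.
print(obj_arr[0] is name)  # True

# With a mutable object the shared reference is visible: changing the
# list "elsewhere in memory" also changes what the array slot points to.
inner = [1, 2]
box = np.empty(1, dtype=object)
box[0] = inner
inner.append(3)
print(box[0])  # [1, 2, 3]
```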

Arithmetic operators such as * don't work with arrays such as ar1, which have a fixed-length string datatype such as string_ or unicode_ (there are special functions instead; see below). NumPy is just treating the bits in memory as characters, and the * operator doesn't make sense for them. However, the line

np.array(['avinash','jay'], dtype=object) * 2

works because now the array is an array of (pointers to) Python strings. The * operator is well defined for these Python string objects. New Python strings are created in memory and a new object array with references to the new strings is returned.
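The behaviour described above can be sketched as follows: for an object array, * 2 is applied to each stored Python str, so ordinary string repetition ('jay' * 2 == 'jayjay') happens element by element. The list comprehension is only an illustrative comparison, not how NumPy implements it internally.

```python
import numpy as np

obj_arr = np.array(['avinash', 'jay'], dtype=object)

# Each element's own Python `*` is used, i.e. str repetition.
doubled = obj_arr * 2
print(doubled)        # ['avinashavinash' 'jayjay']
print(doubled.dtype)  # object

# Same values as repeating each string in a plain Python loop.
print(list(doubled) == [s * 2 for s in obj_arr])  # True

# A new object array is returned; the original is left untouched.
print(obj_arr)        # ['avinash' 'jay']
```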

If you have an array with string_ or unicode_ dtype and want to repeat each string, you can use np.char.multiply:

In [52]: np.char.multiply(ar1, 2)
Out[52]: array(['avinashavinash', 'jayjay'], 
      dtype='<U14')

NumPy has many other vectorised string methods too.
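A few of these vectorised functions from the np.char module are shown below as a sketch, applied to the plain (non-object) array from the question. It assumes a NumPy version where np.char is available (newer releases also provide an np.strings namespace); note that the results stay fixed-width string arrays rather than object arrays.

```python
import numpy as np

ar1 = np.array(['avinash', 'jay'])

# np.char functions operate element-wise on fixed-width string arrays.
print(np.char.multiply(ar1, 2))  # ['avinashavinash' 'jayjay']
print(np.char.upper(ar1))        # ['AVINASH' 'JAY']
print(np.char.add(ar1, '!'))     # ['avinash!' 'jay!']
print(np.char.str_len(ar1))      # [7 3]

# Unlike the dtype=object approach, the result is still a Unicode string array.
print(np.char.upper(ar1).dtype.kind)  # U
```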
