Python: NumPy array/matrix of mixed types

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/24832715/

Date: 2020-08-19 05:18:20  Source: igfitidea

NumPy array/matrix of mixed types

Tags: python, arrays, numpy, matrix

Asked by Vit D

I'm trying to create a NumPy array/matrix (Nx3) with mixed data types (string, integer, integer). But when I append data to this matrix, I get an error: TypeError: invalid type promotion. Can anybody help me solve this problem?

When I create an array with the sample data, NumPy casts all columns in the matrix to a single 'S' data type. And I can't specify the data type for the array, because when I do this: res = np.array(["TEXT", 1, 1], dtype='S, i4, i4'), I get an error: TypeError: expected a readable buffer object

templates.py


import numpy as np
from pprint import pprint

test_array = np.zeros((0, 3), dtype='S, i4, i4')
pprint(test_array)

test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)
pprint(test_array)

print("Array example:")
res = np.array(["TEXT", 1, 1])
pprint(res)

Output:


array([], shape=(0L, 3L), 
  dtype=[('f0', 'S'), ('f1', '<i4'), ('f2', '<i4')])

 Array example:
 array(['TEXT', '1', '1'], dtype='|S4')

Error:


Traceback (most recent call last):

File "templates.py", line 5, in <module>
test_array = np.append(test_array, [["TEXT", 1, 1]], axis=0)

File "lib\site-packages\numpy\lib\function_base.py", line 3543, in append
return concatenate((arr, values), axis=axis)

TypeError: invalid type promotion

Accepted answer by DrV

Your problem is in the data. Try this:


res = np.array(("TEXT", 1, 1), dtype='|S4, i4, i4')

or


res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='|S4, i4, i4')

The data has to be a tuple or a list of tuples. Not quite evident from the error message, is it?

Also, please note that the length of the text field has to be specified for the text data to really be saved. If you want to save the text as objects (only references in the array), then:

res = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='object, i4, i4')

This is often quite useful, as well.

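As a rough illustration of the two options above, here is a minimal sketch (assuming Python 3, where 'S' fields come back as bytes). The variable names fixed and as_object are made up for this example, and the object field uses the short code 'O' that another answer mentions.

```python
import numpy as np

# Each record is a tuple; a list of tuples gives a 1-d array of records.
fixed = np.array([("TEXT", 1, 1), ("XXX", 2, 2)], dtype='S4, i4, i4')
print(fixed.shape)   # a single axis: every element is a whole record
print(fixed['f0'])   # the string column, stored in 4-byte fields

# With an object field the array only holds references to ordinary
# Python strings, so no length has to be chosen in advance.
as_object = np.array([("TEXT", 1, 1), ("a much longer string", 2, 2)],
                     dtype='O, i4, i4')
print(as_object['f0'][1])
```

The fixed-width version keeps all data inside the array buffer, while the object version trades that for strings of arbitrary length.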

Answer by Sam M.

I don't believe you can make an array out of more than one data type. You can, however, make a list with more than one data type.


items = ["TEXT", 1, 1]  # avoid shadowing the built-in name "list"
print(items)

gives

['TEXT', 1, 1]

Answer by Frank M

First, numpy stores array elements using fixed physical record sizes. So, record objects need to all be the same physical size. For this reason, you need to tell numpy the size of the string or save a pointer to a string stored somewhere else. In a record array, 'S' translates into a zero-length string, and that's probably not what you intended.

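The fixed record size can be checked directly. The following is only a sketch, assuming Python 3 and the default unaligned layout; the sample string is invented for illustration.

```python
import numpy as np

mtype = np.dtype('S10, i4, i4')
print(mtype.itemsize)   # bytes per record: 10 for the string + 4 + 4

# A string longer than the declared width does not raise an error;
# only the bytes that fit in the field are stored.
rec = np.array([("a string longer than ten bytes", 1, 2)], dtype=mtype)
print(rec['f0'][0])
```

This is why the width in the dtype should be at least as long as the longest string you expect to store.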

The append method actually copies the entire array to a larger physical space to accommodate the new elements. Try, for example:


import numpy as np

mtype = 'S10, i4, i4'
ta = np.zeros((0), dtype=mtype)   # empty 1-d structured array
print(id(ta))
# np.append returns a new, larger array, which is rebound to ta
ta = np.append(ta, np.array([('first', 10, 11)], dtype=mtype))
print(id(ta))
ta = np.append(ta, np.array([('second', 20, 21)], dtype=mtype))
print(id(ta))

Each time you append this way, the copy gets slower because you need to allocate and copy more memory each time it grows. That's why the id returns a different value every time you append. If you want any significant number of records in your array, you are much better off either allocating enough space from the start, or else accumulating the data in lists and then collecting the lists into a numpy structured array when you're done. That also gives you the opportunity to make the string length in mtype as short as possible, while still long enough to hold your longest string.

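Both alternatives might look roughly like the sketch below. It assumes Python 3 and ASCII-only strings (so the character count equals the byte count); the names list and the variables rows, width, and tb are invented for this example.

```python
import numpy as np

names = ["first", "second", "third"]

# Option 1: collect tuples in a plain list, convert once at the end.
rows = []
for i, name in enumerate(names, start=1):
    rows.append((name, 10 * i, 10 * i + 1))
width = max(len(name) for name, _, _ in rows)   # longest string seen
ta = np.array(rows, dtype='S{}, i4, i4'.format(width))

# Option 2: allocate all records up front and fill them in place.
tb = np.zeros(len(names), dtype='S10, i4, i4')
for i, name in enumerate(names, start=1):
    tb[i - 1] = (name, 10 * i, 10 * i + 1)

print(ta.dtype)
print(tb)
```

Option 1 is convenient when the number of records is unknown in advance; option 2 avoids the intermediate list when the count is known.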

Answer by sirlark

If you're not married to numpy, a pandas DataFrame is perfect for this. Alternatively, you can specify the string field in the array as a Python object (dtype='O, i4, i4' as an example). Also, append seems to like lists of tuples, not lists of lists. I think it has something to do with the mutability of lists, but I'm not sure.

Answer by hpaulj

I think this is what you are trying to accomplish - create an empty array of the desired dtype, and then add one or more data sets to it. The result will have shape (N,), not (N,3).


As I noted in a comment, np.append uses np.concatenate, so I am using that too. I also have to make both test_array and x 1d arrays (shape (0,) and (1,) respectively). And the string field of the dtype is S10, large enough to contain 'TEXT'.

In [56]: test_array = np.zeros((0,), dtype='S10, i4, i4')

In [57]: x = np.array([("TEST",1,1)], dtype='S10, i4, i4')

In [58]: test_array = np.concatenate((test_array, x))

In [59]: test_array = np.concatenate((test_array, x))

In [60]: test_array
Out[60]: 
array([('TEST', 1, 1), ('TEST', 1, 1)], 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])

Here's an example of building the array from a list of tuples:


In [75]: xl=('test',1,1)

In [76]: np.array([xl]*3,dtype='S10,i4,i4')
Out[76]: 
array([('test', 1, 1), ('test', 1, 1), ('test', 1, 1)], 
      dtype=[('f0', 'S10'), ('f1', '<i4'), ('f2', '<i4')])
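To make the (N,) shape concrete, here is a small sketch of how such an array is typically read back, assuming Python 3. The sample values and the name extra are invented; f0, f1, f2 are the default field names NumPy assigns when none are given.

```python
import numpy as np

arr = np.array([('test', 1, 1), ('test', 2, 3)], dtype='S10, i4, i4')

print(arr.shape)   # (2,): two records, not a 2x3 matrix
print(arr['f1'])   # one column, selected by its default field name
print(arr[0])      # one whole record

# Concatenation keeps the dtype as long as both inputs share it.
extra = np.array([('more', 4, 5)], dtype=arr.dtype)
arr = np.concatenate((arr, extra))
print(arr.shape)
```

Columns are selected by field name rather than by a second index, which is the practical consequence of the array being one-dimensional.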