Python: How to add names to a numpy array without changing its dimension?

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24168569/

Time: 2020-08-19 04:07:56  Source: igfitidea

How to add names to a numpy array without changing its dimension?

python arrays numpy

Asked by Josh O'Brien

I have an existing two-column numpy array to which I need to add column names. Passing those in via dtype works in the toy example shown in Block 1 below. With my actual array, though, as shown in Block 2, the same approach has an unexpected (to me!) side effect of changing the array dimensions.

How can I convert my actual array, the one named Y in the second block below, to an array with named columns, like I did for array A in the first block?

Block 1: (Columns of A named without reshaping dimension)

import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
A
# array([[  1,   2],
#        [  3,   4],
#        [ 50, 100]])
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt
A
# array([[(1, 2)],
#        [(3, 4)],
#        [(50, 100)]], 
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])

Block 2: (Naming columns of my actual array, Y, reshapes its dimension)

import numpy as np
from functools import reduce  # reduce is not a builtin in Python 3
## Code to reproduce Y, the array I'm actually dealing with
nRings = 3
nn = [[nRings+1-n] * n for n in range(nRings+1)]
RING = reduce(lambda x, y: x+y, nn)
ID = range(1,len(RING)+1)
X = np.array([ID, RING])
Y = X.T
Y
# array([[1, 3],
#        [2, 2],
#        [3, 2],
#        [4, 1],
#        [5, 1],
#        [6, 1]])

## My unsuccessful attempt to add names to the array's columns    
dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
Y.dtype=dt
Y
# array([[(1, 2), (3, 2)],
#        [(3, 4), (2, 1)],
#        [(5, 6), (1, 1)]], 
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])

## What I'd like instead of the results shown just above
# array([[(1, 3)],
#        [(2, 2)],
#        [(3, 2)],
#        [(4, 1)],
#        [(5, 1)],
#        [(6, 1)]],
#       dtype=[('ID', '<i4'), ('Ring', '<i4')])

Accepted answer by Bi Rico

First, because your question asks about giving names to arrays, I feel obligated to point out that using "structured arrays" just for the purpose of naming is probably not the best approach. We often like to name rows and columns when working with tables; if that is the case, I suggest you try something like pandas, which is awesome. If you simply want to organize some data in your code, a dictionary of arrays is often much better than a structured array. For example, you can do:

Y = {'ID':X[0], 'Ring':X[1]}
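
As a rough sketch of how such a dictionary might be used (the literal values below reproduce the question's X, and the name ids_in_ring_1 is made up for illustration), each entry is an ordinary 1-D array, so a mask built from one entry can index another:

```python
import numpy as np

# Same data as in the question: row 0 holds the IDs, row 1 the ring numbers.
X = np.array([[1, 2, 3, 4, 5, 6],
              [3, 2, 2, 1, 1, 1]])

# Each dictionary value is a plain 1-D array (a view of one row of X).
Y = {'ID': X[0], 'Ring': X[1]}

# A boolean mask computed from one named column selects from another.
ids_in_ring_1 = Y['ID'][Y['Ring'] == 1]
print(ids_in_ring_1)      # [4 5 6]
print(Y['Ring'].max())    # 3
```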

With that out of the way, if you want to use a structured array, here is the clearest way to do it in my opinion:

import numpy as np
from functools import reduce  # reduce is not a builtin in Python 3

nRings = 3
nn = [[nRings+1-n] * n for n in range(nRings+1)]
RING = reduce(lambda x, y: x+y, nn)
ID = range(1,len(RING)+1)
X = np.array([ID, RING])

dt = {'names':['ID', 'Ring'], 'formats':[int, int]}  # np.int was removed from recent NumPy; the builtin int is equivalent
Y = np.zeros(len(RING), dtype=dt)
Y['ID'] = X[0]
Y['Ring'] = X[1]
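
A small usage sketch, not part of the original answer: it fills a structured array from hard-coded lists equal to the question's ID and RING (an assumption made to keep the example self-contained), uses np.int32 fields as in the question, and then reads the data back by field name.

```python
import numpy as np

# Hard-coded stand-ins for the question's ID and RING sequences.
ids = [1, 2, 3, 4, 5, 6]
rings = [3, 2, 2, 1, 1, 1]

dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
Y = np.zeros(len(ids), dtype=dt)   # one record per row, shape (6,)
Y['ID'] = ids
Y['Ring'] = rings

print(Y.shape)      # (6,)
print(Y[0])         # (1, 3)
print(Y['Ring'])    # [3 2 2 1 1 1]

# Records can be sorted by field name: by ring first, then by ID.
Y_sorted = np.sort(Y, order=['Ring', 'ID'])
print(Y_sorted['ID'])   # [4 5 6 2 3 1]
```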

Answer by jonnybazookatone

Try re-writing the definition of X:

X = np.array(list(zip(ID, RING)))  # list() is needed in Python 3, where zip returns an iterator

Then X already holds one (ID, Ring) pair per row, so you don't need to define Y = X.T.

Answer by hgazibara

Are you completely sure about the outputs for A and Y? I get something different using Python 2.7.6 and numpy 1.8.1.

My initial output for A is the same as yours, as it should be. After running the following code for the first example

dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
A.dtype=dt

the contents of array A are actually

array([[(1, 0), (3, 0)],
       [(2, 0), (2, 0)],
       [(3, 0), (2, 0)],
       [(4, 0), (1, 0)],
       [(5, 0), (1, 0)],
       [(6, 0), (1, 0)]],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])

This makes somewhat more sense to me than the output you added, because dtype determines the data type of every element in the array. The new definition states that every element should contain two fields, so it does, but the value of the second field is set to 0 because there was no preexisting value for the second field.
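
One plausible reading of the zeros, offered here as an assumption rather than something stated in the answer, is that the array held 64-bit integers, so each 8-byte element was reinterpreted as two 4-byte fields. The sketch below uses explicit little-endian types and view(), which reinterprets the same bytes much as assigning to dtype does, to show both the 64-bit and the 32-bit case:

```python
import numpy as np

# Explicit little-endian types so the result does not depend on the platform.
dt = np.dtype([('ID', '<i4'), ('Ring', '<i4')])   # 4 + 4 = 8 bytes per record

# Case 1: 8-byte integers. Each single element becomes one (low, high) record.
A64 = np.array([[1, 2], [3, 4], [50, 100]], dtype='<i8')
V64 = A64.view(dt)
print(V64.shape)             # (3, 2)
print(V64['ID'].tolist())    # [[1, 2], [3, 4], [50, 100]]
print(V64['Ring'].tolist())  # [[0, 0], [0, 0], [0, 0]]

# Case 2: 4-byte integers. Each row of two elements becomes one record.
A32 = np.array([[1, 2], [3, 4], [50, 100]], dtype='<i4')
V32 = A32.view(dt)
print(V32.shape)             # (3, 1)
print(V32['ID'].tolist())    # [[1], [3], [50]]
print(V32['Ring'].tolist())  # [[2], [4], [100]]
```

The second case matches the output shown in Block 1 of the question, which suggests the asker's default integer type was 32 bits wide.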

However, if you would like numpy to group the columns of your existing array so that every row contains only one element, with each element having two fields, you can introduce a small code change.

Since a tuple is needed to make numpy group elements into a more complex data type, you can do this by creating a new array and turning every row of the existing array into a tuple. Here is a simple working example:

import numpy as np
A = np.array(((1,2),(3,4),(50,100)))
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])
B = np.array(list(map(tuple, A)), dtype=dt)

Using this short piece of code, array B becomes

array([(1, 2), (3, 4), (50, 100)],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])

To make B a 2D array, it is enough to write

B.reshape(len(B), 1) # in this case, even B.size would work instead of len(B)

For the second example, a similar thing needs to be done to make Y a structured array:

Y = np.array(list(map(tuple, X.T)), dtype=dt)

After doing this for your second example, array Y looks like this

array([(1, 3), (2, 2), (3, 2), (4, 1), (5, 1), (6, 1)],
      dtype=[('ID', '<i4'), ('Ring', '<i4')])

You will notice that the output is not the same as the one you expected, but this one is simpler: instead of writing Y[0,0] to get the first element, you can just write Y[0]. To make this array 2D as well, you can use reshape, just as with B.
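
Putting those steps together for the second example gives roughly the following. The values of X are hard-coded to match the question. The last part uses numpy.lib.recfunctions.unstructured_to_structured, which is not mentioned in this answer and is shown only as a possible alternative in newer NumPy releases (the array is made contiguous first as a precaution).

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same data as in the question: row 0 holds the IDs, row 1 the ring numbers.
X = np.array([[1, 2, 3, 4, 5, 6],
              [3, 2, 2, 1, 1, 1]])
dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# One tuple per row of X.T becomes one record.
Y = np.array(list(map(tuple, X.T)), dtype=dt)
print(Y.shape)    # (6,)
print(Y[0])       # (1, 3)

# Reshape to a single column of records, the layout asked for in the question.
Y2 = Y.reshape(len(Y), 1)
print(Y2.shape)   # (6, 1)
print(Y2[0, 0])   # (1, 3)

# Possible alternative: let NumPy convert the last axis into fields.
Z = rfn.unstructured_to_structured(np.ascontiguousarray(X.T), dt)
print(Z.shape)    # (6,)
```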

Answer by HYRY

This is because Y is not C_CONTIGUOUS; you can check this with Y.flags:

  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

You can call Y.copy() or Y.ravel() first:

dt = {'names':['ID', 'Ring'], 'formats':[np.int32, np.int32]}
print(Y.ravel().view(dt))  # the result shape is (6,) when Y holds int32 values
print(Y.copy().view(dt))   # the result shape is (6, 1) when Y holds int32 values
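
A self-contained sketch of the same idea follows. It swaps in np.ascontiguousarray for Y.copy(), which is a substitution and not what the answer wrote, so that the C-ordered copy and a cast to int32 happen in one step. The cast matters on platforms where the default integer is 64 bits wide, because view() only groups two values into one record when each value is 4 bytes.

```python
import numpy as np

# Same data as in the question, using the platform's default integer type.
X = np.array([[1, 2, 3, 4, 5, 6],
              [3, 2, 2, 1, 1, 1]])
Y = X.T                            # a transposed view, not a copy
print(Y.flags['C_CONTIGUOUS'])     # False
print(Y.flags['F_CONTIGUOUS'])     # True

dt = np.dtype([('ID', np.int32), ('Ring', np.int32)])

# Copy into row-major memory and force 4-byte integers at the same time.
Yc = np.ascontiguousarray(Y, dtype=np.int32)
named = Yc.view(dt)                # each row of two int32 values is one record

print(named.shape)                       # (6, 1)
print(named['ID'].ravel().tolist())      # [1, 2, 3, 4, 5, 6]
print(named['Ring'].ravel().tolist())    # [3, 2, 2, 1, 1, 1]
```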

Answer by lX-Xl

store-different-datatypes-in-one-numpy-array is another page that includes a nice solution for adding names to an array so that they can be used as columns. Example:

r = np.rec.fromarrays([x1,x2,x3],names='a,b,c')  # np.rec is the public name for np.core.records
# x1, x2, x3 are flattened (1-D) arrays
# a, b, c are the field names
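
Applied to the data from the question, this could look as follows. The arrays ids and rings are hard-coded stand-ins for the question's ID and RING, and the field names are taken from the question.

```python
import numpy as np

# Hard-coded stand-ins for the question's ID and RING sequences.
ids = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32)
rings = np.array([3, 2, 2, 1, 1, 1], dtype=np.int32)

# Build a record array with one field per input array.
r = np.rec.fromarrays([ids, rings], names='ID,Ring')

print(r.dtype.names)     # ('ID', 'Ring')
print(r.shape)           # (6,)
print(r['ID'].tolist())  # [1, 2, 3, 4, 5, 6]
print(r.Ring.tolist())   # [3, 2, 2, 1, 1, 1]
```

Record arrays also allow attribute access such as r.Ring, in addition to the r['Ring'] indexing that plain structured arrays offer.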