Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15579649/

Date: 2020-08-18 20:27:45  Source: igfitidea

python dict to numpy structured array

python numpy arcpy

Asked by Christa

I have a dictionary that I need to convert to a NumPy structured array. I'm using the arcpy function NumPyArrayToTable, so a NumPy structured array is the only data format that will work.

Based on this thread: Writing to numpy array from dictionary, and this thread: How to convert Python dictionary object to numpy array

I've tried this:

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array=numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

But I keep getting the error "expected a readable buffer object".

The method below works, but it is clumsy and obviously won't work for real data. I know there is a more graceful approach; I just can't figure it out.

totable = numpy.array([[key,val] for (key,val) in result.iteritems()])
array=numpy.array([(totable[0,0],totable[0,1]),(totable[1,0],totable[1,1])],dtype)

Accepted answer by unutbu

You could use np.array(list(result.items()), dtype=dtype):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = ['id','data']
formats = ['f8','f8']
dtype = dict(names = names, formats=formats)
array = np.array(list(result.items()), dtype=dtype)

print(repr(array))

yields

array([(0.0, 1.1181753789488595), (1.0, 0.5566080288678394),
       (2.0, 0.4718269778030734), (3.0, 0.48716683119447185), (4.0, 1.0),
       (5.0, 0.1395076201641266), (6.0, 0.20941558441558442)], 
      dtype=[('id', '<f8'), ('data', '<f8')])
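Once the structured array exists, each column can be read by its field name and each row is a single record. This is a minimal sketch that reuses the answer's result and dtype, assuming Python 3.7+ so that the dictionary keeps its insertion order:

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}
dtype = dict(names=['id', 'data'], formats=['f8', 'f8'])
array = np.array(list(result.items()), dtype=dtype)

# Field names come from the dtype, in the order they were given
print(array.dtype.names)  # ('id', 'data')

# Indexing by field name returns a 1-D float64 array for that column
ids = array['id']
data = array['data']

# Indexing by position returns one record; its fields are read by name too
first = array[0]
print(first['id'], first['data'])
```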


If you don't want to create the intermediate list of tuples, list(result.items()), then you could instead use np.fromiter:

In Python2:

array = np.fromiter(result.iteritems(), dtype=dtype, count=len(result))

In Python3:

array = np.fromiter(result.items(), dtype=dtype, count=len(result))
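The following sketch builds the array both ways in Python 3 and compares the fields, which makes it easier to check that np.fromiter gives the same result as the list-based version. The count=len(result) argument lets NumPy allocate the output once rather than resizing it as items arrive.

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}
dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# fromiter reads the (key, value) tuples straight from the items view;
# count=len(result) lets NumPy allocate the output up front
from_iter = np.fromiter(result.items(), dtype=dtype, count=len(result))

# Reference: the list-based construction from the accepted answer
from_list = np.array(list(result.items()), dtype=dtype)

# Compare the two arrays field by field
for field in dtype.names:
    print(field, np.array_equal(from_iter[field], from_list[field]))
```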


Why using the list [key, val] does not work:

By the way, your attempt,

numpy.array([[key,val] for (key,val) in result.iteritems()],dtype)

was very close to working. If you had changed the list [key, val] to the tuple (key, val), it would have worked. Of course,

numpy.array([(key,val) for (key,val) in result.iteritems()], dtype)

is the same thing as

numpy.array(result.items(), dtype)

in Python2, or

numpy.array(list(result.items()), dtype)

in Python3.
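Applied to the question's own attempt, the fix is only the brackets. This sketch uses Python 3, so items() takes the place of iteritems(). It builds the array with the tuple comprehension and with list(result.items()), then checks that both hold the same data:

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}
dtype = dict(names=['id', 'data'], formats=['f8', 'f8'])

# The question's comprehension, with the list [key, val] changed to a tuple
from_comprehension = np.array([(key, val) for (key, val) in result.items()], dtype)

# dict.items() already yields (key, value) tuples, so the comprehension is redundant
from_items = np.array(list(result.items()), dtype)

for field in ('id', 'data'):
    assert np.array_equal(from_comprehension[field], from_items[field])
```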

np.array treats lists differently from tuples. As Robert Kern explains:

As a rule, tuples are considered "scalar" records and lists are recursed upon. This rule helps numpy.array() figure out which sequences are records and which are other sequences to be recursed upon; i.e. which sequences create another dimension and which are the atomic elements.

Since (0.0, 1.1181753789488595) is considered one of those atomic elements, it should be a tuple, not a list.
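The rule can be seen directly. In this small sketch with made-up values, each level of list nesting adds a dimension, while tuples stay single records. Pairs that arrive as lists can be converted to records with tuple():

```python
import numpy as np

dtype = np.dtype([('id', 'f8'), ('data', 'f8')])

# One level of lists: one dimension; each tuple is one record
flat = np.array([(0, 1.5), (1, 2.5), (2, 3.5)], dtype=dtype)
print(flat.shape)  # (3,)

# Two levels of lists: two dimensions; the tuples are still the records
grid = np.array([[(0, 1.5), (1, 2.5)],
                 [(2, 3.5), (3, 4.5)]], dtype=dtype)
print(grid.shape)  # (2, 2)
print(grid['data'])

# Pairs stored as lists become records once converted to tuples
pairs = [[0, 1.5], [1, 2.5], [2, 3.5]]
fixed = np.array([tuple(p) for p in pairs], dtype=dtype)
```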

Answer by dgdm

Let me propose an improved method for when the values of the dictionary are lists of the same length:

import numpy

def dctToNdarray(dd, szFormat='f8'):
    '''
    Convert a 'rectangular' dictionary to a numpy structured array
    entry
        dd : dictionary whose values are lists of the same length
    return
        data : numpy structured array, one field per key
    '''
    names = list(dd.keys())           # field names, in dict order
    firstKey = names[0]
    formats = [szFormat]*len(names)   # same format for every field
    dtype = dict(names=names, formats=formats)
    # First record, built as a tuple (tuples are treated as records)
    values = [tuple(dd[k][0] for k in names)]
    data = numpy.array(values, dtype=dtype)
    # Append the remaining records one at a time
    for i in range(1, len(dd[firstKey])):
        values = [tuple(dd[k][i] for k in names)]
        data_tmp = numpy.array(values, dtype=dtype)
        data = numpy.concatenate((data, data_tmp))
    return data

dd = {'a':[1,2.05,25.48],'b':[2,1.07,9],'c':[3,3.01,6.14]}
data = dctToNdarray(dd)
print(data.dtype.names)
print(data)
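Calling numpy.concatenate inside the loop copies every existing record on each iteration. One possible alternative keeps the same 'rectangular dictionary' assumption but builds all record tuples first with zip and calls numpy.array only once. The function name dict_of_lists_to_struct is hypothetical.

```python
import numpy as np

def dict_of_lists_to_struct(dd, fmt='f8'):
    """Hypothetical helper: one field per key, one record per list position.

    Assumes every value in dd is a list of the same length.
    """
    names = list(dd.keys())
    if len({len(dd[k]) for k in names}) > 1:
        raise ValueError("all lists must have the same length")
    dtype = np.dtype([(name, fmt) for name in names])
    # zip(*columns) yields one tuple per row, and tuples become records
    rows = list(zip(*(dd[k] for k in names)))
    return np.array(rows, dtype=dtype)

dd = {'a': [1, 2.05, 25.48], 'b': [2, 1.07, 9], 'c': [3, 3.01, 6.14]}
data = dict_of_lists_to_struct(dd)
print(data.dtype.names)  # ('a', 'b', 'c')
print(data['a'])
```

This builds the same array as the loop above without the repeated copies, which matters once the lists get long.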

Answer by Federico Ressi

I would prefer storing keys and values in separate arrays; this is often more practical. A structure of arrays is a perfect replacement for an array of structures. Most of the time you only need to process a subset of your data (in this case, the keys or the values). Operating on just one of two separate arrays is more efficient than operating on half of a single array that interleaves both.
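The efficiency argument comes down to memory layout. The check below assumes the question's result dictionary and the field names id and data. A separate values array is one contiguous block of float64. The data field of a structured (id, data) array, by contrast, is a strided view, so a pass over just the values still steps across the interleaved keys.

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}

# Structure of arrays: keys and values each in their own array
keys = np.array(list(result.keys()), dtype=float)
values = np.array(list(result.values()), dtype=float)

# Array of structures: one (id, data) record per dictionary entry
records = np.array(list(result.items()),
                   dtype=[('id', 'f8'), ('data', 'f8')])

# Consecutive values sit 8 bytes apart in the separate array...
print(values.strides, values.flags['C_CONTIGUOUS'])        # (8,) True
# ...but 16 bytes apart in the record array, skipping each 'id'
data_view = records['data']
print(data_view.strides, data_view.flags['C_CONTIGUOUS'])  # (16,) False
```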

If that is not possible, I would suggest laying the array out by column instead of by row: one row holds all the keys and the other holds all the values. This way you get the same benefit as having two arrays, but packed into one.

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

names = 0   # row index that holds the keys
values = 1  # row index that holds the values
array = np.empty(shape=(2, len(result)), dtype=float)
array[names] = list(result.keys())
array[values] = list(result.values())

But my favorite is this (simpler):

import numpy as np
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}

arrays = {'names': np.array(list(result.keys()), dtype=float),
          'values': np.array(list(result.values()), dtype=float)}
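You can keep the columns as separate arrays and still feed a tool that needs one structured array, such as arcpy's NumPyArrayToTable from the question, by combining them at the last moment. This sketch uses numpy.rec.fromarrays. The field names 'id' and 'data' are taken from the question, and the arcpy call itself is not shown.

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}

# Structure of arrays: one plain float array per column
arrays = {'names': np.array(list(result.keys()), dtype=float),
          'values': np.array(list(result.values()), dtype=float)}

# Combine the columns into one record array (an ndarray subclass with a
# structured dtype) only at the point where a single array is required
table = np.rec.fromarrays([arrays['names'], arrays['values']],
                          names='id,data')
print(table.dtype.names)  # ('id', 'data')
print(table[0])
```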

Answer by dgdm

Even simpler, if you accept using pandas:

import pandas
result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734, 3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266, 6: 0.20941558441558442}
df = pandas.DataFrame(result, index=[0])
print(df)

gives:

          0         1         2         3  4         5         6
0  1.118175  0.556608  0.471827  0.487167  1  0.139508  0.209416
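DataFrame(result, index=[0]) gives a single wide row with one column per key, not the (id, data) records the question asks for. If pandas is installed, one way to get those records is to build a two-column frame and convert it with DataFrame.to_records. The column names 'id' and 'data' are assumed from the question.

```python
import pandas

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}

# One row per dictionary entry, with named columns
df = pandas.DataFrame(list(result.items()), columns=['id', 'data'])
print(df)

# to_records returns a numpy record array; index=False drops the pandas index
records = df.to_records(index=False)
print(records.dtype.names)  # ('id', 'data')
```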

Answer by Can Hicabi Tartanoglu

Similar to the accepted answer: if you want to create an array from dictionary keys:

np.array(tuple(result.keys()))

If you want to create an array from dictionary values:

np.array(tuple(result.values()))
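The tuple() wrapper matters because in Python 3, keys() and values() return view objects that np.array does not unpack. A list works just as well. For the values, np.fromiter with count avoids building an intermediate sequence. This sketch uses the question's result dictionary. In Python 3.7+ keys and values come out in insertion order, so position i in one array matches position i in the other.

```python
import numpy as np

result = {0: 1.1181753789488595, 1: 0.5566080288678394, 2: 0.4718269778030734,
          3: 0.48716683119447185, 4: 1.0, 5: 0.1395076201641266,
          6: 0.20941558441558442}

keys = np.array(list(result.keys()))
values = np.fromiter(result.values(), dtype=float, count=len(result))

# Without a list or tuple, NumPy wraps the view object in a 0-d object array
wrapped = np.array(result.keys())
print(wrapped.shape, wrapped.dtype)  # () object
```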