Python: saving a dictionary of numpy arrays

Question

Asked by Gabe Spradlin

So I have a DB with a couple of years' worth of site data. I am now attempting to use that data for analytics: plotting and sorting of advertising costs by keyword, etc.

One of the data grabs from the DB takes minutes to complete. While I could spend some time optimizing the SQL statements I use to get the data, I'd prefer to simply leave that class and its SQL alone, grab the data, and save the results to a data file for faster retrieval later. Most of this DB data isn't going to change, so I could write a separate Python script to update the file every 24 hours and then use that file for this long-running task.

The data is being returned as a dictionary of numpy arrays. When I use numpy.save('data', data) the file is saved just fine. When I use data2 = numpy.load('data.npy') it loads the file without error. However, the output data2 does not equal the original data.

Specifically, the line data == data2 returns False. Additionally, if I use the following:

for key, key_data in data.items():
    print(key)

it works. But when I replace data.items() with data2.items() I get an error:

AttributeError: 'numpy.ndarray' object has no attribute 'items'

Using type(data) I get dict. Using type(data2) I get numpy.ndarray.

So how do I fix this? I want the loaded data to equal the data I passed in for saving. Is there an argument to numpy.save to fix this or do I need some form of simple reformatting function to reformat the loaded data into the proper structure?

Attempts to get into the ndarray via for loops or indexing all lead to errors about indexing a 0-d array. Casting like dict(data2) also fails, with an error about iterating over a 0-d array. However, Spyder shows the value of the array, and it includes the data I saved. I just can't figure out how to get to it.

If I need to reformat the loaded data, I'd appreciate some example code on how to do this.

Answer 1

Accepted answer by hpaulj

Let's look at a small example:

In [819]: N
Out[819]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])

In [820]: data={'N':N}

In [821]: np.save('temp.npy',data)

In [822]: data2=np.load('temp.npy')

In [823]: data2
Out[823]: 
array({'N': array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])}, dtype=object)

np.save is designed to save numpy arrays. data is a dictionary. So it wrapped it in an object array, and used pickle to save that object. Your data2 probably has the same structure.
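
The wrapping described above can be reproduced without touching the disk. This is a minimal sketch, assuming that np.save converts its argument with np.asanyarray before writing (the variable name wrapped is only illustrative):

```python
import numpy as np

N = np.arange(12, dtype=float).reshape(3, 4)
data = {'N': N}

# Converting a dict to an array yields a 0-d array of dtype object
# whose single element is the dict itself.
wrapped = np.asanyarray(data)

print(wrapped.shape)        # ()
print(wrapped.dtype)        # object
print(type(wrapped[()]))    # <class 'dict'>
```

Because the array is 0-dimensional, ordinary indexing such as wrapped[0] and iteration both fail, which matches the errors described in the question.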

You get at the array with:

In [826]: data2[()]['N']
Out[826]: 
array([[  0.,   1.,   2.,   3.],
       [  4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.]])
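
For reference, here is a self-contained version of the same round trip. It is a sketch under two assumptions: the file goes into a temporary directory (my choice, so nothing is left behind), and the NumPy version is 1.16.3 or later, where np.load refuses pickled object arrays unless allow_pickle=True is passed.

```python
import os
import tempfile

import numpy as np

N = np.arange(12, dtype=float).reshape(3, 4)
data = {'N': N}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'temp.npy')
    # The dict is pickled inside a 0-d object array.
    np.save(path, data)
    # Object arrays need allow_pickle=True on NumPy >= 1.16.3.
    data2 = np.load(path, allow_pickle=True)

# Indexing with an empty tuple pulls the dict out of the 0-d array;
# data2.item() does the same thing.
restored = data2[()]

print(type(restored))
print(np.array_equal(restored['N'], N))
```

Since loading relies on pickle, allow_pickle=True should only be used on files from a trusted source.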

Answer 2

Answer by Ben Usman

I really liked deepdish (it saves them in HDF5 format):

>>> import deepdish as dd
>>> d = {'foo': np.arange(10), 'bar': np.ones((5, 4, 3))}
>>> dd.io.save('test.h5', d)

$ ddls test.h5
/bar                       array (5, 4, 3) [float64]
/foo                       array (10,) [int64]

>>> d = dd.io.load('test.h5')

In my experience, though, it seems to be partially broken for large datasets :(
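
If an extra dependency is not wanted, one alternative (not part of the answer above, just a possible sketch) is NumPy's own np.savez, which stores several named arrays in a single .npz archive without pickling. This assumes every dictionary key is a string that can be used as a keyword argument, and the temporary directory is only there to keep the example tidy.

```python
import os
import tempfile

import numpy as np

d = {'foo': np.arange(10), 'bar': np.ones((5, 4, 3))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.npz')
    # Each dict key becomes the name of one array in the archive.
    np.savez(path, **d)
    # np.load returns a lazy archive object; copy it into a plain dict
    # before the file is closed.
    with np.load(path) as archive:
        d_back = {key: archive[key] for key in archive.files}

print(sorted(d_back))
print(d_back['bar'].shape)
```

The result is an ordinary dict again, so d_back.items() works as it did before saving.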

Answer 3

Answer by SeF

When saving a dictionary with numpy, the dictionary is encoded into an array. To have what you need, you can do as in this example:

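import numpy as np

my_dict = {'a': np.array(range(3)), 'b': np.array(range(4))}

np.save('my_dict.npy', my_dict)

# allow_pickle=True is needed on NumPy >= 1.16.3 to load object arrays
my_dict_back = np.load('my_dict.npy', allow_pickle=True)

print(my_dict_back.item().keys())
print(my_dict_back.item().get('a'))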

So you are probably missing .item() for the reloaded dictionary. Check this out:

for key, key_d in data2.item().items():
    print(key, key_d)

The comparison my_dict == my_dict_back.item() works only for dictionaries that do not have multi-element arrays in their values; with array values, the element-wise == has no single truth value and the comparison raises a ValueError.
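
To check that a reloaded dictionary of arrays matches the original, the values can be compared one key at a time. The helper below is hypothetical (dicts_of_arrays_equal is a name I made up, not a NumPy function) and assumes every value is an array or something np.array_equal accepts:

```python
import numpy as np


def dicts_of_arrays_equal(a, b):
    """Hypothetical helper: True if both dicts have the same keys
    and np.array_equal holds for every pair of values."""
    if a.keys() != b.keys():
        return False
    return all(np.array_equal(a[k], b[k]) for k in a)


my_dict = {'a': np.array(range(3)), 'b': np.array(range(4))}
same = {'a': np.arange(3), 'b': np.arange(4)}
different = {'a': np.arange(3), 'b': np.zeros(4)}

print(dicts_of_arrays_equal(my_dict, same))        # True
print(dicts_of_arrays_equal(my_dict, different))   # False
```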

EDIT: for the item() issue mentioned above, I think it is a better option to save dictionaries with the pickle library rather than with numpy.
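
A minimal sketch of the pickle route, assuming the file name my_dict.pkl and a temporary directory (both chosen only for illustration):

```python
import os
import pickle
import tempfile

import numpy as np

my_dict = {'a': np.array(range(3)), 'b': np.array(range(4))}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'my_dict.pkl')
    # Pickle files must be opened in binary mode.
    with open(path, 'wb') as f:
        pickle.dump(my_dict, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(path, 'rb') as f:
        my_dict_back = pickle.load(f)

# The object comes back as a plain dict, so no .item() is needed.
print(type(my_dict_back))
print(my_dict_back['a'])
```

As with any pickle-based format, only load files that come from a source you trust.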
