Python 如何在保留矩阵维度的同时序列化 numpy 数组?
声明:本页面是 Stack Overflow 热门问题的中英对照翻译,遵循 CC BY-SA 4.0 协议。如果您需要使用它,必须同样遵循 CC BY-SA 许可,注明原文地址和作者信息,并将其归于原作者(不是我):Stack Overflow
原文地址: http://stackoverflow.com/questions/30698004/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): Stack Overflow
How can I serialize a numpy array while preserving matrix dimensions?
提问by blz
numpy.array.tostring doesn't seem to preserve information about matrix dimensions (see this question), requiring the user to issue a call to numpy.array.reshape.
numpy.array.tostring 似乎没有保留矩阵维度的信息(请参阅此问题),需要用户再调用 numpy.array.reshape。
Is there a way to serialize a numpy array to JSON format while preserving this information?
有没有办法在保留此信息的同时将 numpy 数组序列化为 JSON 格式?
Note:The arrays may contain ints, floats or bools. It's reasonable to expect a transposed array.
注意:数组可能包含整数、浮点数或布尔值。期望转置数组是合理的。
Note 2:this is being done with the intent of passing the numpy array through a Storm topology using streamparse, in case such information ends up being relevant.
注 2:这样做的目的是使用 streamparse 通过 Storm 拓扑传递 numpy 数组,以防此类信息最终相关。
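To make the problem concrete, here is a minimal sketch (using tobytes/frombuffer, the modern spellings of tostring/fromstring) showing that a raw-bytes round trip keeps the data but loses the shape:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
flat = np.frombuffer(a.tobytes(), dtype=a.dtype)  # tobytes() is the modern name for tostring()
print(flat.shape)  # (6,) - the array comes back 1-D; the (2, 3) shape is gone
restored = flat.reshape(2, 3)  # the caller must remember the shape separately
```

This is exactly the bookkeeping (shape, and also dtype) that a serialization format needs to carry alongside the raw bytes.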
采纳答案by user2357112 supports Monica
pickle.dumps or numpy.save encode all the information needed to reconstruct an arbitrary NumPy array, even in the presence of endianness issues, non-contiguous arrays, or weird tuple dtypes. Endianness issues are probably the most important; you don't want array([1]) to suddenly become array([16777216]) because you loaded your array on a big-endian machine. pickle is probably the more convenient option, though save has its own benefits, given in the npy format rationale.
pickle.dumps 或 numpy.save 会编码重建任意 NumPy 数组所需的全部信息,即使存在字节序问题、非连续数组或奇怪的元组 dtype 也没有问题。字节序问题可能是最重要的:你不希望 array([1]) 因为在大端机器上加载而突然变成 array([16777216])。pickle 可能是更方便的选择,不过 save 也有其自身的优点,见 npy 格式的设计说明。
The pickle option:
该 pickle 选项:
import pickle
import numpy

a = numpy.array([[1, 2], [3, 4]])  # some NumPy array
serialized = pickle.dumps(a, protocol=0)  # protocol 0 is printable ASCII
deserialized_a = pickle.loads(serialized)
numpy.save uses a binary format, and it needs to write to a file, but you can get around that with io.BytesIO:
numpy.save 使用二进制格式,并且需要写入文件,但你可以用 io.BytesIO 绕过这一点:
import io
import json
import numpy

a = numpy.array([[1, 2], [3, 4]])  # any NumPy array
memfile = io.BytesIO()
numpy.save(memfile, a)
memfile.seek(0)
serialized = json.dumps(memfile.read().decode('latin-1'))
# latin-1 maps byte n to unicode code point n
And to deserialize:
并反序列化:
memfile = io.BytesIO()
memfile.write(json.loads(serialized).encode('latin-1'))
memfile.seek(0)
a = numpy.load(memfile)
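Putting the two halves together, the whole numpy.save round trip can be sketched as follows (variable names are mine):

```python
import io
import json

import numpy

original = numpy.arange(12, dtype=numpy.float32).reshape(3, 4)

# serialize: write the .npy bytes into memory, then wrap them in a JSON string
memfile = io.BytesIO()
numpy.save(memfile, original)
serialized = json.dumps(memfile.getvalue().decode('latin-1'))

# deserialize: reverse the latin-1 mapping and let numpy.load rebuild the array
restored = numpy.load(io.BytesIO(json.loads(serialized).encode('latin-1')))
assert restored.dtype == original.dtype and restored.shape == original.shape
```

Because the .npy header itself records dtype, shape, and byte order, nothing has to be tracked by hand.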
回答by daniel451
EDIT: As one can read in the comments of the question, this solution deals with "normal" numpy arrays (floats, ints, bools ...) and not with multi-type structured arrays.
编辑:正如人们可以在问题的评论中阅读的那样,该解决方案处理“普通”numpy 数组(浮点数、整数、布尔值……)而不是多类型结构化数组。
Solution for serializing a numpy array of any dimensions and data types
序列化任意维度和数据类型的numpy数组的解决方案
As far as I know, you cannot simply serialize a numpy array with any data type and any dimension... but you can store its data type, dimensions and data in a list representation and then serialize that using JSON.
据我所知,您不能直接序列化任意数据类型、任意维度的 numpy 数组……但您可以将其数据类型、维度和数据存储在一个列表表示中,然后使用 JSON 对其进行序列化。
Imports needed:
需要导入:
import json
import base64
For encoding you could use (nparray is some numpy array of any data type and any dimensionality; tobytes()/decode('ascii') are added here so the result is JSON-serializable under Python 3):
对于编码,你可以使用(nparray 是任意数据类型、任意维度的 numpy 数组;这里加上 tobytes()/decode('ascii') 以便结果在 Python 3 下可被 JSON 序列化):
json.dumps([str(nparray.dtype), base64.b64encode(nparray.tobytes()).decode('ascii'), nparray.shape])
After this you get a JSON dump (string) of your data, containing a list representation of its data type and shape as well as the arrays data/contents base64-encoded.
在此之后,您将获得数据的 JSON 转储(字符串),其中包含其数据类型和形状的列表表示以及 base64 编码的数组数据/内容。
And for decoding, this does the work (encStr is the encoded JSON string, loaded from somewhere):
而解码时,下面的代码即可完成(encStr 是从某处加载的编码 JSON 字符串):
# get the encoded json dump
enc = json.loads(encStr)
# build the numpy data type
dataType = numpy.dtype(enc[0])
# decode the base64 encoded numpy array data and create a new numpy array with this data & type
dataArray = numpy.frombuffer(base64.b64decode(enc[1]), dtype=dataType)
# if the array had more than one dimension, it has to be reshaped
if len(enc) > 2:
    dataArray = dataArray.reshape(enc[2])  # reshape returns a new view; reassign it
JSON dumps are efficient and cross-compatible for many reasons but just taking JSON leads to unexpected results if you want to store and load numpy arrays of any typeand any dimension.
出于多种原因,JSON 转储是高效且交叉兼容的,但如果您想存储和加载任何类型和任何维度的numpy 数组,仅采用 JSON 会导致意外结果。
This solution stores and loads numpy arrays regardless of the type or dimension and also restores it correctly (data type, dimension, ...)
此解决方案存储和加载 numpy 数组,而不管类型或维度如何,并且还可以正确恢复它(数据类型、维度等)
I tried several solutions myself months ago and this was the only efficient, versatile solution I came across.
几个月前我自己尝试了几种解决方案,这是我遇到的唯一高效、通用的解决方案。
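Putting the encode and decode halves above together, a Python 3 round trip might look like this (the helper names nparray_to_json / nparray_from_json are mine, not from the answer):

```python
import base64
import json

import numpy as np

def nparray_to_json(nparray):
    # store dtype, base64-encoded raw bytes, and shape, as in the answer above
    return json.dumps([str(nparray.dtype),
                       base64.b64encode(nparray.tobytes()).decode('ascii'),
                       nparray.shape])

def nparray_from_json(enc_str):
    dtype, data, shape = json.loads(enc_str)
    arr = np.frombuffer(base64.b64decode(data), dtype=np.dtype(dtype))
    return arr.reshape(shape)  # JSON turns the shape tuple into a list; reshape accepts it

a = np.array([[True, False], [False, True]])
b = nparray_from_json(nparray_to_json(a))
assert b.dtype == a.dtype and b.shape == a.shape and (a == b).all()
```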
回答by Ken
Try using numpy.array_repr or numpy.array_str.
尝试使用 numpy.array_repr 或 numpy.array_str。
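For illustration, array_repr yields a human-readable string; reconstructing an array from it requires eval with numpy's array in scope, which is only sensible for small, trusted inputs (a sketch, not a recommended serialization path, since repr truncates large arrays):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
s = np.array_repr(a)
print(s)
# eval-based reconstruction only works for small arrays and trusted input
b = eval(s, {'array': np.array})
assert (a == b).all()
```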
回答by Rebs
I found the code in Msgpack-numpy helpful. https://github.com/lebedov/msgpack-numpy/blob/master/msgpack_numpy.py
我发现 Msgpack-numpy 中的代码很有帮助。 https://github.com/lebedov/msgpack-numpy/blob/master/msgpack_numpy.py
I modified the serialised dict slightly and added base64 encoding to reduce the serialised size.
我稍微修改了序列化的 dict 并添加了 base64 编码以减少序列化的大小。
By using the same interface as json (providing load(s)/dump(s)), you can provide a drop-in replacement for json serialisation.
通过使用与 json 相同的接口(提供 load(s)/dump(s) 函数),可以作为 json 序列化的直接替代品。
This same logic can be extended to add any automatic non-trivial serialisation, such as datetime objects.
可以扩展相同的逻辑以添加任何自动的非平凡序列化,例如日期时间对象。
EDIT: I've written a generic, modular parser that does this and more. https://github.com/someones/jaweson
编辑:我写了一个通用的、模块化的解析器来完成这些以及更多功能。https://github.com/someones/jaweson
My code is as follows:
我的代码如下:
np_json.py
from json import *
import json
import numpy as np
import base64

def to_json(obj):
    if isinstance(obj, (np.ndarray, np.generic)):
        if isinstance(obj, np.ndarray):
            return {
                '__ndarray__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
                'shape': obj.shape,
            }
        elif isinstance(obj, (np.bool_, np.number)):
            return {
                '__npgeneric__': base64.b64encode(obj.tobytes()).decode('ascii'),
                'dtype': obj.dtype.str,
            }
    if isinstance(obj, set):
        return {'__set__': list(obj)}
    if isinstance(obj, tuple):
        return {'__tuple__': list(obj)}
    if isinstance(obj, complex):
        return {'__complex__': obj.__repr__()}
    # Let the base class default method raise the TypeError
    raise TypeError('Unable to serialise object of type {}'.format(type(obj)))

def from_json(obj):
    # check for numpy
    if isinstance(obj, dict):
        if '__ndarray__' in obj:
            return np.frombuffer(
                base64.b64decode(obj['__ndarray__']),
                dtype=np.dtype(obj['dtype'])
            ).reshape(obj['shape'])
        if '__npgeneric__' in obj:
            return np.frombuffer(
                base64.b64decode(obj['__npgeneric__']),
                dtype=np.dtype(obj['dtype'])
            )[0]
        if '__set__' in obj:
            return set(obj['__set__'])
        if '__tuple__' in obj:
            return tuple(obj['__tuple__'])
        if '__complex__' in obj:
            return complex(obj['__complex__'])
    return obj

# over-write the load(s)/dump(s) functions
def load(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.load(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs['object_hook'] = from_json
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dump(*args, **kwargs)

def dumps(*args, **kwargs):
    kwargs['default'] = to_json
    return json.dumps(*args, **kwargs)
You should be able to then do the following:
然后,您应该能够执行以下操作:
import numpy as np
import np_json as json
np_data = np.zeros((10,10), dtype=np.float32)
new_data = json.loads(json.dumps(np_data))
assert (np_data == new_data).all()
回答by Chris.Wilson
If it needs to be human readable and you know that this is a numpy array:
如果它需要人类可读并且您知道这是一个 numpy 数组:
import numpy as np
import json

a = np.random.normal(size=(50, 120, 150))
a_reconstructed = np.asarray(json.loads(json.dumps(a.tolist())))
print(np.allclose(a, a_reconstructed))
print((a == a_reconstructed).all())
Maybe not the most efficient as the array sizes grow larger, but works for smaller arrays.
随着数组大小的增长,可能不是最有效的,但适用于较小的数组。
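One caveat worth noting (my observation, not from the answer): the tolist round trip preserves values and shape, but not necessarily the exact dtype, which falls back to a platform default:

```python
import json

import numpy as np

a = np.arange(6, dtype=np.int16).reshape(2, 3)
b = np.asarray(json.loads(json.dumps(a.tolist())))

# values and shape survive the round trip...
assert (a == b).all() and b.shape == a.shape
# ...but the dtype falls back to the platform default integer, not int16
print(b.dtype)
```

If the exact dtype matters, store str(a.dtype) alongside the list and pass it back to np.asarray.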
回答by thayne
Msgpack has the best serialization performance: http://www.benfrederickson.com/dont-pickle-your-data/
Msgpack 序列化性能最好:http://www.benfrederickson.com/dont-pickle-your-data/
Use msgpack-numpy. See https://github.com/lebedov/msgpack-numpy
使用 msgpack-numpy。见https://github.com/lebedov/msgpack-numpy
Install it:
安装它:
pip install msgpack-numpy
Then:
然后:
import msgpack
import msgpack_numpy as m
import numpy as np
x = np.random.rand(5)
x_enc = msgpack.packb(x, default=m.encode)
x_rec = msgpack.unpackb(x_enc, object_hook=m.decode)
回答by SemanticBeeng
Try traitschema
https://traitschema.readthedocs.io/en/latest/
试试traitschema
https://traitschema.readthedocs.io/en/latest/
"Create serializable, type-checked schema using traits and Numpy. A typical use case involves saving several Numpy arrays of varying shape and type."
“使用特征和 Numpy 创建可序列化、类型检查的模式。典型用例涉及保存多个不同形状和类型的 Numpy 数组。”