Python: How to unpack a pkl file?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license: you are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/24906126/

Date: 2020-08-19 05:26:42  Source: igfitidea

How to unpack a pkl file?

Tags: python, pickle, deep-learning, mnist

Asked by ytrewq

I have a pkl file from the MNIST dataset, which consists of handwritten digit images.

I'd like to look at each of those digit images, so I need to unpack the pkl file, but I can't figure out how.

Is there a way to unpack/unzip a pkl file?

Accepted answer by Peque

Generally

Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.

To un-pickle the data you can:

import pickle


with open('serialized.pkl', 'rb') as f:
    data = pickle.load(f)
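To see what "serialized" means in practice, here is a minimal round trip: an object is written with pickle.dump and read back with pickle.load. The dictionary and the temporary path are placeholders for this sketch, not part of the original answer.

```python
import os
import pickle
import tempfile

# Placeholder object; any picklable Python object would do.
original = {"name": "example", "values": [1, 2, 3]}

# Write into a fresh temporary directory so nothing existing is overwritten.
path = os.path.join(tempfile.mkdtemp(), "serialized.pkl")

# Serialize ("pickle") the object into a binary file...
with open(path, "wb") as f:
    pickle.dump(original, f)

# ...then deserialize ("unpickle") it back into an equal, separate object.
with open(path, "rb") as f:
    data = pickle.load(f)

print(data == original, data is original)  # True False
```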

For the MNIST data set

Note that gzip is only needed if the file is compressed:

import gzip
import pickle


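with gzip.open('mnist.pkl.gz', 'rb') as f:
    # encoding='latin1' lets Python 3 read files pickled by Python 2,
    # such as the widely used mnist.pkl.gz
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')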

Each set can be further divided (e.g. the training set):

train_x, train_y = train_set

These are the inputs (digit images) and outputs (labels) of each set.
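To check this structure without the real download, the sketch below writes a small, made-up file with the layout the answer describes (a tuple of three (inputs, labels) pairs, one flattened 28x28 = 784-pixel row per image) and unpacks it the same way. The fake_split helper and the set sizes are hypothetical; the real sets are much larger.

```python
import gzip
import os
import pickle
import tempfile

import numpy as np

def fake_split(n):
    """Hypothetical stand-in for one MNIST split: n blank images plus labels."""
    x = np.zeros((n, 28 * 28), dtype=np.float32)  # one flattened image per row
    y = np.arange(n, dtype=np.int64) % 10          # labels cycling through 0-9
    return x, y

path = os.path.join(tempfile.mkdtemp(), "mnist.pkl.gz")
with gzip.open(path, "wb") as f:
    pickle.dump((fake_split(5), fake_split(2), fake_split(3)), f)

# Same loading and unpacking pattern as above.
with gzip.open(path, "rb") as f:
    train_set, valid_set, test_set = pickle.load(f)

train_x, train_y = train_set
print(train_x.shape, train_y.shape)  # (5, 784) (5,)
```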

If you want to display the digits:

import matplotlib.cm as cm
import matplotlib.pyplot as plt


plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
plt.show()

[Image: mnist_digit]
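If matplotlib isn't available (for example on a remote server without a display), a rough alternative is to print a digit as text. This sketch assumes pixel values scaled to [0, 1], as in the pickled file; the function name digit_to_text and the 0.5 threshold are arbitrary choices, and the example digit is made up.

```python
import numpy as np

def digit_to_text(pixels, threshold=0.5):
    """Render one flattened 28x28 digit as text: '#' for ink, '.' for background."""
    img = np.asarray(pixels).reshape((28, 28))
    return "\n".join(
        "".join("#" if value > threshold else "." for value in row) for row in img
    )

# Made-up example: a vertical bar in columns 13-14, standing in for a real "1".
fake_digit = np.zeros((28, 28), dtype=np.float32)
fake_digit[4:24, 13:15] = 1.0
print(digit_to_text(fake_digit.ravel()))
```

With the real data, print(digit_to_text(train_x[0])) would show the first training digit.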

Another alternative is to look at the original data:

http://yann.lecun.com/exdb/mnist/

That will be harder, though, since you'll need to write a program that reads the binary data in those files. So I recommend using Python and loading the data with pickle. As you've seen, it's very easy. ;-)
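For a sense of what reading that binary data involves: per the format description on the MNIST page, each file starts with big-endian 32-bit integers, namely a magic number and an item count, followed (for image files) by the row and column counts. A sketch that parses this header with struct; the function names are hypothetical, and it only reads the header, not the pixels.

```python
import gzip
import struct

# IDX magic numbers for unsigned-byte data: 3-D (images) and 1-D (labels).
IMAGE_MAGIC = 0x00000803  # 2051
LABEL_MAGIC = 0x00000801  # 2049

def parse_idx_header(raw):
    """Parse the leading bytes of an MNIST IDX file (big-endian uint32 fields)."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic == IMAGE_MAGIC:
        rows, cols = struct.unpack(">II", raw[8:16])
        return {"magic": magic, "count": count, "rows": rows, "cols": cols}
    if magic == LABEL_MAGIC:
        return {"magic": magic, "count": count}
    raise ValueError(f"unexpected magic number: {magic}")

def read_idx_header(fname):
    """Read and parse just the header of a gzipped IDX file."""
    with gzip.open(fname, "rb") as f:
        return parse_idx_header(f.read(16))
```

For train-images-idx3-ubyte.gz, read_idx_header should report 60000 items of 28 rows by 28 columns, matching the MNIST page.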

Answer by osolmaz

In case you want to work with the original MNIST files, here is how you can deserialize them.

If you haven't downloaded the files yet, do that first by running the following in the terminal:

wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
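If wget isn't installed (macOS, for example, ships curl rather than wget), the same downloads can be done from Python with urllib.request.urlretrieve. download_if_missing is a hypothetical helper name; it skips files that already exist, so re-running it is cheap.

```python
import os
import urllib.request

BASE_URL = "http://yann.lecun.com/exdb/mnist/"
FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

def download_if_missing(fname, dest_dir=".", base_url=BASE_URL):
    """Fetch base_url + fname into dest_dir, skipping files that already exist."""
    path = os.path.join(dest_dir, fname)
    if not os.path.exists(path):
        urllib.request.urlretrieve(base_url + fname, path)
    return path
```

Calling download_if_missing(fname) for each name in FILES mirrors the four wget commands.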

Then save the following as deserialize.py and run it.

import numpy as np
import gzip

IMG_DIM = 28

def decode_image_file(fname):
    n_bytes_per_img = IMG_DIM*IMG_DIM

    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
        # Skip the 16-byte header: magic number, image count, rows, columns.
        data = bytes_[16:]

        if len(data) % n_bytes_per_img != 0:
            raise ValueError('Something is wrong with the file: unexpected size')

        # One row per image, each holding IMG_DIM*IMG_DIM uint8 pixels.
        result = np.frombuffer(data, dtype=np.uint8).reshape(
            len(data)//n_bytes_per_img, n_bytes_per_img)

    return result

def decode_label_file(fname):
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
        # Skip the 8-byte header: magic number and label count.
        data = bytes_[8:]

        # One uint8 label (0-9) per image.
        result = np.frombuffer(data, dtype=np.uint8)

    return result

train_images = decode_image_file('train-images-idx3-ubyte.gz')
train_labels = decode_label_file('train-labels-idx1-ubyte.gz')

test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')

Unlike the pickled file, the script doesn't normalize the pixel values. To do that, all you have to do is:

train_images = train_images/255
test_images = test_images/255
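Dividing a uint8 array by 255 like this yields float64 values. If you'd rather have float32, which needs half the memory (and, as far as I know, is what the pickled file uses), a small variant converts first. normalize is a hypothetical helper name and the pixel values below are made up.

```python
import numpy as np

def normalize(images):
    """Scale uint8 pixels (0-255) to float32 values in [0.0, 1.0]."""
    # Convert before dividing so the result stays float32 instead of float64.
    return images.astype(np.float32) / 255.0

raw = np.array([[0, 128, 255]], dtype=np.uint8)  # made-up pixel values
scaled = normalize(raw)
print(scaled.dtype, scaled.min(), scaled.max())  # float32 0.0 1.0
```

With the decoded data, that would be train_images = normalize(train_images), and likewise for test_images.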

Answer by crabman84

You need to use the pickle module (and gzip, if the file is compressed).

NOTE: Both modules are already part of the Python standard library, so there is no need to install anything new.
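Since both modules ship with Python, one small helper can cover both cases by checking for the two-byte gzip signature (0x1f 0x8b) before unpickling. load_pickle is a hypothetical name, not part of either module; extra keyword arguments such as encoding="latin1" are passed through to pickle.load.

```python
import gzip
import os
import pickle
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes

def load_pickle(path, **load_kwargs):
    """Unpickle path, decompressing it first if it is gzip-compressed."""
    with open(path, "rb") as f:
        is_gzip = f.read(2) == GZIP_MAGIC
    opener = gzip.open if is_gzip else open
    with opener(path, "rb") as f:
        return pickle.load(f, **load_kwargs)

# Demo: the same placeholder object saved plain and gzip-compressed.
tmp = tempfile.mkdtemp()
plain_path = os.path.join(tmp, "data.pkl")
gz_path = os.path.join(tmp, "data.pkl.gz")
obj = {"answer": 42}
with open(plain_path, "wb") as f:
    pickle.dump(obj, f)
with gzip.open(gz_path, "wb") as f:
    pickle.dump(obj, f)

print(load_pickle(plain_path) == load_pickle(gz_path) == obj)  # True
```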
