How to unpack a pkl file in Python?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/24906126/
Asked by ytrewq
I have a pkl file from the MNIST dataset, which consists of handwritten digit images.
I'd like to take a look at each of those digit images, so I need to unpack the pkl file, but I can't find out how.
Is there a way to unpack/unzip a pkl file?
Accepted answer by Peque
Generally
Your pkl file is, in fact, a serialized pickle file, which means it has been dumped using Python's pickle module.
To un-pickle the data you can:
import pickle
with open('serialized.pkl', 'rb') as f:
    data = pickle.load(f)
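To see both halves of the process, here is a minimal round-trip sketch: it dumps an object to a pkl file and loads it back. The sample dictionary and the temporary directory are assumptions made for illustration; only the file name serialized.pkl comes from the snippet above.

```python
import os
import pickle
import tempfile

# Hypothetical sample object; any picklable Python value would do.
original = {"digits": [0, 1, 2], "label": "demo"}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "serialized.pkl")

    # Serialize: pickle files must be opened in binary mode.
    with open(path, "wb") as f:
        pickle.dump(original, f)

    # Deserialize: only do this with files from a source you trust,
    # because unpickling can execute arbitrary code.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored object is an equal copy, not the same object.
print(restored == original, restored is original)
```

Because loading a pickle can run code embedded in the file, it is safest to unpickle only files whose origin you know.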
For the MNIST data set
Note that gzip is only needed if the file is compressed:
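import gzip
import pickle
with gzip.open('mnist.pkl.gz', 'rb') as f:
    # If the file was pickled under Python 2 and you are on Python 3,
    # you may need pickle.load(f, encoding='latin1') here instead.
    train_set, valid_set, test_set = pickle.load(f)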
Each set can be further divided, e.g. for the training set:
train_x, train_y = train_set
Those would be the inputs (digits) and outputs (labels) of your sets.
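Before plotting anything, it can help to check what each set actually contains. The sketch below assumes each set is an (inputs, labels) pair of NumPy arrays with one flattened 28x28 image per row, which is what the unpacking and reshape code in this answer implies. The helper describe_split and the synthetic fake_train_set are made up for illustration; with the real file you would pass train_set, valid_set or test_set instead.

```python
import numpy as np


def describe_split(name, split):
    """Return a one-line summary of an (inputs, labels) pair."""
    inputs, labels = split
    return "{}: inputs {} {}, labels {} {}".format(
        name, inputs.shape, inputs.dtype, labels.shape, labels.dtype)


# Synthetic stand-in with the assumed layout: each row is a flattened
# 28x28 image (784 values) and each label is an integer.
fake_train_set = (np.zeros((5, 784), dtype=np.float32),
                  np.arange(5, dtype=np.int64))

summary = describe_split("train", fake_train_set)
print(summary)
```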
If you want to display the digits:
import matplotlib.cm as cm
import matplotlib.pyplot as plt
plt.imshow(train_x[0].reshape((28, 28)), cmap=cm.Greys_r)
plt.show()
The other alternative would be to look at the original data:
http://yann.lecun.com/exdb/mnist/
But that will be harder, as you'll need to write a program to read the binary data in those files. So I recommend using Python and loading the data with pickle. As you've seen, it's very easy. ;-)
Answer by osolmaz
In case you want to work with the original MNIST files, here is how you can deserialize them.
If you haven't downloaded the files yet, do that first by running the following in the terminal:
wget http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
wget http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Then save the following as deserialize.py and run it.
import gzip

import numpy as np

IMG_DIM = 28

def decode_image_file(fname):
    n_bytes_per_img = IMG_DIM*IMG_DIM
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    # Skip the 16-byte header of the image file.
    data = bytes_[16:]
    if len(data) % n_bytes_per_img != 0:
        raise Exception('Something wrong with the file')
    # One row per image, one column per pixel.
    result = np.frombuffer(data, dtype=np.uint8).reshape(
        len(data)//n_bytes_per_img, n_bytes_per_img)
    return result

def decode_label_file(fname):
    with gzip.open(fname, 'rb') as f:
        bytes_ = f.read()
    # Skip the 8-byte header of the label file.
    data = bytes_[8:]
    result = np.frombuffer(data, dtype=np.uint8)
    return result

train_images = decode_image_file('train-images-idx3-ubyte.gz')
train_labels = decode_label_file('train-labels-idx1-ubyte.gz')
test_images = decode_image_file('t10k-images-idx3-ubyte.gz')
test_labels = decode_label_file('t10k-labels-idx1-ubyte.gz')
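The script skips 16 bytes in image files and 8 bytes in label files because of the IDX header: a 4-byte magic number whose last byte is the number of dimensions, followed by one big-endian 32-bit size per dimension. As a rough sketch, the header can be read with the struct module instead of being skipped blindly. The helper read_idx_header and the tiny synthetic file are assumptions made for illustration, so the example runs without downloading anything.

```python
import gzip
import os
import struct
import tempfile

IMAGE_MAGIC = 2051  # 0x00000803: unsigned bytes, 3 dimensions
LABEL_MAGIC = 2049  # 0x00000801: unsigned bytes, 1 dimension


def read_idx_header(fname):
    """Return (magic, dims) read from the header of a gzipped IDX file."""
    with gzip.open(fname, 'rb') as f:
        # Magic number: two zero bytes, a type code, a dimension count.
        magic, = struct.unpack('>I', f.read(4))
        n_dims = magic & 0xFF
        # One big-endian unsigned 32-bit size per dimension.
        dims = struct.unpack('>' + 'I' * n_dims, f.read(4 * n_dims))
    return magic, dims


# Build a tiny synthetic image file (2 blank 28x28 images) to test on.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'tiny-images-idx3-ubyte.gz')
    with gzip.open(path, 'wb') as f:
        f.write(struct.pack('>IIII', IMAGE_MAGIC, 2, 28, 28))
        f.write(bytes(2 * 28 * 28))
    magic, dims = read_idx_header(path)

print(magic == IMAGE_MAGIC, dims)
```

Checking the magic number and dimensions this way gives a clearer error than a generic length check when the wrong file is passed in.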
The script doesn't normalize the pixel values the way the pickled file does. To do that, all you have to do is:
train_images = train_images/255
test_images = test_images/255
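A small sketch of what that division does, using a made-up three-pixel array in place of the decoded images: on Python 3 the result is a float array in the range 0 to 1. The cast to float32 is an optional assumption to save memory, not part of the original script. On Python 2, dividing by 255.0 instead of 255 would avoid integer division.

```python
import numpy as np

# Stand-in for a decoded image array: uint8 pixels in 0..255.
raw = np.array([[0, 128, 255]], dtype=np.uint8)

# True division promotes the uint8 values to floating point.
scaled = raw / 255

# Optional: float32 uses half the memory of the default float64.
scaled32 = scaled.astype(np.float32)

print(scaled.dtype, scaled32.dtype, scaled.min(), scaled.max())
```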