Warning: This page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/4529815/

Time: 2020-08-18 16:10:01  Source: igfitidea

Saving an Object (Data persistence)

python object serialization save pickle

Asked by Peterstone

I've created an object like this:

company1.name = 'banana' 
company1.value = 40

I would like to save this object. How can I do that?

Accepted answer by martineau

You could use the pickle module in the standard library. Here's an elementary application of it to your example:

import pickle

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

with open('company_data.pkl', 'wb') as output:
    company1 = Company('banana', 40)
    pickle.dump(company1, output, pickle.HIGHEST_PROTOCOL)

    company2 = Company('spam', 42)
    pickle.dump(company2, output, pickle.HIGHEST_PROTOCOL)

del company1
del company2

with open('company_data.pkl', 'rb') as input:
    company1 = pickle.load(input)
    print(company1.name)  # -> banana
    print(company1.value)  # -> 40

    company2 = pickle.load(input)
    print(company2.name) # -> spam
    print(company2.value)  # -> 42

You could also define your own simple utility like the following which opens a file and writes a single object to it:

def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

# sample usage
save_object(company1, 'company1.pkl')
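
The answer only shows the saving side. Here is a minimal sketch of a matching loader, assuming a hypothetical helper name `load_object` (not from the original answer). It repeats `save_object` and the `Company` class so the block runs on its own, and writes to a temporary directory so nothing is left behind.

```python
import os
import pickle
import tempfile


class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value


def save_object(obj, filename):
    with open(filename, 'wb') as output:  # Overwrites any existing file.
        pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)


def load_object(filename):
    """Hypothetical counterpart to save_object: read back one pickled object."""
    with open(filename, 'rb') as infile:
        return pickle.load(infile)


# Round trip through a throwaway directory.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'company1.pkl')
    save_object(Company('banana', 40), path)
    restored = load_object(path)

print(restored.name, restored.value)  # banana 40
```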

Update

Since this is such a popular answer, I'd like to touch on a few slightly advanced usage topics.

cPickle (or _pickle) vs pickle

In Python 2, it's almost always preferable to actually use the cPickle module rather than pickle, because the former is written in C and is much faster. There are some subtle differences between them, but in most situations they're equivalent and the C version will provide greatly superior performance. Switching to it couldn't be easier; just change the import statement to this:

import cPickle as pickle

In Python 3, cPickle was renamed _pickle, but doing this is no longer necessary since the pickle module now does it automatically; see "What difference between pickle and _pickle in python 3?".

The rundown is that you could use something like the following to ensure that your code will always use the C version when it's available, in both Python 2 and 3:

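try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:  # ModuleNotFoundError (3.6+) is a subclass of ImportError
    import pickle  # Python 3: uses the C accelerator automatically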
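
To check that a Python 3 interpreter really is using the C implementation behind the plain `pickle` import, you can compare classes. This is a small sketch that relies on a CPython implementation detail (the standard `pickle` module re-exports `Pickler` from the `_pickle` extension when it can be imported); `uses_c_accelerator` is a hypothetical helper name.

```python
import pickle


def uses_c_accelerator():
    """Return True if pickle.Pickler is the C class from the _pickle module."""
    try:
        import _pickle  # CPython's C implementation; may be absent elsewhere
    except ImportError:
        return False
    return pickle.Pickler is _pickle.Pickler


print(uses_c_accelerator())  # expected True on a standard CPython 3 build
```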

Data stream formats (protocols)

pickle can read and write files in several different, Python-specific formats, called protocols, as described in the documentation. "Protocol version 0" is ASCII and therefore "human-readable". Versions > 0 are binary, and the highest one available depends on what version of Python is being used. The default also depends on the Python version: in Python 2 the default was protocol version 0, but in Python 3.8.1 it's protocol version 4. In Python 3.x the module has a pickle.DEFAULT_PROTOCOL attribute, but that doesn't exist in Python 2.
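
The difference between the text and binary protocols is easy to see by pickling the same small object both ways. A minimal Python 3 sketch (the dict stands in for your own object): protocol 0 output contains only ASCII bytes, while protocols 2 and higher begin with the binary PROTO opcode (`0x80`) followed by the protocol number.

```python
import pickle

obj = {'name': 'banana', 'value': 40}

# Protocol 0 is the ASCII ("human-readable") format.
text_form = pickle.dumps(obj, protocol=0)
print(text_form.decode('ascii'))  # decodes cleanly: every byte is ASCII

# Protocols 2 and higher start with the PROTO opcode (0x80) plus the version.
binary_form = pickle.dumps(obj, protocol=2)
print(binary_form[:2])  # b'\x80\x02'

print('highest:', pickle.HIGHEST_PROTOCOL)
print('default:', pickle.DEFAULT_PROTOCOL)  # Python 3 only
```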

Fortunately, there's a shorthand for writing pickle.HIGHEST_PROTOCOL in every call (assuming that's what you want, and you usually do): just use the literal number -1, similar to referencing the last element of a sequence via a negative index. So, instead of writing:

pickle.dump(obj, output, pickle.HIGHEST_PROTOCOL)

You can just write:

pickle.dump(obj, output, -1)
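
A quick way to confirm that the two spellings are equivalent is to compare their output directly. This sketch uses `pickle.dumps` (the in-memory variant of `dump`) on an arbitrary sample list; the byte right after the leading PROTO opcode records the protocol that was actually used.

```python
import pickle

obj = ['banana', 40]

# A negative protocol number selects pickle.HIGHEST_PROTOCOL.
short = pickle.dumps(obj, -1)
explicit = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)

print(short == explicit)                    # True: same protocol, same bytes
print(short[1] == pickle.HIGHEST_PROTOCOL)  # True: byte after PROTO is the version
```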

Either way, you'd only have to specify the protocol once if you created a Pickler object for use in multiple pickle operations:

pickler = pickle.Pickler(output, -1)
pickler.dump(obj1)
pickler.dump(obj2)
# ...and so on
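
Here is a self-contained version of that pattern, as a sketch: an `io.BytesIO` buffer stands in for the open output file, and two small dicts stand in for `obj1` and `obj2`. Because a Pickler keeps its memo between `dump()` calls, later pickles can refer back to objects stored by earlier ones, so reading them back in order with a single Unpickler is the safe pairing.

```python
import io
import pickle

buffer = io.BytesIO()  # stands in for an open binary file

# Specify the protocol once, when the Pickler is created.
pickler = pickle.Pickler(buffer, -1)
pickler.dump({'name': 'banana', 'value': 40})
pickler.dump({'name': 'spam', 'value': 42})

# Read the objects back in the same order with one matching Unpickler.
buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
first = unpickler.load()
second = unpickler.load()
print(first['name'], second['name'])  # banana spam
```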

Note: If you're in an environment running different versions of Python, then you'll probably want to explicitly use (i.e. hardcode) a specific protocol number that all of them can read (later versions can generally read files produced by earlier ones).
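
Only the writer needs to pick the protocol: `pickle.load` detects it from the data, so readers don't pass one. A minimal sketch, assuming protocol 2 as the hardcoded choice (it was introduced in Python 2.3, so Python 2.3+ and every Python 3 can read it); `COMPAT_PROTOCOL` is a hypothetical constant name.

```python
import io
import pickle

# Hypothetical shared setting: a protocol every interpreter in the mix can read.
COMPAT_PROTOCOL = 2

data = {'name': 'banana', 'value': 40}
buffer = io.BytesIO()
pickle.dump(data, buffer, COMPAT_PROTOCOL)  # writer hardcodes the protocol
pickle.dump(data, buffer, 0)                # an ASCII pickle in the same stream

# The reader never passes a protocol: pickle.load works it out from the data.
buffer.seek(0)
first = pickle.load(buffer)
second = pickle.load(buffer)
print(first == data, second == data)  # True True
```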

Multiple Objects

While a pickle file can contain any number of pickled objects, as shown in the above samples, when there's an unknown number of them, it's often easier to store them all in some sort of variably-sized container, like a list, tuple, or dict, and write them all to the file in a single call:

tech_companies = [
    Company('Apple', 114.18), Company('Google', 908.60), Company('Microsoft', 69.18)
]
save_object(tech_companies, 'tech_companies.pkl')

and restore the list and everything in it later with:

with open('tech_companies.pkl', 'rb') as input:
    tech_companies = pickle.load(input)

The major advantage is that you don't need to know how many object instances are saved in order to load them back later (although doing so without that information is possible, it requires some slightly specialized code). See the answers to the related question "Saving and loading multiple objects in pickle file?" for details on different ways to do this. Personally, I like @Lutz Prechelt's answer the best. Here it is, adapted to the examples here:

import pickle

class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value

def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break

print('Companies in pickle file:')
for company in pickled_items('company_data.pkl'):
    print('  name: {}, value: {}'.format(company.name, company.value))
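
The writing side of that approach can be equally simple: open the file in append mode and dump one object per call, then let `pickled_items` read them back until `EOFError`. In this sketch `append_item` is a hypothetical helper name, the company values are sample data, and a temporary directory keeps the example self-contained.

```python
import os
import pickle
import tempfile


class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value


def append_item(obj, filename):
    """Hypothetical helper: add one more pickle to the end of the file."""
    with open(filename, 'ab') as f:  # 'ab' appends instead of overwriting
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def pickled_items(filename):
    """ Unpickle a file of pickled data. """
    with open(filename, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                break


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'company_data.pkl')
    for name, value in [('banana', 40), ('spam', 42), ('eggs', 7)]:
        append_item(Company(name, value), path)
    names = [c.name for c in pickled_items(path)]

print(names)  # ['banana', 'spam', 'eggs']
```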

Answer by Mike McKerns

I think it's a pretty strong assumption that the object is a class. What if it's not a class? There's also the assumption that the object was not defined in the interpreter. What if it was defined in the interpreter? Also, what if the attributes were added dynamically? When some python objects have attributes added to their __dict__ after creation, pickle doesn't respect the addition of those attributes (i.e. it 'forgets' they were added, because pickle serializes by reference to the object definition).

In all these cases, pickle and cPickle can fail you horribly.

If you are looking to save an object (arbitrarily created), where you have attributes (either added in the object definition, or afterward), your best bet is to use dill, which can serialize almost anything in python.

We start with a class…

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> with open('company.pkl', 'wb') as f:
...     pickle.dump(company1, f, pickle.HIGHEST_PROTOCOL)
... 
>>> 

Now shut down, and restart...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pickle
>>> with open('company.pkl', 'rb') as f:
...     company1 = pickle.load(f)
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1378, in load
    return Unpickler(file).load()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 858, in load
    dispatch[key](self)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1090, in load_global
    klass = self.find_class(module, name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1126, in find_class
    klass = getattr(mod, name)
AttributeError: 'module' object has no attribute 'Company'
>>> 

Oops… pickle can't handle it. Let's try dill. We'll throw in another object type (a lambda) for good measure.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill       
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> with open('company_dill.pkl', 'wb') as f:
...     dill.dump(company1, f)
...     dill.dump(company2, f)
... 
>>> 

And now read the file.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> with open('company_dill.pkl', 'rb') as f:
...     company1 = dill.load(f)
...     company2 = dill.load(f)
... 
>>> company1 
<__main__.Company instance at 0x107909128>
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>>    

It works. The reason pickle fails, and dill doesn't, is that dill treats __main__ like a module (for the most part), and also can pickle class definitions instead of pickling by reference (like pickle does). The reason dill can pickle a lambda is that it gives it a name… then pickling magic can happen.
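
The "pickling by reference" behavior can be observed with the standard library alone (dill is not used here, since it is a third-party package). In this sketch, an instance is pickled, the module-level name `Company` is then rebound to a different class, and unpickling picks up whichever class the name points to at load time. The hypothetical `try_pickle` helper also shows the standard pickle refusing a lambda. It assumes the code runs as a script, so the class lives in an importable module such as `__main__`.

```python
import pickle


class Company:
    def __init__(self, name, value):
        self.name = name
        self.value = value


# pickle records the class's module and name plus the instance __dict__;
# the class body itself is not written out.
data = pickle.dumps(Company('banana', 40), pickle.HIGHEST_PROTOCOL)


# Rebind the module-level name to an unrelated class before loading.
class Company:
    def describe(self):
        return '{}: {}'.format(self.name, self.value)


restored = pickle.loads(data)
print(type(restored) is Company)  # True: the class was looked up by name at load time
print(restored.describe())        # banana: 40


def try_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


print(try_pickle(lambda x: x))  # False: a lambda has no importable name
```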

Actually, there's an easier way to save all these objects, especially if you have a lot of objects you've created. Just dump the whole python session, and come back to it later.

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> class Company:
...     pass
... 
>>> company1 = Company()
>>> company1.name = 'banana'
>>> company1.value = 40
>>> 
>>> company2 = lambda x:x
>>> company2.name = 'rhubarb'
>>> company2.value = 42
>>> 
>>> dill.dump_session('dill.pkl')
>>> 

Now shut down your computer, go enjoy an espresso or whatever, and come back later...

Python 2.7.8 (default, Jul 13 2014, 02:29:54) 
[GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> dill.load_session('dill.pkl')
>>> company1.name
'banana'
>>> company1.value
40
>>> company2.name
'rhubarb'
>>> company2.value
42
>>> company2
<function <lambda> at 0x1065f2938>

The only major drawback is that dill is not part of the python standard library. So if you can't install a python package on your server, then you can't use it.

However, if you are able to install python packages on your system, you can get the latest dill with git+https://github.com/uqfoundation/dill.git@master#egg=dill. And you can get the latest released version with pip install dill.

Answer by c0fec0de

You can use anycache to do the job for you. It considers all the details:

  • It uses dill as backend, which extends the python pickle module to handle lambda and all the nice python features.
  • It stores different objects to different files and reloads them properly.
  • Limits cache size
  • Allows cache clearing
  • Allows sharing of objects between multiple runs
  • Respects input files that influence the result

Assuming you have a function myfunc which creates the instance:

from anycache import anycache

class Company(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

@anycache(cachedir='/path/to/your/cache')    
def myfunc(name, value):
    return Company(name, value)

Anycache calls myfunc the first time and pickles the result to a file in cachedir, using a unique identifier (depending on the function name and its arguments) as the filename. On any subsequent run, the pickled object is loaded. If the cachedir is preserved between python runs, the pickled object is taken from the previous python run.
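
The caching behavior described above can be approximated with the standard library, which may help in understanding what anycache does for you. This is a minimal sketch, not anycache's implementation: `disk_cache` is a hypothetical decorator, the file name is a SHA-256 hash of the `repr()` of the function's qualified name and arguments (which assumes those arguments have stable reprs), and results must be picklable.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def disk_cache(cachedir):
    """Hypothetical decorator illustrating the idea; NOT anycache's implementation."""
    os.makedirs(cachedir, exist_ok=True)

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Derive a file name from the function name and its arguments.
            key = repr((func.__qualname__, args, sorted(kwargs.items())))
            filename = hashlib.sha256(key.encode('utf-8')).hexdigest() + '.pkl'
            path = os.path.join(cachedir, filename)
            if os.path.exists(path):  # later calls (or runs): load the pickle
                with open(path, 'rb') as f:
                    return pickle.load(f)
            result = func(*args, **kwargs)  # first call: compute and store
            with open(path, 'wb') as f:
                pickle.dump(result, f, pickle.HIGHEST_PROTOCOL)
            return result
        return wrapper
    return decorator


calls = []  # records real executions of the decorated function

with tempfile.TemporaryDirectory() as cachedir:
    @disk_cache(cachedir)
    def make_company(name, value):
        calls.append((name, value))
        return {'name': name, 'value': value}

    first = make_company('banana', 40)
    second = make_company('banana', 40)  # served from the pickle file

print(first == second, len(calls))  # True 1
```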

For further details, see the documentation.

Answer by Anthony Ebert

Quick example using company1 from your question, with Python 3:

import pickle

# Save the file (the with-block closes it, so the data is flushed to disk)
with open("company1.pickle", "wb") as f:
    pickle.dump(company1, f)

# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = pickle.load(f)

However, as this answer noted, pickle often fails. So you should really use dill.

import dill

# Save the file (the with-block closes it, so the data is flushed to disk)
with open("company1.pickle", "wb") as f:
    dill.dump(company1, f)

# Reload the file
with open("company1.pickle", "rb") as f:
    company1_reloaded = dill.load(f)