How to save data with Python?
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/1389738/
Asked by prattmic
I am working on a program in Python and want users to be able to save the data they are working on. I have looked into cPickle; it seems like a fast and easy way to save data, but it also seems insecure. Since entire functions, classes, etc. can be pickled, I am worried that a rogue save file could inject harmful code into the program. Is there a way I can prevent that, or should I look into other methods of saving data, such as converting directly to a string (which also seems insecure) or creating an XML hierarchy and putting the data in that?
I am new to Python, so please bear with me.
Thanks in advance!
EDIT: As for the type of data I am storing, it is mainly dictionaries and lists, holding information such as names, speeds, etc. It is fairly simple right now, but may get more complex in the future.
Answer by Nadia Alramli
From your description, JSON encoding is a secure and fast solution. Python 2.6 and later ship with a json module, which you can use like this:
import json
obj = {'key1': 'value1', 'key2': [1, 2, 3, 4], 'key3': 1322}
encoded = json.dumps(obj)
obj = json.loads(encoded)
The JSON format is human readable and very similar to the string representation of a Python dictionary, and it doesn't have the security issues that pickle has. If you don't have Python 2.6, you can install cjson or simplejson.
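Since the question is about save files, here is a minimal sketch of writing the JSON to disk and reading it back. It assumes Python 3; the helper names save_data and load_data and the file name savegame.json are made up for illustration.

```python
import json
import os
import tempfile

def save_data(path, data):
    # Write the data as JSON text; indent=2 keeps the file easy to read by hand.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)

def load_data(path):
    # Parsing JSON only builds dicts, lists, strings, numbers, booleans and None.
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

# Hypothetical save file placed in a temporary directory.
save_path = os.path.join(tempfile.mkdtemp(), "savegame.json")
original = {"name": "Ann", "speeds": [1.5, 2.0, 3.25]}
save_data(save_path, original)
restored = load_data(save_path)
```

Loading a JSON file never calls code named inside the file, which is the property the question is after.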
Unlike pickle, JSON can't save arbitrary Python objects. But you can use it to save strings, dictionaries, lists, ... which is enough for most cases.
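If the data later grows into custom classes, one common workaround is to convert each object to a plain dictionary before encoding. The sketch below assumes Python 3; the Player class and its to_dict/from_dict methods are hypothetical names, not part of the json module.

```python
import json

class Player:
    def __init__(self, name, speed):
        self.name = name
        self.speed = speed

    def to_dict(self):
        # Only plain JSON-friendly types go into the save file.
        return {"name": self.name, "speed": self.speed}

    @classmethod
    def from_dict(cls, d):
        return cls(d["name"], d["speed"])

player = Player("Ann", 2.5)

# Passing the object directly fails: json does not know how to encode it.
try:
    json.dumps(player)
    direct_ok = True
except TypeError:
    direct_ok = False

# Converting to a dict first works, and the object can be rebuilt on load.
encoded = json.dumps(player.to_dict())
restored = Player.from_dict(json.loads(encoded))
```

The cost is that each class needs its own conversion code, which is the convenience pickle provides for free.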
To explain why pickle is insecure, here is what the Python docs say:
Most of the security issues surrounding the pickle and cPickle module involve unpickling. There are no known security vulnerabilities related to pickling because you (the programmer) control the objects that pickle will interact with, and all it produces is a string.
However, for unpickling, it is never a good idea to unpickle an untrusted string whose origins are dubious, for example, strings read from a socket. This is because unpickling can create unexpected objects and even potentially run methods of those objects, such as their class constructor or destructor ... The moral of the story is that you should be really careful about the source of the strings your application unpickles.
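To make the warning concrete, here is a small and deliberately harmless sketch (Python 3 assumed) of how a pickle can ask the loader to call a function. The class name Sneaky is made up; the callable used here is the builtin sorted, but a hostile file could name something like os.system instead.

```python
import pickle

class Sneaky:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))

payload = pickle.dumps(Sneaky())

# Loading does not give back a Sneaky instance; it calls sorted() and
# returns whatever that call produced.
result = pickle.loads(payload)
```

The point is that the function call happens inside pickle.loads, before the application gets a chance to inspect the data.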
There are some ways to defend yourself, but it is much easier to use JSON in your case.
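One such defence, sketched here for Python 3 rather than the Python 2 of the original answer, is to subclass pickle.Unpickler and override find_class so that no global names can be loaded at all. The names RestrictedUnpickler, restricted_loads and Sneaky are illustrative. This blocks the usual code-execution route but should not be read as a complete security guarantee.

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Refuse every global lookup: only plain containers and scalars load.
        raise pickle.UnpicklingError(
            "global '%s.%s' is forbidden" % (module, name))

def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()

class Sneaky:
    def __reduce__(self):
        # Asks the loader to call a function (harmless here).
        return (sorted, ([3, 1, 2],))

# Plain dicts and lists need no globals, so they load normally.
safe = restricted_loads(pickle.dumps({"name": "Ann", "speeds": [1, 2]}))

# A payload that names a callable is rejected before anything is called.
try:
    restricted_loads(pickle.dumps(Sneaky()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

With every global forbidden, the loader accepts roughly the same data JSON does, which supports the point that JSON is the simpler choice here.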
Answer by Vinko Vrsalovic
You could do something like:
To write:
- Pickle
- Sign pickled file
- Done
To read:
- Check pickled file's signature
- Unpickle
- Use
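The write and read steps above could be sketched with an HMAC from the standard library (Python 3 assumed). The answer does not say which signing scheme it has in mind, so HMAC-SHA256, the key value, and the function names dumps_signed and loads_signed are all assumptions.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def dumps_signed(obj, key=SECRET_KEY):
    # Pickle, then prepend a signature computed over the pickled bytes.
    payload = pickle.dumps(obj)
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return signature + payload

def loads_signed(blob, key=SECRET_KEY):
    # Check the signature first; unpickle only if it matches.
    signature, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: save file was modified")
    return pickle.loads(payload)

blob = dumps_signed({"name": "Ann", "speed": 2.5})
restored = loads_signed(blob)

# Flip one bit in the payload to simulate a tampered save file.
tampered = blob[:-1] + bytes([blob[-1] ^ 0x01])
try:
    loads_signed(tampered)
    rejected = False
except ValueError:
    rejected = True
```

This only helps if the key is stored somewhere the person editing the save file cannot read, which is a real limitation when the key ships with the application.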
I wonder, though, what makes you think that the data files are going to be tampered with but your application is not?
Answer by Ned Batchelder
You need to give us more context before we can answer: what type of data are you saving, how much is there, how do you want to access it?
As for pickles: they do not store code. When you pickle a function or class, it is the name that is stored, not the actual code itself.
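A quick way to see this, assuming Python 3, is to pickle a function from the standard library and look at the bytes. The example uses json.dumps purely as a convenient importable function.

```python
import json
import pickle

# Pickling a function stores a reference to it, not its bytecode.
data = pickle.dumps(json.dumps)

has_module_name = b"json" in data     # the module name is in the bytes
has_function_name = b"dumps" in data  # so is the function name

# Loading looks the name up again and returns the very same function object.
restored = pickle.loads(data)
```

Because loading resolves names and can call what it finds, a pickle that carries no code can still be dangerous when it comes from an untrusted source.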
Answer by u0b34a0f6ae
In this answer, I'm only concerned about accidental corruption of the application's integrity.
Pickle is "secure". What might be insecure is accessing code you didn't write, for example in plugins; that is not relevant to pickles though.
When you pickle an object, all its data is saved, but its code and implementation are not. This means that when unpickled, an updated object might find it has "old-style" data inside (if you have updated the implementation). This is something you must know about and handle, if applicable.
Pickling strings, lists, numbers and dicts is very easy and works perfectly, comparably to JSON. The Pickle magic is that -- sometimes without adjustment -- even complex Python objects can be pickled. But only data is pickled; the instances are reconstructed simply from the saved module name and type name of the object.
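One way to handle "old-style" data is a __setstate__ method that fills in attributes added since the file was written. The sketch below (Python 3 assumed) uses a hypothetical Player class and simulates the load step by hand: it creates the instance without calling __init__ and passes in the saved attribute dictionary, which is roughly what pickle does for an ordinary class.

```python
class Player:
    def __init__(self, name, speed=0.0):
        self.name = name
        self.speed = speed  # attribute added in a later version of the class

    def __setstate__(self, state):
        # Called with the saved attribute dict when an instance is restored.
        # Old save files lack "speed", so supply a default before updating.
        state.setdefault("speed", 0.0)
        self.__dict__.update(state)

# State as an older version of the program would have saved it.
old_state = {"name": "Ann"}

# Simulate the restore step: no __init__ call, then __setstate__.
restored = Player.__new__(Player)
restored.__setstate__(old_state)
```

Without the setdefault line, restored.speed would raise AttributeError the first time newer code touched it.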
Answer by Tupteq
You should use a database of some kind. Storing in pickle format isn't a good idea (in most cases). You may consider:
- SQLite - (included in Python 2.5+) fast and simple, but requires knowledge of SQL and DB-API
- buzhug - non-SQL, file-based database with Pythonic syntax
- SQL database - you may use an interface to one of the DBMSs (like MySQL, PostgreSQL, etc.), but it's only worthwhile for larger amounts of data (thousands of records).
You may find some other solutions here.
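For the SQLite option, a minimal sketch with the standard sqlite3 module might look like this (Python 3 assumed). The players table and its columns are invented to match the names and speeds mentioned in the question, and an in-memory database stands in for a real file path.

```python
import sqlite3

# ":memory:" keeps the example self-contained; use a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE players (name TEXT PRIMARY KEY, speed REAL)")

# Parameterised queries keep data out of the SQL text itself.
players = [("Ann", 2.5), ("Bob", 3.0)]
conn.executemany("INSERT INTO players (name, speed) VALUES (?, ?)", players)
conn.commit()

rows = conn.execute(
    "SELECT name, speed FROM players ORDER BY name").fetchall()
conn.close()
```

Rows come back as plain tuples, so loading a database file never executes anything stored in it.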
Answer by S.Lott
Who -- specifically -- is the sociopath who's going through the effort to break a program by hacking the pickled file?
It's Python. The sociopath has your source. They don't need to fool around hacking your pickle file. They can just edit your source and do all the "damage" they want.
Don't worry about "insecurity" unless you're involved in litigation with organized crime syndicates.
Don't worry about "a rogue save file could inject harmful code into the program". No one will bother with a rogue save file when they have the source.
Answer by S.Lott
You might enjoy working with the y_serial module over at http://yserial.sourceforge.net, which reads like a tutorial but operationally offers working code for serialization and persistence. The commentary discusses some of the pros and cons relevant to the issues raised here.
It's designed to be a general solution for warehousing compressed Python objects with SQLite (with almost no SQL fuss ;-)
Hope this helps.