Original URL: http://stackoverflow.com/questions/4450144/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
easy save/load of data in python
Asked by nos
What is the easiest way to save and load data in Python, preferably in a human-readable output format?
The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).
My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.
Accepted answer by Sven Marnach
There are several options -- I don't exactly know what you like. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:
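import numpy

# saving (x and y are your two equal-length vectors):
with open("data", "w") as f:
    f.write("# x y\n")  # column names
    numpy.savetxt(f, numpy.array([x, y]).T)

# loading (lines starting with "#" are skipped as comments):
x, y = numpy.loadtxt("data", unpack=True)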
If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
Answer by Mark Byers
Answer by Jamie Rumbelow
The simplest way to get human-readable output is to use a serialisation format such as JSON. Python contains a json library you can use to serialise data to and from a string. Like pickle, you can use this with an IO object to write it to a file.
import json

data = { "x": 12153535.232321, "y": 35234531.232322 }
# the with block closes the file once the data has been written
with open('/usr/data/application/json-dump.json', 'w') as file:
    json.dump(data, file)
If you want to get a simple string back instead of dumping it to a file, you can use json.dumps() instead:
import json

print(json.dumps({ "x": 12153535.232321, "y": 35234531.232322 }))
Reading back from a file is just as easy:
import json

with open('/usr/data/application/json-dump.json', 'r') as file:
    print(json.load(file))
The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
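Applied to the two named vectors from the question, the JSON approach could look roughly like the sketch below. The helper names save_vectors and load_vectors and the temporary file location are assumptions made for illustration, not part of the original answer.

```python
import json
import os
import tempfile


def save_vectors(path, x, y):
    """Write two float vectors to a JSON file under the keys "x" and "y"."""
    with open(path, "w") as f:
        json.dump({"x": list(x), "y": list(y)}, f, indent=2)


def load_vectors(path):
    """Read the two vectors back as plain Python lists."""
    with open(path) as f:
        data = json.load(f)
    return data["x"], data["y"]


x = [1.0, 1.5, 2.0]
y = [1.0, 2.25, 4.0]

# hypothetical location: a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "xy.json")
save_vectors(path, x, y)
x2, y2 = load_vectors(path)
print(x2, y2)
```

The resulting file stays easy to read and edit by hand because indent=2 puts each value on its own line.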
Answer by Lennart Regebro
If it should be human-readable, I'd also go with JSON, unless you need to exchange it with enterprise-type people; they like XML better. :-)
If it should be human-editable and isn't too complex, I'd probably go with some sort of INI-like format, for example configparser.
If it is complex, and doesn't need to be exchanged, I'd go with just pickling the data, unless it's very complex, in which case I'd use ZODB.
If it's a LOT of data, and needs to be exchanged, I'd use SQL.
That pretty much covers it, I think.
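For the INI-like option, a minimal sketch with the standard configparser module might look like this. The section name "data", the comma-separated encoding of each vector, and the helpers save_ini and load_ini are all assumptions chosen for the example; configparser itself only stores strings.

```python
import configparser
import os
import tempfile


def save_ini(path, **vectors):
    """Store each named vector as a comma-separated string in a [data] section."""
    config = configparser.ConfigParser()
    config["data"] = {
        name: ", ".join(repr(float(v)) for v in values)
        for name, values in vectors.items()
    }
    with open(path, "w") as f:
        config.write(f)


def load_ini(path):
    """Parse the [data] section back into a dict of float lists."""
    config = configparser.ConfigParser()
    config.read(path)
    return {
        name: [float(v) for v in text.split(",") if v.strip()]
        for name, text in config["data"].items()
    }


# hypothetical location: a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "xy.ini")
save_ini(path, x=[1.0, 2.0, 3.0], y=[2.0, 1e15, -10.3])
d = load_ini(path)
print(d["x"])
print(d["y"])
```

Note that configparser lowercases option names by default, so keys such as X and Y would come back as x and y.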
Answer by NPE
Since we're talking about a human editing the file, I assume we're talking about relatively little data.
How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples and many other things.
def save(fname, **kwargs):
    # write one key=value line per keyword argument
    with open(fname, "wt") as f:
        for k, v in kwargs.items():
            print("%s=%s" % (k, repr(v)), file=f)

def load(fname):
    ret = {}
    with open(fname, "rt") as f:
        for line in f:
            k, v = line.strip().split("=", 1)
            # eval() runs arbitrary code, so only load files you trust
            ret[k] = eval(v)
    return ret

x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print(d["x"])
print(d["y"])
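Because eval() executes whatever expression it finds in the file, a more cautious variant of the same key=value idea could parse the values with ast.literal_eval, which only accepts Python literals (numbers, strings, lists, tuples, dicts and similar). This is a sketch of that swap, not the original author's code; the temporary file path is an assumption for the example.

```python
import ast
import os
import tempfile


def save(fname, **kwargs):
    """Write one key=value line per keyword argument, using repr() for the value."""
    with open(fname, "w") as f:
        for k, v in kwargs.items():
            f.write("%s=%r\n" % (k, v))


def load(fname):
    """Read key=value lines back, accepting only Python literals as values."""
    ret = {}
    with open(fname) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines added by a human editor
            k, v = line.split("=", 1)
            ret[k] = ast.literal_eval(v)
    return ret


# hypothetical location: a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "data.txt")
save(path, x=[1, 2, 3], y=[2.0, 1e15, -10.3])
d = load(path)
print(d["x"])
print(d["y"])
```

The trade-off is that values which are not plain literals (for example a call such as set() or a custom object's repr) raise an error instead of being evaluated.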
Answer by Dalker
As I commented in the accepted answer, using numpy this can be done with a simple one-liner:
Assuming you have numpy imported as np (which is common practice),
np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x y")
will save the data in the (optional) format and
x, y = np.loadtxt('xy.txt', unpack=True)
will load it.
The file xy.txt will then look like:
# x y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000
Note that the format string fmt=... is optional, but if the goal is human-readability it may prove quite useful. If used, it is specified using the usual printf-like codes (in my example: floating-point number with 3 decimals).
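Since the header line already names the columns, the names can also be recovered on loading. One possible way, sketched here as an assumption rather than as part of the original answer, is np.genfromtxt with names=True, which reads the field names from the first line even when it starts with the comment character. The temporary file path and the sample values are made up for the example.

```python
import os
import tempfile

import numpy as np

x = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
y = x ** 2

# hypothetical location: a file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "xy.txt")

# same call shape as in the answer; the header is written as "# x y"
np.savetxt(path, np.column_stack([x, y]), fmt="%.3f", header="x y")

# names=True takes the column names from the header line,
# giving a structured array whose fields can be accessed by name
data = np.genfromtxt(path, names=True)
print(data.dtype.names)
print(data["x"])
print(data["y"])
```

With this, the vectors are addressed as data["x"] and data["y"] rather than by column position.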
Answer by Koke Cacao
Here is an example of an encoder you might want to write for a Body class:
import json
import numpy as np

# add this to your code
class BodyEncoder(json.JSONEncoder):
    def default(self, obj):
        # convert types that json cannot serialize by itself
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        if hasattr(obj, '__jsonencode__'):
            return obj.__jsonencode__()
        if isinstance(obj, set):
            return list(obj)
        return obj.__dict__

    # Here you construct your way to load your data for each instance
    # you need to customize this function
    @staticmethod
    def deserialize(data):
        bodies = [Body(d["name"], d["mass"], np.array(d["p"]), np.array(d["v"])) for d in data["bodies"]]
        axis_range = data["axis_range"]
        timescale = data["timescale"]
        return bodies, axis_range, timescale

    # Here you construct your way to dump your data for each instance
    # you need to customize this function
    @staticmethod
    def serialize(data):
        # FILE_NAME is assumed to be defined elsewhere in your code
        with open(FILE_NAME, 'w') as file:
            json.dump(data, file, cls=BodyEncoder, indent=4)
        print("Dumping Parameters of the Latest Run")
        print(json.dumps(data, cls=BodyEncoder, indent=4))
Here is an example of the class I want to serialize:
class Body(object):
    # you do not need to change your class structure
    def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
        # init variables like normal
        self.name = name
        self.mass = mass
        self.p = p
        self.v = v
        self.f = np.array([0.0, 0.0, 0.0])

    def attraction(self, other):
        # not important functions that I wrote...
        pass
Here is how to serialize:
# you need to customize this function
def serialize_everything():
    bodies, axis_range, timescale = generate_data_to_serialize()
    data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    BodyEncoder.serialize(data)
Here is how to load the data back:
def dump_everything():
    # despite its name, this function loads the saved data back
    with open(FILE_NAME, "r") as file:
        data = json.load(file)
    return BodyEncoder.deserialize(data)
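The same encode/decode pairing can be tried out in a smaller, self-contained form. The sketch below uses a hypothetical Point class standing in for Body, a "__point__" marker key invented for the example, and json's object_hook parameter for the loading side; it is one possible arrangement, not the approach the answer itself uses.

```python
import json


class Point:
    """Hypothetical example class, standing in for Body."""

    def __init__(self, name, coords):
        self.name = name
        self.coords = coords


class PointEncoder(json.JSONEncoder):
    def default(self, obj):
        # called only for objects json cannot serialize by itself
        if isinstance(obj, Point):
            # "__point__" is an assumed marker so the decoder can spot these dicts
            return {"__point__": True, "name": obj.name, "coords": list(obj.coords)}
        # fall back to the base class, which raises TypeError
        return super().default(obj)


def decode_point(d):
    """object_hook: rebuild Point instances from dicts carrying the marker."""
    if d.get("__point__"):
        return Point(d["name"], d["coords"])
    return d


text = json.dumps({"points": [Point("a", (0.0, 1.0))]}, cls=PointEncoder, indent=4)
print(text)

restored = json.loads(text, object_hook=decode_point)
print(restored["points"][0].name, restored["points"][0].coords)
```

Falling back to super().default() instead of obj.__dict__ makes unsupported types fail loudly, which can be easier to debug than silently dumping every attribute.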

