Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse it, you must likewise follow the CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original address: http://stackoverflow.com/questions/3685265/
How to write a multidimensional array to a text file?
Asked by Ivo Flipse
In another question, other users offered some help if I could supply the array I was having trouble with. However, I even fail at a basic I/O task, such as writing an array to a file.
Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to file?
This array consists of four 11 x 14 arrays, so I should format it with a nice newline, to make reading the file easier for others.
Edit: So I've tried the numpy.savetxt function. Strangely, it gives the following error:
TypeError: float argument required, not numpy.ndarray
I assume that this is because the function doesn't work with multidimensional arrays? Are there any solutions, since I would like everything within one file?
Accepted answer by Joe Kington
If you want to write it to disk so that it will be easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine, as well, but it's less efficient for large arrays (which yours isn't, so either is perfectly fine).
If you want it to be human readable, look into numpy.savetxt.
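For the binary route, a minimal sketch of saving and loading with numpy.save might look like this (the filename is my own choice):

```python
import numpy as np

# Save the 4x11x14 array in NumPy's binary .npy format; the shape
# and dtype are stored in the file, so no reshaping is needed on load.
data = np.arange(4 * 11 * 14).reshape((4, 11, 14))
np.save('data.npy', data)

restored = np.load('data.npy')
assert restored.shape == (4, 11, 14)
assert np.array_equal(data, restored)
```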
Edit: So, it seems like savetxt isn't quite as great an option for arrays with >2 dimensions... But just to draw everything out to its full conclusion:
I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there's no inherently defined way to indicate additional dimensions in a text file.
E.g. This (a 2D array) works fine
import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)
While the same thing would fail (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray) for a 3D array:
import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)
One workaround is just to break the 3D (or greater) array into 2D slices. E.g.
x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)
However, our goal is to be clearly human readable, while still being easily read back in with numpy.loadtxt. Therefore, we can be a bit more verbose, and differentiate the slices using commented out lines. By default, numpy.loadtxt will ignore any lines that start with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)
import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))

    # Iterating through an ndimensional array produces slices along
    # the first axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')
This yields:
# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00
# New slice
Reading it back in is very easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (You can do this in one line, I'm just being verbose to clarify things):
# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 3D is easy if we know the
# original shape of the array
new_data = new_data.reshape((4,5,10))

# Just to check that they're the same...
assert np.all(new_data == data)
Answered by Dominic Rodger
I am not certain if this meets your requirements, given I think you are interested in making the file readable by people, but if that's not a primary concern, just pickle it.
To save it:
import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}

with open('data.pkl', 'wb') as output:
    pickle.dump(my_data, output)
To read it back:
import pprint, pickle

with open('data.pkl', 'rb') as pkl_file:
    data1 = pickle.load(pkl_file)

pprint.pprint(data1)
Answered by jwueller
You can simply traverse the array in three nested loops and write its values to your file. For reading, you simply use the same exact loop construction. You will get the values in exactly the right order to fill your arrays correctly again.
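A rough sketch of what those loops might look like (one value per line; the filename and the requirement of knowing the shape in advance are my own assumptions):

```python
import numpy as np

data = np.arange(200).reshape((4, 5, 10))

# Write: three nested loops, one value per line
with open('loops.txt', 'w') as f:
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            for k in range(data.shape[2]):
                f.write('%d\n' % data[i, j, k])

# Read: the same loop structure refills an empty array in the same order
restored = np.empty_like(data)
with open('loops.txt') as f:
    for i in range(restored.shape[0]):
        for j in range(restored.shape[1]):
            for k in range(restored.shape[2]):
                restored[i, j, k] = int(f.readline())

assert np.array_equal(data, restored)
```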
Answered by Ronny Brendel
There are special libraries for exactly this (plus Python wrappers):
- netCDF4: http://www.unidata.ucar.edu/software/netcdf/
netCDF4 Python interface: http://www.unidata.ucar.edu/software/netcdf/software.html#Python
- HDF5: http://www.hdfgroup.org/HDF5/
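As an illustration (not from the original answer), a sketch using the third-party h5py wrapper for HDF5 might look like this, assuming h5py is installed:

```python
import numpy as np
import h5py  # third-party HDF5 wrapper: pip install h5py

data = np.arange(200).reshape((4, 5, 10))

# Write the array as a named dataset inside an HDF5 file
with h5py.File('data.h5', 'w') as f:
    f.create_dataset('my_array', data=data)

# Read it back; shape and dtype are preserved
with h5py.File('data.h5', 'r') as f:
    restored = f['my_array'][:]

assert np.array_equal(data, restored)
```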
hope this helps
Answered by aseagram
If you don't need a human-readable output, another option you could try is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat in very few lines is convenient.
Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. no need to reshape upon reading in. And, unlike using pickle, a .mat file can be read by MATLAB, and probably some other programs/languages as well.
Here is an example:
import numpy as np
import scipy.io
# Some test data
x = np.arange(200).reshape((4,5,10))
# Specify the filename of the .mat file
matfile = 'test_mat.mat'
# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')
# For the above line, I specified the kwarg oned_as since python (2.7 with
# numpy 1.6.1) throws a FutureWarning. Here, this isn't really necessary
# since oned_as is a kwarg for dealing with 1-D arrays.
# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)
# And just to check if the data is the same:
assert np.all(x == matdata['out'])
If you forget the key that the array is named under in the .mat file, you can always do:
print(matdata.keys())
And of course you can store many arrays using many more keys.
So yes – it won't be readable with your eyes, but only takes 2 lines to write and read the data, which I think is a fair trade-off.
Take a look at the docs for scipy.io.savemat and scipy.io.loadmat, and also this tutorial page: scipy.io File IO Tutorial
Answered by atomh33ls
ndarray.tofile() should also work
e.g. if your array is called a:
a.tofile('yourfile.txt',sep=" ",format="%s")
Not sure how to get newline formatting though.
Edit (credit Kevin J. Black's comment here):
Since version 1.5.0, np.savetxt() takes an optional parameter newline='\n' to allow multi-line output. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
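Reading back what tofile wrote requires reapplying the shape, since tofile flattens the array. A sketch (note that fromfile in text mode returns float64, so values are compared rather than dtypes):

```python
import numpy as np

a = np.arange(200).reshape((4, 5, 10))
a.tofile('yourfile.txt', sep=" ", format="%s")

# fromfile in text mode (when "sep" is given) returns a flat float array,
# so the original shape must be reapplied manually
b = np.fromfile('yourfile.txt', sep=" ").reshape((4, 5, 10))
assert np.array_equal(a, b)
```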
Answered by BennyD
I have a way to do it using a simple filename.write() operation. It works fine for me, but I'm dealing with arrays having ~1500 data elements.
I basically just have for loops to iterate through the file and write it to the output destination line-by-line in a csv style output.
import numpy as np

trial = np.genfromtxt("/extension/file.txt", dtype=str, delimiter=",")
num_of_columns = trial.shape[1]

with open("/extension/file.txt", "w") as f:
    for x in range(trial.shape[0]):
        for y in range(num_of_columns):
            if y < num_of_columns - 1:
                f.write(trial[x][y] + ",")
            else:
                f.write(trial[x][y])
        f.write("\n")
The conditional statements are used to add commas between the data elements. For whatever reason, these get stripped out when reading the file in as an nd array. My goal was to output the file as a csv, so this method helps to handle that.
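As an aside, the manual comma handling can also be delegated to numpy.savetxt via its delimiter keyword. A sketch with made-up data (the original file paths are left out of this example):

```python
import numpy as np

# Hypothetical string data standing in for the rows read from file.txt
trial = np.arange(12).reshape((3, 4)).astype(str)

# savetxt inserts the delimiter between columns and a newline per row,
# so no per-element comma bookkeeping is needed
np.savetxt('out.csv', trial, fmt='%s', delimiter=',')

with open('out.csv') as f:
    first_line = f.readline().strip()
assert first_line == '0,1,2,3'
```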
Hope this helps!
Answered by Kenpachi Zaraki
Pickle is best for these cases. Suppose you have an ndarray named x_train. You can dump it into a file and restore it using the following commands:
import pickle

### Dump to file
with open("myfile.pkl", "wb") as f:
    pickle.dump(x_train, f)

### Load from file
with open("myfile.pkl", "rb") as f:
    x_temp = pickle.load(f)

