Reading binary file in python
Disclaimer: this page is based on a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/2274503/
Question by Vipin
I wrote a python script to create a binary file of integers.
import struct
pos = [7623, 3015, 3231, 3829]
inh = open('test.bin', 'wb')
for e in pos:
    inh.write(struct.pack('i', e))
inh.close()
It worked well; then I tried to read the 'test.bin' file using the code below.
import struct
inh = open('test.bin', 'rb')
for rec in inh:
    pos = struct.unpack('i', rec)
    print pos
inh.close()
But it failed with an error message:
Traceback (most recent call last):
  File "readbinary.py", line 10, in <module>
    pos = struct.unpack('i', rec)
  File "/usr/lib/python2.5/struct.py", line 87, in unpack
    return o.unpack(s)
struct.error: unpack requires a string argument of length 4
I would like to know how I can read this file using struct.unpack.
Many thanks in advance,
Vipin
Answer by Alex Martelli
for rec in inh: reads one line at a time, which is not what you want for a binary file. Read 4 bytes at a time instead (with a while loop and inh.read(4)), or read everything into memory with a single .read() call and then unpack successive 4-byte slices. The second approach is simplest and most practical as long as the amount of data involved isn't huge:
import struct
with open('test.bin', 'rb') as inh:
    data = inh.read()
for i in range(0, len(data), 4):
    pos = struct.unpack('i', data[i:i+4])
    print(pos)
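On Python 3.4 and later, the slicing loop can be handed off to struct.iter_unpack, which walks a buffer in format-sized steps. A minimal sketch, assuming the whole file fits in memory, that its length is an exact multiple of 4 (iter_unpack raises struct.error otherwise), and that the file was written little-endian, as native 'i' is on x86; the helper name unpack_ints and the explicit '<i' format are illustrative choices, not part of the answer.

```python
import struct

def unpack_ints(data):
    """Decode a bytes object holding consecutive 4-byte little-endian ints."""
    # iter_unpack steps through data 4 bytes at a time and yields 1-tuples
    # such as (7623,); it raises struct.error if len(data) % 4 != 0.
    return [value for (value,) in struct.iter_unpack('<i', data)]

# Usage, mirroring the file handling above (hypothetical):
# with open('test.bin', 'rb') as inh:
#     print(unpack_ints(inh.read()))
```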
If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:
import struct
def by4(f):
    rec = 'x'  # placeholder for the `while`
    while rec:
        rec = f.read(4)
        if rec: yield rec
with open('test.bin', 'rb') as inh:
    for rec in by4(inh):
        pos = struct.unpack('i', rec)
        print(pos)
A key advantage of this second approach is that the by4 generator can easily be tweaked (while keeping its spec: return a binary file's data 4 bytes at a time) to use a different buffering strategy, all the way to the first approach (read everything, then parcel it out), which can be seen as "infinite buffering" and coded as:
def by4(f):
    data = f.read()
    for i in range(0, len(data), 4):
        yield data[i:i+4]
while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer, which gets encapsulated within the generator.
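The point that by4 can swap buffering strategies without touching the consumer can be made concrete with a middle ground between the two extremes: read a larger block per call and hand out 4-byte pieces from it. A minimal sketch; the name by4_buffered and the default 4096-byte block size are illustrative assumptions, not from the answer. A caller's `for rec in ...: struct.unpack('i', rec)` loop stays the same.

```python
import io

def by4_buffered(f, bufsize=4096):
    """Yield f's data 4 bytes at a time, reading bufsize bytes per call."""
    pending = b''
    while True:
        block = f.read(bufsize)
        if not block:
            break
        pending += block
        # Largest prefix of pending whose length is a multiple of 4.
        usable = len(pending) - len(pending) % 4
        for i in range(0, usable, 4):
            yield pending[i:i + 4]
        pending = pending[usable:]  # carry the partial record forward
    if pending:
        yield pending  # trailing short chunk, as the original by4 also yields

# Usage with an in-memory file standing in for open('test.bin', 'rb'):
chunks = list(by4_buffered(io.BytesIO(b'abcdefghij'), bufsize=3))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Carrying `pending` across reads means the block size need not be a multiple of 4, which is why the example deliberately uses bufsize=3.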
Answer by ondra
I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:
while True:
    rec = inh.read(4)  # Or inh.read(struct.calcsize('i'))
    if len(rec) != 4:
        break
    (pos,) = struct.unpack('i', rec)
    print pos
Or as others have mentioned:
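data = inh.read()  # unpack_from needs a buffer, not a file object
offset = 0
while True:
    try:
        (pos,) = struct.unpack_from('i', data, offset)
    except struct.error:  # raised once fewer than 4 bytes remain
        break
    print(pos)
    offset += struct.calcsize('i')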
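The struct.calcsize('i') comment in the first loop points at a useful habit: derive the read size from the format rather than hard-coding 4. A precompiled struct.Struct keeps the format and its size in one object. A sketch of the first loop rewritten that way; RECORD and read_records are illustrative names, and the little-endian '<i' format is an assumption about how the file was written.

```python
import io
import struct

RECORD = struct.Struct('<i')  # hypothetical name; '<i' = 4-byte little-endian int

def read_records(inh):
    """Read RECORD-sized chunks from a binary file until fewer bytes remain."""
    values = []
    while True:
        rec = inh.read(RECORD.size)  # size is derived from the format, here 4
        if len(rec) != RECORD.size:
            break  # EOF (or a truncated trailing record)
        (pos,) = RECORD.unpack(rec)
        values.append(pos)
    return values

print(read_records(io.BytesIO(RECORD.pack(7623) + RECORD.pack(3015))))  # [7623, 3015]
```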
Answer by gimel
Check the size of the packed integers:
>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']
We see 4-byte strings, which means reading should be done 4 bytes at a time:
>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>>
This is the original int! Extending this into a reading loop is left as an exercise.
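One way to take the exercise further without an explicit loop: since the file is just consecutive 4-byte ints, a repeat count in the format string (such as '<4i') unpacks them all in one call. A minimal sketch; unpack_all is a hypothetical helper, and '<' (little-endian) is assumed because that is the byte order visible in the session above.

```python
import struct

def unpack_all(data, fmt_char='i', byte_order='<'):
    """Unpack every whole record in data with a single struct.unpack call."""
    size = struct.calcsize(byte_order + fmt_char)
    count = len(data) // size
    # e.g. '<4i' for 16 bytes; any trailing partial record is ignored
    fmt = '%s%d%s' % (byte_order, count, fmt_char)
    return struct.unpack(fmt, data[:count * size])

data = b'\xc7\x1d\x00\x00\xc7\x0b\x00\x00\x9f\x0c\x00\x00\xf5\x0e\x00\x00'
print(unpack_all(data))  # (7623, 3015, 3231, 3829)
```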
Answer by u0b34a0f6ae
You can probably use array as well if you want:
import array
pos = array.array('i', [7623, 3015, 3231, 3829])
inh = open('test.bin', 'wb')
pos.tofile(inh)
inh.close()
Then use array.array.fromfile or fromstring (frombytes on Python 3) to read it back.
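A round-trip sketch of that suggestion, using array.tofile to write and array.fromfile to read back; the item count is derived from the file size and the array's itemsize. The helper names and the temporary path are illustrative assumptions. Note that array uses native byte order and int size, so such a file is only guaranteed to read back correctly on a machine of the same kind.

```python
import array
import os
import tempfile

def write_ints(path, values):
    """Write values as native C ints using array.tofile."""
    with open(path, 'wb') as f:
        array.array('i', values).tofile(f)

def read_ints(path):
    """Read back every native C int stored in path using array.fromfile."""
    arr = array.array('i')
    count = os.path.getsize(path) // arr.itemsize  # number of whole ints
    with open(path, 'rb') as f:
        arr.fromfile(f, count)
    return arr.tolist()

# Round trip through a temporary file instead of test.bin:
path = os.path.join(tempfile.mkdtemp(), 'test.bin')
write_ints(path, [7623, 3015, 3231, 3829])
print(read_ints(path))  # [7623, 3015, 3231, 3829]
```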
Answer by roff
This function reads all bytes from a file:
import array
import os

def read_binary_file(filename):
    try:
        f = open(filename, 'rb')
        n = os.path.getsize(filename)
        data = array.array('B')  # array of unsigned bytes
        data.fromfile(f, n)      # read exactly n bytes into the array
        f.close()
        fsize = len(data)
        return (fsize, data)
    except IOError:
        return (-1, [])

# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]
if (fsize > 0):
    data = t[1]
    # work with data
else:
    print('Error reading file')
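On Python 3, much of that bookkeeping can be dropped: pathlib.Path.read_bytes opens, reads and closes the file, and indexing the resulting bytes object already yields integers 0-255, much like array('B'). A sketch under that assumption; read_binary_file_py3 is a hypothetical name chosen so it does not shadow the function above, and it catches OSError, of which IOError is an alias on Python 3.3+.

```python
import tempfile
from pathlib import Path

def read_binary_file_py3(filename):
    """Return (size, data) for the whole file, or (-1, b'') on failure."""
    try:
        data = Path(filename).read_bytes()  # whole file as a bytes object
    except OSError:
        return (-1, b'')
    return (len(data), data)

# Usage with a temporary file (closed, not deleted, when the block exits):
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b'\x01\x02\x03')
size, data = read_binary_file_py3(tmp.name)
print(size, data[0])  # 3 1

missing = Path(tempfile.mkdtemp()) / 'missing.bin'
print(read_binary_file_py3(missing))  # (-1, b'')
```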
Answer by Xorlev
Your iterator isn't reading 4 bytes at a time, so I imagine it's rather confused. As SilentGhost mentioned, it'd probably be best to use unpack_from().
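The confusion is easy to see with an in-memory copy of the file: iterating over a binary file object yields chunks that end at each b'\n' (0x0a) byte, and these four packed integers contain no 0x0a byte at all, so the loop receives a single 16-byte "line" that struct.unpack('i', ...) rejects. A small demonstration, assuming little-endian packing and using io.BytesIO in place of open('test.bin', 'rb').

```python
import io
import struct

# In-memory stand-in for test.bin, packed little-endian as on x86.
data = struct.pack('<4i', 7623, 3015, 3231, 3829)
inh = io.BytesIO(data)

# Iterating a binary file splits on b'\n' (0x0a), not on 4-byte boundaries.
records = list(inh)
print(len(records), len(records[0]))  # 1 16

try:
    struct.unpack('<i', records[0])
except struct.error as exc:
    # The exact wording of the message differs between Python versions.
    print('struct.error:', exc)
```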