Python: reading a binary file in Python

Question

Asked by Vipin

I wrote a python script to create a binary file of integers.

import struct  
pos = [7623, 3015, 3231, 3829]  
inh = open('test.bin', 'wb')  
for e in pos:  
    inh.write(struct.pack('i', e))  
inh.close()

It worked well, then I tried to read the 'test.bin' file using the below code.

import struct  
inh = open('test.bin', 'rb')  
for rec in inh:  
    pos = struct.unpack('i', rec)  
    print pos  
inh.close()

But it failed with an error message:

Traceback (most recent call last):   
   File "readbinary.py", line 10, in <module>  
   pos = struct.unpack('i', rec)  
   File "/usr/lib/python2.5/struct.py", line 87, in unpack  
   return o.unpack(s)  
struct.error: unpack requires a string argument of length 4

I would like to know how I can read this file using struct.unpack.
Many thanks in advance, Vipin

Answer 1

Answered by Alex Martelli

for rec in inh: reads one line at a time, which is not what you want for a binary file. Read 4 bytes at a time (with a while loop and inh.read(4)) instead, or read everything into memory with a single .read() call and then unpack successive 4-byte slices. The second approach is simplest and most practical as long as the amount of data involved isn't huge:

import struct
with open('test.bin', 'rb') as inh:
    data = inh.read()  # whole file as one bytes object
for i in range(0, len(data), 4):
    pos = struct.unpack('i', data[i:i+4])
    print(pos)
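
On Python 3.4 or later, the slicing loop above can also be expressed with struct.iter_unpack, or replaced by a single unpack call with a repeat count. The following is a minimal sketch rather than part of the original answer; it assumes little-endian 4-byte integers (the '<i' format) and builds the data in memory instead of reading test.bin.

```python
import struct

# Stand-in for the contents of test.bin: four little-endian 4-byte ints.
# '<' pins the byte order and size; the original code uses native 'i'.
data = struct.pack('<4i', 7623, 3015, 3231, 3829)

# iter_unpack walks the buffer in calcsize('<i')-byte steps and yields 1-tuples.
values = [value for (value,) in struct.iter_unpack('<i', data)]
print(values)

# Alternatively, unpack everything in one call by repeating the format code.
count = len(data) // struct.calcsize('<i')
all_at_once = struct.unpack('<%di' % count, data)
print(all_at_once)
```

Both variants raise struct.error if the buffer length is not a multiple of the item size, which is a useful check against truncated files.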

If you do fear potentially huge amounts of data (which would take more memory than you have available), a simple generator offers an elegant alternative:

import struct
def by4(f):
    rec = 'x'  # placeholder for the `while`
    while rec:
        rec = f.read(4)
        if rec: yield rec           
with open('test.bin', 'rb') as inh:
    for rec in by4(inh):
        pos = struct.unpack('i', rec)  
        print(pos)

A key advantage of this second approach is that the by4 generator can easily be tweaked (while maintaining the specs: return a binary file's data 4 bytes at a time) to use a different implementation strategy for buffering, all the way to the first approach (read everything, then parcel it out), which can be seen as "infinite buffering" and coded:

def by4(f):
    data = f.read()
    for i in range(0, len(data), 4):
        yield data[i:i+4]

while leaving the "application logic" (what to do with that stream of 4-byte chunks) intact and independent of the I/O layer (which gets encapsulated within the generator).
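
As one more variation on the same interface, the chunking can be written with the two-argument form of iter, which calls a function repeatedly until it returns a sentinel value. This is a sketch under assumptions, not the answer's own code: it uses io.BytesIO as a stand-in for the opened file and the '<i' format for a fixed little-endian layout.

```python
import functools
import io
import struct

def by4(f):
    # iter(callable, sentinel) keeps calling f.read(4) until it returns b'' (end of file).
    return iter(functools.partial(f.read, 4), b'')

# io.BytesIO stands in for open('test.bin', 'rb') so the sketch is self-contained.
inh = io.BytesIO(struct.pack('<4i', 7623, 3015, 3231, 3829))
result = [struct.unpack('<i', rec)[0] for rec in by4(inh)]
print(result)
```

If the file length is not a multiple of 4, the last chunk is shorter than 4 bytes and struct.unpack raises struct.error, so the caller still sees truncated input.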

Answer 2

Answered by ondra

I think "for rec in inh" is supposed to read 'lines', not bytes. What you want is:

while True:
    rec = inh.read(4) # Or inh.read(struct.calcsize('i'))
    if len(rec) != 4:
        break
    (pos,) = struct.unpack('i', rec)
    print(pos)

Or as others have mentioned:

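data = inh.read()  # unpack_from needs a bytes buffer, not a file object
size = struct.calcsize('i')
offset = 0
while True:
    try:
        (pos,) = struct.unpack_from('i', data, offset)
    except struct.error:  # fewer than `size` bytes left
        break
    print(pos)
    offset += size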
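
To see concretely why iterating over the file object fails, it may help to look at the chunks that line iteration produces. The sketch below is illustrative only: it assumes Python 3, uses io.BytesIO in place of the real file, and packs with '<i' so the bytes are the same on every machine.

```python
import io
import struct

# Same four integers as in the question, packed as little-endian 4-byte ints.
packed = struct.pack('<4i', 7623, 3015, 3231, 3829)

# None of these bytes is 0x0A, so line iteration returns the whole file at once.
chunks = list(io.BytesIO(packed))
print([len(c) for c in chunks])

# The value 10 packs to b'\n\x00\x00\x00', so a "line" ends after its first byte.
with_newline = struct.pack('<2i', 10, 7623)
pieces = list(io.BytesIO(with_newline))
print([len(p) for p in pieces])
```

In the first case the single chunk is 16 bytes long, which is why struct.unpack('i', rec) complains that it needs exactly 4 bytes. In the second case the chunk boundaries depend on where newline bytes happen to fall in the data.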

Answer 3

Answered by gimel

Check the size of the packed integers:

>>> pos
[7623, 3015, 3231, 3829]
>>> [struct.pack('i',e) for e in pos]
['\xc7\x1d\x00\x00', '\xc7\x0b\x00\x00', '\x9f\x0c\x00\x00', '\xf5\x0e\x00\x00']

We see 4-byte strings, which means the file should be read 4 bytes at a time:

>>> inh=open('test.bin','rb')
>>> b1=inh.read(4)
>>> b1
'\xc7\x1d\x00\x00'
>>> struct.unpack('i',b1)
(7623,)
>>>

This is the original int! Extending this into a reading loop is left as an exercise.
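
The size check above depends on the format string. With no prefix, 'i' uses the platform's native size and byte order; a '<' or '>' prefix selects a standard 4-byte layout with an explicit byte order. The sketch below shows the difference as an aside; whether the file needs to be portable across machines is an assumption, not something the question states.

```python
import struct

# Native mode (no prefix): size and byte order follow the local C compiler.
native_size = struct.calcsize('i')

# Standard modes: '<' is little-endian, '>' is big-endian, both with a 4-byte 'i'.
little = struct.pack('<i', 7623)
big = struct.pack('>i', 7623)

print(native_size, little, big)

# Reading back must use the same format that was used for writing.
(value,) = struct.unpack('<i', little)
print(value)
```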

Answer 4

Answered by u0b34a0f6ae

You can probably use array as well if you want:

import array  
pos = array.array('i', [7623, 3015, 3231, 3829]) 
inh = open('test.bin', 'wb')  
pos.tofile(inh)  # array.write() was a deprecated alias and is gone in Python 3
inh.close()

Then use array.array.fromfile or fromstring (frombytes in Python 3) to read it back.
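
A full round trip with the array module might look like the sketch below. It assumes Python 3, writes to a temporary file rather than test.bin, and derives the item count from the file size, since fromfile needs to be told how many items to read.

```python
import array
import os
import tempfile

pos = array.array('i', [7623, 3015, 3231, 3829])

# A temporary file stands in for test.bin.
path = os.path.join(tempfile.mkdtemp(), 'test.bin')
with open(path, 'wb') as out:
    pos.tofile(out)

# fromfile needs the number of items, derived here from the file size.
restored = array.array('i')
count = os.path.getsize(path) // restored.itemsize
with open(path, 'rb') as inh:
    restored.fromfile(inh, count)

print(restored.tolist())
```

Like the native 'i' format in struct, array('i') uses the machine's own item size and byte order, so the file is only guaranteed to read back correctly on a similar platform.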

Answer 5

Answered by roff

This function reads all bytes from a file:

import array
import os

def read_binary_file(filename):
    # Returns (size, data) on success, or (-1, []) if the file cannot be read.
    try:
        f = open(filename, 'rb')
        n = os.path.getsize(filename)
        data = array.array('B')  # unsigned bytes
        data.fromfile(f, n)      # array.read() is gone in Python 3
        f.close()
        fsize = len(data)
        return (fsize, data)
    except IOError:
        return (-1, [])

# somewhere in your code
t = read_binary_file(FILENAME)
fsize = t[0]

if fsize > 0:
    data = t[1]
    # work with data
else:
    print('Error reading file')

Answer 6

Answered by Xorlev

Your iterator isn't reading 4 bytes at a time so I imagine it's rather confused. Like SilentGhost mentioned, it'd probably be best to use unpack_from().
