Notice: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/16512284/

Time: 2020-08-18 22:52:36  Source: igfitidea

How to unpack from a binary file a byte array using Python?

python python-3.x

Asked by Darren Beale

I'm giving myself a crash course in reading a binary file using Python. I'm new to both, so please bear with me.

The file format's documentation tells me that the first 16 bytes are a GUID and further reading tells me that this GUID is formatted thus:

typedef struct {
  unsigned long Data1;
  unsigned short Data2;
  unsigned short Data3;
  byte Data4[8];
} GUID, UUID, *PGUID;

I've got as far as being able to unpack the first three entries in the struct, but I'm getting stumped on #4. It's an array of 8 bytes, I think, but I'm not sure how to unpack it.

import struct

fp = open("./file.bin", mode='rb')

Data1 = struct.unpack('<L', fp.read(4)) # unsigned long, little-endian
Data2 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian 
Data3 = struct.unpack('<H', fp.read(2)) # unsigned short, little-endian
Data4 = struct.unpack('<s', bytearray(fp.read(8))) # byte array with 8 entries?

struct.error: unpack requires a bytes object of length 1

What am I doing wrong for Data4? (I'm using Python 3.2 BTW)

Data1 through Data3 are OK: if I use hex() on them I get the data I'd expect to see (woohoo). I'm just falling over on the syntax for this byte array.

Edit: Answer

I'm reading a GUID as defined in MS-DTYP and this nailed it:

data = uuid.UUID(bytes_le=fp.read(16))

Accepted answer by abarnert

If you want an 8-byte string, you need to put the number 8 in there:

struct.unpack('<8s', bytearray(fp.read(8)))

From the docs:

A format character may be preceded by an integral repeat count. For example, the format string '4h' means exactly the same as 'hhhh'.

For the 's' format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, '10s' means a single 10-byte string, while '10c' means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, '0s' means a single, empty string (while '0c' means 0 characters).
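
As a rough sketch of the distinction quoted above, the snippet below unpacks the same 8 bytes three ways. The name `raw` and its contents are made up for illustration and simply stand in for `fp.read(8)`; the exact wording of the error message varies between Python versions.

```python
import struct

# Hypothetical sample data standing in for fp.read(8).
raw = bytes(range(8))

# With 's', the count is a length: one bytes object of 8 bytes.
as_string = struct.unpack('<8s', raw)

# With 'B', the count is a repeat count: eight separate unsigned bytes.
as_ints = struct.unpack('<8B', raw)

print(as_string)  # a 1-tuple holding a single bytes object
print(as_ints)    # an 8-tuple of ints

# Without a count, 's' defaults to length 1, so 8 bytes do not fit.
try:
    struct.unpack('<s', raw)
except struct.error as exc:
    print('struct.error:', exc)
```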

However, I'm not sure why you're doing this in the first place.

fp.read(8) gives you an 8-byte bytes object. You want an 8-byte bytes object. So, just do this:

Data4 = fp.read(8)

Converting the bytes to a bytearray has no effect except to make a mutable copy. Unpacking it just gives you back a copy of the same bytes you started with. So… why?

Well, actually, struct.unpack returns a tuple whose one value is a copy of the same bytes you started with, but you can do that with:

Data4 = (fp.read(8),)
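
To make that equivalence concrete, here is a minimal check. The name `data` and its byte values are illustrative only and stand in for `fp.read(8)`.

```python
import struct

# Hypothetical stand-in for fp.read(8).
data = b'\x01\x02\x03\x04\x05\x06\x07\x08'

# The round trip through bytearray and unpack from the question.
unpacked = struct.unpack('<8s', bytearray(data))

# unpack always returns a tuple; its only element equals the original bytes.
print(unpacked == (data,))
print(type(unpacked[0]))
```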

Which raises the question of why you want four single-element tuples in the first place. You're going to be doing Data1[0], etc. all over the place for no good reason. Why not this?

Data1, Data2, Data3, Data4 = struct.unpack('<LHH8s', fp.read(16))
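
A self-contained sketch of that single call, assuming a made-up 16-byte header: `io.BytesIO` stands in for the real file, and `sample` and `GUID_FORMAT` are names invented for this example.

```python
import io
import struct

# Hypothetical 16-byte header; io.BytesIO stands in for the real file.
sample = bytes.fromhex('33221100' '5544' '7766' '8899aabbccddeeff')
fp = io.BytesIO(sample)

# 4 + 2 + 2 + 8 bytes; '<' means little-endian, standard sizes, no padding.
GUID_FORMAT = '<LHH8s'

header = fp.read(struct.calcsize(GUID_FORMAT))
Data1, Data2, Data3, Data4 = struct.unpack(GUID_FORMAT, header)

# Each name is now a plain value, not a one-element tuple.
print(hex(Data1), hex(Data2), hex(Data3), Data4)
```

Using struct.calcsize keeps the number of bytes read in step with the format string if the layout ever changes.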


Of course, if this is meant to read a UUID, it's always better to use the "batteries included" than to try to build your own batteries from nickel and cadmium ore. As icktoofay says, just use the uuid module:

data = uuid.UUID(bytes_le=fp.read(16))

But keep in mind that Python's uuid uses the 4-2-2-1-1-6 format, not the 4-2-2-8 format. If you really need exactly that format, you'll need to convert it, which means either struct or bit twiddling anyway. (Microsoft's GUID makes things even more fun by using a 4-2-2-2-6 format, which is not the same as either, and representing the first 3 in native-endian and the last two in big-endian, because they like to make things easier…)
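
The following sketch compares the two views of the same 16 bytes, assuming made-up sample data (`raw` is a hypothetical name). It shows how the six uuid fields line up with the four struct members from the question.

```python
import struct
import uuid

# Hypothetical 16 bytes in the on-disk (little-endian) layout.
raw = bytes.fromhex('33221100554477668899aabbccddeeff')

guid = uuid.UUID(bytes_le=raw)

# uuid exposes six fields (4-2-2-1-1-6 bytes)...
time_low, time_mid, time_hi, clock_hi, clock_low, node = guid.fields

# ...while the struct layout from the question is 4-2-2-8.
Data1, Data2, Data3, Data4 = struct.unpack('<LHH8s', raw)

print(guid)
# The first three fields match the first three struct members.
print((time_low, time_mid, time_hi) == (Data1, Data2, Data3))
# Data4 is the last 8 bytes, which bytes_le leaves in their original order.
print(guid.bytes[8:] == Data4)
```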

Answer by icktoofay

UUIDs are supported by Python with the uuid module. Do something like this:

import uuid

my_uuid = uuid.UUID(bytes_le=fp.read(16))
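
For completeness, here is one way the whole read might look end to end. This is only a sketch: `read_guid` is a hypothetical helper, and the temporary file with made-up bytes exists purely so the example can run on its own.

```python
import os
import tempfile
import uuid


def read_guid(path):
    """Read the first 16 bytes of a file as a little-endian GUID."""
    with open(path, mode='rb') as fp:
        raw = fp.read(16)
    if len(raw) != 16:
        raise ValueError('file is shorter than 16 bytes')
    return uuid.UUID(bytes_le=raw)


# Demo with a throwaway file containing made-up bytes.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(bytes.fromhex('33221100554477668899aabbccddeeff') + b'rest of file')

demo_guid = read_guid(tmp.name)
os.remove(tmp.name)
print(demo_guid)
```

Opening the file in a with block closes it even if the read fails, and the length check avoids handing uuid.UUID a short buffer from a truncated file.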