Packing and unpacking variable length array/string using the struct module in Python

Question

Asked by agnsaft

I am trying to get a grip on the packing and unpacking of binary data in Python 3. It's actually not that hard to understand, except for one problem:

What if I have a variable-length text string and want to pack and unpack it in the most elegant manner?

As far as I can tell from the manual, I can only unpack fixed-size strings directly? In that case, is there any elegant way of getting around this limitation without padding with lots and lots of unnecessary zeroes?

Answer 1

Accepted answer by llasram

The struct module only supports fixed-length structures. For variable-length strings, your options are either:

Dynamically construct your format string (a str will have to be converted to bytes before passing it to pack()):
```
import struct
s = bytes(s, 'utf-8')    # Or other appropriate encoding
struct.pack("I%ds" % (len(s),), len(s), s)
```
Skip struct and just use normal string methods to add the string to your pack()-ed output: struct.pack("I", len(s)) + s

For unpacking, you just have to unpack a bit at a time:

(i,), data = struct.unpack("I", data[:4]), data[4:]
s, data = data[:i], data[i:]
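
Putting the packing and unpacking halves together, a minimal round-trip sketch might look like the following. The function names pack_string and unpack_string are hypothetical, and the "<I" prefix (little-endian, 4 bytes, no padding) is an assumption made here so the layout is the same on every platform; the answer itself uses the native "I".

```python
import struct

def pack_string(text, encoding="utf-8"):
    """Encode text and prepend its byte length as a 4-byte unsigned int."""
    raw = text.encode(encoding)
    # "<I%ds": little-endian uint32 length, then exactly len(raw) bytes.
    return struct.pack("<I%ds" % len(raw), len(raw), raw)

def unpack_string(data, encoding="utf-8"):
    """Read one length-prefixed string; return (text, remaining bytes)."""
    (length,) = struct.unpack("<I", data[:4])
    raw, rest = data[4:4 + length], data[4 + length:]
    return raw.decode(encoding), rest

# Two strings packed back to back, then read out one at a time.
packed = pack_string("héllo") + pack_string("world")
first, rest = unpack_string(packed)
second, rest = unpack_string(rest)
print(first, second, rest)
```

Note that the prefix stores the length in bytes, not characters, which matters as soon as the text contains non-ASCII characters.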

If you're doing a lot of this, you can always add a helper function which uses calcsize to do the string slicing:

def unpack_helper(fmt, data):
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]
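
As a usage sketch, the helper can walk through a buffer that mixes fixed-size fields with a variable-length string. The record layout below (a uint16 id, a uint8 flags field, and a uint32 length followed by the name) is purely hypothetical, and "<" is assumed so that no alignment padding is inserted.

```python
import struct

def unpack_helper(fmt, data):
    # Same helper as in the answer: unpack a fixed-size prefix, return the rest.
    size = struct.calcsize(fmt)
    return struct.unpack(fmt, data[:size]), data[size:]

# Hypothetical record: uint16 id, uint8 flags, uint32 name length, then the name.
name = "agnsaft".encode("utf-8")
record = struct.pack("<HBI", 7, 1, len(name)) + name

# First the fixed header, then a dynamically sized "Ns" field.
(rec_id, flags, name_len), rest = unpack_helper("<HBI", record)
(raw_name,), rest = unpack_helper("%ds" % name_len, rest)
print(rec_id, flags, raw_name.decode("utf-8"), rest)
```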

Answer 2

Answer by duncan.forster

Here are some wrapper functions I wrote that help; they seem to work.

Here's the unpacking helper:

import re
import struct

def unpack_from(fmt, data, offset=0):
    (byte_order, fmt, args) = (fmt[0], fmt[1:], ()) if fmt and fmt[0] in ('@', '=', '<', '>', '!') else ('@', fmt, ())
    fmt = filter(None, re.sub("p", "\tp\t",  fmt).split('\t'))
    for sub_fmt in fmt:
        if sub_fmt == 'p':
            (str_len,) = struct.unpack_from('B', data, offset)
            sub_fmt = str(str_len + 1) + 'p'
            sub_size = str_len + 1
        else:
            sub_fmt = byte_order + sub_fmt
            sub_size = struct.calcsize(sub_fmt)
        args += struct.unpack_from(sub_fmt, data, offset)
        offset += sub_size
    return args

Here's the packing helper:

def pack(fmt, *args):
    # Accumulate into bytes: struct.pack() returns bytes in Python 3.
    (byte_order, fmt, data) = (fmt[0], fmt[1:], b'') if fmt and fmt[0] in ('@', '=', '<', '>', '!') else ('@', fmt, b'')
    fmt = filter(None, re.sub("p", "\tp\t",  fmt).split('\t'))
    for sub_fmt in fmt:
        if sub_fmt == 'p':
            (sub_args, args) = ((args[0],), args[1:]) if len(args) > 1 else ((args[0],), [])
            sub_fmt = str(len(sub_args[0]) + 1) + 'p'
        else:
            (sub_args, args) = (args[:len(sub_fmt)], args[len(sub_fmt):])
            sub_fmt = byte_order + sub_fmt
        data += struct.pack(sub_fmt, *sub_args)
    return data
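
Both helpers lean on struct's built-in "p" (Pascal string) format, where the count covers one length byte plus the data. A minimal sketch of that underlying behaviour, using only the standard library and not the wrappers themselves, might be:

```python
import struct

payload = b"hello"
# "Np" is struct's Pascal-string format: 1 length byte + up to N-1 data bytes.
packed = struct.pack("%dp" % (len(payload) + 1), payload)
print(packed)

# When reading, peek at the length byte first to learn N.
(str_len,) = struct.unpack_from("B", packed, 0)
(unpacked,) = struct.unpack_from("%dp" % (str_len + 1), packed, 0)
print(unpacked)

# A count that is too small silently truncates the data,
# which is why the helpers compute the count from the string itself.
truncated = struct.pack("4p", payload)
print(truncated)
```

Because the length is stored in a single byte, this format cannot represent strings longer than 255 bytes.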

Answer 3

Answer by Vladimir Talybin

Nice, but it can't handle repeat counts, such as '6B' for 'BBBBBB'. The solution would be to expand the format string in both functions before use. I came up with this:

def pack(fmt, *args):
  fmt = re.sub(r'(\d+)([^\ds])', lambda x: x.group(2) * int(x.group(1)), fmt)
  ...

And the same for unpack. Maybe not the most elegant, but it works :)
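
To see what the substitution does on its own, here is a small sketch that wraps it in a function. The name expand_counts is hypothetical; the regular expression is the one from the answer, written as a raw string.

```python
import re

def expand_counts(fmt):
    """Expand repeat counts such as '6B' into 'BBBBBB', leaving 'Ns' alone."""
    # A count followed by anything other than a digit or 's' is repeated;
    # 's' is excluded because its count is a byte length, not a repeat count.
    return re.sub(r'(\d+)([^\ds])', lambda m: m.group(2) * int(m.group(1)), fmt)

print(expand_counts('6B'))
print(expand_counts('<2I6B'))
print(expand_counts('H10sp'))
```

One thing to keep in mind is that a counted 'p' such as '4p' would also be expanded to 'pppp', which fits the wrappers above, where a bare 'p' marks a variable-length string.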

Answer 4

Answer by Victor Sergienko

I've googled up this question and a couple of solutions.

construct

An elaborate, flexible solution.

Instead of writing imperative code to parse a piece of data, you declaratively define a data structure that describes your data. As this data structure is not code, you can use it in one direction to parse data into Pythonic objects, and in the other direction, convert (“build”) objects into binary data.
The library provides both simple, atomic constructs (such as integers of various sizes), as well as composite ones which allow you to form hierarchical structures of increasing complexity. Construct features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier:

from construct import *

PascalString = Struct("PascalString",
    UBInt8("length"),
    Bytes("data", lambda ctx: ctx.length),
)

>>> PascalString.parse("\x05helloXXX")
Container({'length': 5, 'data': 'hello'})
>>> PascalString.build(Container(length = 6, data = "foobar"))
'\x06foobar'


PascalString2 = ExprAdapter(PascalString,
    encoder = lambda obj, ctx: Container(length = len(obj), data = obj),
    decoder = lambda obj, ctx: obj.data
)

>>> PascalString2.parse("\x05hello")
'hello'
>>> PascalString2.build("i'm a long string")
"\x11i'm a long string"

netstruct

A quick solution if you only need a struct extension for variable-length byte sequences. Nesting a variable-length structure can be achieved by pack-ing the result of a first pack.

NetStruct supports a new formatting character, the dollar sign ($). The dollar sign represents a variable-length string, encoded with its length preceding the string itself.

Edit: It looks like the length of a variable-length string is stored using the same data type as the elements. Thus, the maximum length of a variable-length string is 255 for bytes, 65535 for words, and so on.

import netstruct
>>> netstruct.pack(b"b$", b"Hello World!")
b'\x0cHello World!'

>>> netstruct.unpack(b"b$", b"\x0cHello World!")
[b'Hello World!']
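
The length limit mentioned in the edit is a general property of length-prefixed encodings rather than something specific to netstruct. The following standard-library sketch (it does not use netstruct, and pack_prefixed is a hypothetical name) shows how the width of the prefix bounds the payload size:

```python
import struct

def pack_prefixed(payload, length_fmt="B"):
    """Prefix payload with its length, stored using length_fmt."""
    # struct.pack raises struct.error if len(payload) does not fit length_fmt.
    return struct.pack("!" + length_fmt, len(payload)) + payload

print(pack_prefixed(b"Hello World!"))        # one-byte length prefix
print(len(pack_prefixed(b"x" * 300, "H")))   # two-byte prefix, 302 bytes total

try:
    pack_prefixed(b"x" * 256, "B")
except struct.error as exc:
    print("too long for a one-byte prefix:", exc)
```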

Answer 5

Answer by locus2k

An easy way I found to handle a variable length when packing a string is:

pack('{}s'.format(len(string)), string)

Unpacking works in much the same way:

unpack('{}s'.format(len(data)), data)
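
A short round trip of this approach, as a sketch under the assumption that the buffer contains nothing but the string (no length is stored, so the reader has to know where the string ends by other means):

```python
import struct

text = "variable length".encode("utf-8")  # pack() needs bytes in Python 3
packed = struct.pack("{}s".format(len(text)), text)

# The whole buffer is treated as one string field.
(unpacked,) = struct.unpack("{}s".format(len(packed)), packed)
print(unpacked.decode("utf-8"))

# With no length prefix, packed is byte-for-byte identical to text.
print(packed == text)
```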

Answer 6

Answer by Santosh Kale

To pack, use:

packed=bytes('sample string','utf-8')

To unpack, use:

string=packed.decode('utf-8')  # str(packed)[2:][:-1] would return the escaped repr for non-ASCII text

This works only for UTF-8 strings, and it is quite a simple workaround.
