Python: Convert a variable-sized byte array to an integer/long

Question

Question by goncalopp

How can I convert a (big-endian) variable-sized binary byte array to an (unsigned) integer/long? For example, '\x11\x34' represents 4404.
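
Big-endian means the first byte is the most significant, so each byte acts as one base-256 digit. For the example above, and in general for bytes $b_0, \dots, b_{n-1}$:

```latex
\texttt{0x11} \cdot 256 + \texttt{0x34} = 17 \cdot 256 + 52 = 4352 + 52 = 4404
\qquad
\text{value} = \sum_{i=0}^{n-1} b_i \cdot 256^{\,n-1-i}
```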

Right now, I'm using

def bytes_to_int(bytes):
    # Python 2 only: str.encode('hex') gives the hex digits, e.g. '1134'
    return int(bytes.encode('hex'), 16)
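
For comparison, a sketch of the same hex round-trip on Python 3, assuming Python 3.5 or later, where bytes and bytearray have a .hex() method (str.encode('hex') no longer exists there). The parameter is renamed b so it doesn't shadow the built-in bytes type:

```python
def bytes_to_int(b):
    # .hex() lists the bytes most significant first, which is exactly
    # big-endian order, so parsing the digits in base 16 gives the value.
    return int(b.hex(), 16)

print(bytes_to_int(b'\x11\x34'))  # 4404
```

Like the original, this raises ValueError on an empty input, because int('', 16) is invalid.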

It's small and somewhat readable, but probably not very efficient. Is there a better (more obvious) way?

Answer 1

Accepted answer by abarnert

Python doesn't traditionally have much use for "numbers in big-endian C layout" that are too big for C. (If you're dealing with 2-byte, 4-byte, or 8-byte numbers, then struct.unpack is the answer.)
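
A quick illustration of the fixed-size case with the question's value; the zero-padded 4- and 8-byte inputs are just examples:

```python
import struct

# '>' selects big-endian with standard sizes; H, I and Q are unsigned
# 2-, 4- and 8-byte integers. unpack always returns a tuple, hence [0].
two = struct.unpack('>H', b'\x11\x34')[0]
four = struct.unpack('>I', b'\x00\x00\x11\x34')[0]
eight = struct.unpack('>Q', b'\x00' * 6 + b'\x11\x34')[0]
print(two, four, eight)  # 4404 4404 4404
```

The input length must match the format size exactly, which is why this doesn't extend to arbitrary lengths.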

But enough people got tired of there not being one obvious way to do this that Python 3.2 added a method, int.from_bytes, that does exactly what you want:

int.from_bytes(b, byteorder='big', signed=False)
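
Applied to the question's example, along with the effect of the byteorder and signed parameters and the inverse method, int.to_bytes (the second input, b'\xff\xfe', is just an example):

```python
# Requires Python 3.2+. Works on bytes, bytearray and other bytes-like objects.
data = b'\x11\x34'

big = int.from_bytes(data, byteorder='big', signed=False)       # 0x1134 == 4404
little = int.from_bytes(data, byteorder='little')                # 0x3411 == 13329
neg = int.from_bytes(b'\xff\xfe', byteorder='big', signed=True)  # two's complement: -2

# The inverse: to_bytes needs the output length spelled out.
back = big.to_bytes(2, byteorder='big')                          # b'\x11\x34'
```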

Unfortunately, if you're using an older version of Python, you don't have this. So, what options do you have? (Besides the obvious one: update to 3.2, or, better, 3.4…)

First, there's your code. I think binascii.hexlify is a better way to spell it than .encode('hex'), because "encode" has always seemed a little weird for a method on byte strings (as opposed to Unicode strings), and in fact bytes objects have no encode method at all in Python 3. Otherwise, it seems pretty readable and obvious to me. And it should be pretty fast: yes, it has to create an intermediate string, but it does all the looping and arithmetic in C (at least in CPython), which is generally an order of magnitude or two faster than doing it in Python. Unless your bytearray is so big that allocating the string will itself be costly, I wouldn't worry about performance here.
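
One edge case the hex approach doesn't handle as written: an empty input hexlifies to b'', and int(b'', 16) raises ValueError. A minimal sketch of a guarded variant, assuming an empty array should mean 0 (which matches what int.from_bytes(b'', 'big') returns); the name hexint_safe is hypothetical:

```python
import binascii

def hexint_safe(b):
    # hexlify returns the hex digits as bytes, e.g. b'1134' for b'\x11\x34';
    # int() accepts bytes when a base is given.
    digits = binascii.hexlify(b)
    # Empty input gives no digits, which int() rejects; treat it as 0.
    return int(digits, 16) if digits else 0
```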

Alternatively, you could do it in a loop. But that's going to be more verbose and, at least in CPython, a lot slower.

You could try to replace the explicit loop with an implicit one, but the obvious function for that is reduce, which part of the community considers un-Pythonic; and of course it still has to call a Python function for each byte.

You could unroll the loop or reduce by breaking the input into 8-byte chunks and looping over struct.unpack_from, or by doing one big struct.unpack('>' + 'Q'*(len(b)//8) + 'B'*(len(b)%8), b) and looping over the results, but that makes it a lot less readable and probably not that much faster.
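
A sketch of the chunked struct.unpack_from variant described above, to show what the unrolling involves; the helper name chunked_int is hypothetical, and no claim is made here about its speed relative to the hex approach:

```python
import struct

def chunked_int(b):
    """Big-endian unsigned int from b, reading 8 bytes per struct call."""
    b = bytes(b)
    n_chunks, _ = divmod(len(b), 8)
    x = 0
    # Full 8-byte chunks from the front: each one is a base-2**64 "digit".
    for i in range(n_chunks):
        (q,) = struct.unpack_from('>Q', b, 8 * i)
        x = (x << 64) | q
    # Leftover bytes at the end: one base-256 digit each.
    for byte in b[8 * n_chunks:]:
        x = (x << 8) | byte
    return x
```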

You could use NumPy… but if you're going bigger than either 64 or maybe 128 bits, it's going to end up converting everything to Python objects anyway.

So, I think your code is the best option.

Here are some timings comparing it to the most obvious manual conversion:

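import binascii
import functools
import numpy as np

def hexint(b):
    # Hex round-trip: hexlify and int() do the looping in C.
    return int(binascii.hexlify(b), 16)

def loop1(b):
    # Implicit loop via reduce: shift the running value left one byte, OR in the next.
    def f(x, y): return (x<<8)|y
    return functools.reduce(f, b, 0)

def loop2(b):
    # The same shift-and-OR as an explicit loop (iterating a bytearray yields ints).
    x = 0
    for c in b:
        x <<= 8
        x |= c
    return x

def numpily(b):
    # Weight each byte by its place value 256**k == 1 << (8*k), then sum.
    # dtype=object keeps Python ints, so large shifts cannot overflow int64.
    n = np.array(list(b))
    p = 1 << (8 * np.arange(len(b)-1, -1, -1, dtype=object))
    return np.sum(n * p)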

In [226]: b = bytearray(range(256))

In [227]: %timeit hexint(b)
1000000 loops, best of 3: 1.8 μs per loop

In [228]: %timeit loop1(b)
10000 loops, best of 3: 57.7 μs per loop

In [229]: %timeit loop2(b)
10000 loops, best of 3: 46.4 μs per loop

In [283]: %timeit numpily(b)
10000 loops, best of 3: 88.5 μs per loop

For comparison, in Python 3.4:

In [17]: %timeit hexint(b)
1000000 loops, best of 3: 1.69 μs per loop

In [17]: %timeit int.from_bytes(b, byteorder='big', signed=False)
1000000 loops, best of 3: 1.42 μs per loop

So, your method is still pretty fast…
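
If the same code has to run on both older and newer interpreters, one option is to feature-test for int.from_bytes and fall back to the hex round-trip otherwise. A sketch; the name bytes_to_int is borrowed from the question, and treating empty input as 0 is an assumption:

```python
import binascii

def bytes_to_int(b):
    """Big-endian unsigned conversion that prefers int.from_bytes when present."""
    if hasattr(int, 'from_bytes'):  # Python 3.2+
        return int.from_bytes(b, 'big')
    # Older Pythons: hex round-trip; hexlify(b'') is empty, so map that to 0.
    digits = binascii.hexlify(b)
    return int(digits, 16) if digits else 0
```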

Answer 2

Answer by Curd

The function struct.unpack(...) does what you need.
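
struct formats have fixed sizes, so struct.unpack on its own only covers input lengths that have a matching format code. A sketch that picks the code from the input length; the helper name struct_int and the ValueError for other lengths are illustrative choices, not part of the struct module:

```python
import struct

# Big-endian unsigned format codes for the sizes struct supports directly.
_FORMATS = {1: '>B', 2: '>H', 4: '>I', 8: '>Q'}

def struct_int(b):
    fmt = _FORMATS.get(len(b))
    if fmt is None:
        # No single code exists for e.g. 3-byte or 16-byte integers.
        raise ValueError('unsupported length: %d' % len(b))
    return struct.unpack(fmt, b)[0]
```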

函数struct.unpack(...)做你需要的。

Python 将可变大小的字节数组转换为整数/长整数

提问by goncalopp

采纳答案by abarnert

回答by Curd

相关推荐

最近更新

标签

Python 将可变大小的字节数组转换为整数/长整数

提问by goncalopp

采纳答案by abarnert

回答by Curd

相关推荐

Python 如何按值（DESC）然后按键（ASC）对字典进行排序？

Python 检查列表中的所有值是否都大于某个数字

Python 在 Django 1.8 或更高版本中填充时出现“模型尚未加载”错误

Python numpy.where 中的多个条件

相关推荐

最近更新

标签