Iterate over individual bytes in Python 3
Original URL: http://stackoverflow.com/questions/14267452/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by flying sheep
When iterating over a bytes object in Python 3, one gets the individual bytes as ints:
>>> [b for b in b'123']
[49, 50, 51]
How can one get 1-length bytes objects instead?
The following is possible, but it is not very obvious to the reader and most likely performs badly:
>>> [bytes([b]) for b in b'123']
[b'1', b'2', b'3']
Accepted answer by jfs
If you are concerned about the performance of this code, and an int as a byte is not a suitable interface in your case, then you should probably reconsider the data structures that you use, e.g., use str objects instead.
You could slice the bytes object to get 1-length bytes objects:
L = [bytes_obj[i:i+1] for i in range(len(bytes_obj))]
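The slicing approach relies on the fact that indexing a bytes object returns an int, while a one-element slice returns a bytes object. A minimal sketch of that difference (the variable name `data` is only illustrative):

```python
data = b'123'

# Indexing yields an int; a one-element slice yields a bytes object.
first_as_int = data[0]
first_as_bytes = data[0:1]

# Slicing past the end does not raise; it returns an empty bytes object.
past_end = data[10:11]

# The comprehension from the answer, applied to the sample data.
singles = [data[i:i+1] for i in range(len(data))]

print(first_as_int, first_as_bytes, past_end, singles)
```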
There is PEP 0467 -- Minor API improvements for binary sequences, which proposes a bytes.iterbytes() method:
>>> list(b'123'.iterbytes())
[b'1', b'2', b'3']
Answer by guettli
I use this helper method:
def iter_bytes(my_bytes):
    for i in range(len(my_bytes)):
        yield my_bytes[i:i+1]
Works for Python 2 and Python 3.
Answer by Leon
A trio of map(), bytes() and zip() does the trick:
>>> list(map(bytes, zip(b'123')))
[b'1', b'2', b'3']
However, I don't think that it is any more readable than [bytes([b]) for b in b'123'], or that it performs better.
Answer by user38
A short way to do this:
[chr(i).encode() for i in b'123']
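One caveat with this approach: str.encode() defaults to UTF-8, so any byte value of 128 or above becomes a two-byte sequence rather than a 1-length bytes object. The benchmark further down this page uses encode('latin-1') for that reason. A small sketch of the difference, using an illustrative two-byte input:

```python
# b'1' followed by one non-ASCII byte (0xC8).
data = bytes([0x31, 0xC8])

# Default UTF-8 encoding: the non-ASCII byte turns into two bytes.
utf8_result = [chr(i).encode() for i in data]

# Latin-1 maps code points 0-255 one-to-one onto single bytes.
latin1_result = [chr(i).encode('latin-1') for i in data]

print(utf8_result)
print(latin1_result)
```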
Answer by snakecharmerb
int.to_bytes
int objects have a to_bytes method which can be used to convert an int to its corresponding byte:
>>> import sys
>>> [i.to_bytes(1, sys.byteorder) for i in b'123']
[b'1', b'2', b'3']
As with some of the other answers, it's not clear that this is more readable than the OP's original solution: I think the length and byteorder arguments make it noisier.
struct.unpack
Another approach would be to use struct.unpack, though this might also be considered difficult to read unless you are familiar with the struct module:
>>> import struct
>>> struct.unpack('3c', b'123')
(b'1', b'2', b'3')
(As jfs observes in the comments, the format string for struct.unpack can be constructed dynamically; in this case we know the number of individual bytes in the result must equal the number of bytes in the original bytestring, so struct.unpack(str(len(bytestring)) + 'c', bytestring) is possible.)
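The dynamically built format string can be wrapped in a small function. This is only a sketch; the name `split_bytes` is hypothetical and does not come from the struct module or from the answers:

```python
import struct


def split_bytes(bytestring):
    """Hypothetical helper: return a list of 1-length bytes objects."""
    # Build a format such as '3c' for a 3-byte input: one 'c' item per byte.
    fmt = '%dc' % len(bytestring)
    # struct.unpack returns a tuple; convert it to match the other approaches.
    return list(struct.unpack(fmt, bytestring))


print(split_bytes(b'123'))
```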
Performance
>>> import random, timeit
>>> bs = bytes(random.randint(0, 255) for i in range(100))
>>> # OP's solution
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bytes([b]) for b in bs]")
46.49886950897053
>>> # Accepted answer from jfs
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[bs[i:i+1] for i in range(len(bs))]")
20.91463226894848
>>> # Leon's answer
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="list(map(bytes, zip(bs)))")
27.476876026019454
>>> # guettli's answer
>>> timeit.timeit(setup="from __main__ import iter_bytes, bs",
...               stmt="list(iter_bytes(bs))")
24.107485140906647
>>> # user38's answer (with Leon's suggested fix)
>>> timeit.timeit(setup="from __main__ import bs",
...               stmt="[chr(i).encode('latin-1') for i in bs]")
45.937552741961554
>>> # Using int.to_bytes
>>> timeit.timeit(setup="from __main__ import bs;from sys import byteorder",
...               stmt="[x.to_bytes(1, byteorder) for x in bs]")
32.197659170022234
>>> # Using struct.unpack, converting the resulting tuple to list
>>> # to be fair to other methods
>>> timeit.timeit(setup="from __main__ import bs;from struct import unpack",
...               stmt="list(unpack('100c', bs))")
1.902243083808571
struct.unpack seems to be at least an order of magnitude faster than the other methods, presumably because it operates at the byte level. int.to_bytes, on the other hand, performs worse than most of the "obvious" approaches.
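Before comparing speed, it can be worth confirming that the approaches agree on every possible byte value. The following is a rough sketch of such a check, not part of the original timings; the dictionary keys are just labels chosen for illustration:

```python
import struct
import sys

# Every possible byte value, so non-ASCII bytes are covered too.
data = bytes(range(256))

results = {
    'bytes([b])': [bytes([b]) for b in data],
    'slicing': [data[i:i+1] for i in range(len(data))],
    'map/zip': list(map(bytes, zip(data))),
    'latin-1': [chr(i).encode('latin-1') for i in data],
    'to_bytes': [i.to_bytes(1, sys.byteorder) for i in data],
    'struct': list(struct.unpack('%dc' % len(data), data)),
}

# Use the OP's solution as the reference and list any approach that differs.
expected = results['bytes([b])']
mismatches = [name for name, value in results.items() if value != expected]
print(mismatches)
```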
Answer by kederrac
Since Python 3.5, you can use % formatting with bytes and bytearray:
[b'%c' % i for i in b'123']
Output:
[b'1', b'2', b'3']
The above solution is 2-3 times faster than your initial approach. If you want an even faster solution, I suggest using numpy.frombuffer:
import numpy as np
np.frombuffer(b'123', dtype='S1')
Output:
array([b'1', b'2', b'3'],
dtype='|S1')
The second solution is ~10% faster than struct.unpack (I used the same performance test as @snakecharmerb, against 100 random bytes).
Answer by MSeifert
I thought it might be useful to compare the runtimes of the different approaches, so I made a benchmark (using my library simple_benchmark):
Probably unsurprisingly, the NumPy solution is by far the fastest for large bytes objects.
But if a resulting list is desired, then both the NumPy solution (with tolist()) and the struct solution are much faster than the other alternatives.
I didn't include guettli's answer because it is almost identical to jfs's solution, except that a generator function is used instead of a comprehension.
import numpy as np
import struct
import sys
from simple_benchmark import BenchmarkBuilder

b = BenchmarkBuilder()

@b.add_function()
def jfs(bytes_obj):
    return [bytes_obj[i:i+1] for i in range(len(bytes_obj))]

@b.add_function()
def snakecharmerb_tobytes(bytes_obj):
    return [i.to_bytes(1, sys.byteorder) for i in bytes_obj]

@b.add_function()
def snakecharmerb_struct(bytes_obj):
    return struct.unpack(str(len(bytes_obj)) + 'c', bytes_obj)

@b.add_function()
def Leon(bytes_obj):
    return list(map(bytes, zip(bytes_obj)))

@b.add_function()
def rusu_ro1_format(bytes_obj):
    return [b'%c' % i for i in bytes_obj]

@b.add_function()
def rusu_ro1_numpy(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1')

@b.add_function()
def rusu_ro1_numpy_tolist(bytes_obj):
    return np.frombuffer(bytes_obj, dtype='S1').tolist()

@b.add_function()
def User38(bytes_obj):
    return [chr(i).encode() for i in bytes_obj]

@b.add_arguments('byte object length')
def argument_provider():
    for exp in range(2, 18):
        size = 2**exp
        yield size, b'a' * size

r = b.run()
r.plot()


