Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/24842764/
Time: 2020-08-19 05:19:17  Source: igfitidea

Python Comparison of byte literals

python, comparison, byte, endianness, base

Question by Matthew Hemke

The following question arose because I was trying to use bytes strings as dictionary keys, and bytes values that I understood to be equal weren't being treated as equal.

Why doesn't the following Python code compare equal? Aren't these two equivalent representations of the same binary data (example knowingly chosen to avoid endianness)?

b'0b11111111' == b'0xff'

I know the following evaluates to True, demonstrating the equivalence:

int(b'0b11111111', 2) == int(b'0xff', 16)

But why does Python force me to know the representation? Is it related to endianness? Is there some easy way to force these to compare as equivalent, other than converting them all to, e.g., hex literals? Can anyone suggest a transparent and clear method to move between all representations in a (somewhat) platform-independent way (or am I asking too much)?

Edit:

Given the comments below, say I actually want to index a dictionary using 8 bits written in the form b'0b11111111'. Why does Python expand it to ten bytes, and how do I prevent that?

This is a small piece of a large tree data structure, and expanding each 8-bit index to ten bytes (80 bits) seems like a huge waste of memory.

Accepted answer by Martijn Pieters

Bytes can represent any number of things. Python cannot and will not guess at what your bytes might encode.

For example, int(b'0b11111111', 34) is also a valid interpretation, but that interpretation is not equal to hex FF.

The number of interpretations, in fact, is endless. The bytes could represent a series of ASCII codepoints, or image colors, or musical notes.

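As a quick illustration, here is a minimal Python 3 sketch that applies four different, equally legitimate interpretations to the same ten bytes. Only the one that reads the bytes as a Python integer literal produces 255; the variable names are illustrative.

```python
data = b'0b11111111'

# 1. As ASCII text: just the ten characters that were typed.
as_text = data.decode('ascii')        # '0b11111111'

# 2. As a Python integer literal: base=0 honours the 0b prefix.
as_literal = int(data, 0)             # 255

# 3. As a base-34 numeral, where 'b' is simply the digit 11.
as_base34 = int(data, 34)

# 4. As one big-endian unsigned integer built from the ten raw byte values.
as_raw = int.from_bytes(data, 'big')

print(as_text, as_literal, as_base34, as_raw, sep='\n')
```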

Until you explicitly apply an interpretation, the bytes object consists just of a sequence of values in the range 0-255, and the textual representation of those bytes uses ASCII wherever a byte is representable as printable text:

>>> list(bytes(b'0b11111111'))
[48, 98, 49, 49, 49, 49, 49, 49, 49, 49]
>>> list(bytes(b'0xff'))
[48, 120, 102, 102]

Those byte sequences are not equal.

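Equality of bytes objects depends only on the byte values, not on the notation used to write the literal. A small Python 3 sketch (names are illustrative):

```python
# Three spellings of the same four byte values: '0', 'x', 'f', 'f'.
spelled_as_text = b'0xff'
from_ints = bytes([48, 120, 102, 102])   # the values written as decimal ints
from_escapes = b'\x30\x78\x66\x66'       # the values written as hex escapes

print(spelled_as_text == from_ints == from_escapes)  # True: identical sequences
print(spelled_as_text == b'0b11111111')              # False: different length and contents
```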

If you want to interpret these sequences explicitly as integer literals, then use ast.literal_eval() to interpret the decoded text values; always normalise before comparing:

>>> import ast
>>> ast.literal_eval(b'0b11111111'.decode('utf8'))
255
>>> ast.literal_eval(b'0xff'.decode('utf8'))
255
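If the inputs are known to be integer literals only, a possible shortcut (an alternative to ast.literal_eval, not part of this answer) is int() with base=0: it accepts bytes directly and infers the base from the 0b/0o/0x prefix, much like the compiler does. The helper literal_key is hypothetical; note that base 0 rejects decimal literals with leading zeros such as b'010'.

```python
def literal_key(raw: bytes) -> int:
    """Hypothetical helper: parse bytes holding a Python integer literal."""
    # base=0 makes int() infer the base from the literal's own prefix
    return int(raw, 0)

# Normalised keys compare equal no matter how the literal was spelled.
lookup = {literal_key(b'0b11111111'): 'all ones'}
print(lookup[literal_key(b'0xff')])   # all ones
print(literal_key(b'0o377'))          # 255
```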

Answer by unutbu

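b'0b11111111' consists of 10 bytes (this session is Python 2, where list() of a byte string yields one-character strings; Python 3 yields the integer byte values instead):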

In [44]: list(b'0b11111111')
Out[44]: ['0', 'b', '1', '1', '1', '1', '1', '1', '1', '1']

whereas b'0xff' consists of 4 bytes:

In [45]: list(b'0xff')
Out[45]: ['0', 'x', 'f', 'f']

Clearly, they are not the same objects.

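This is also why the dictionary lookup in the question fails: as keys, the two byte strings are simply different values. A minimal sketch, assuming Python 3 (the dictionary name is illustrative):

```python
keys = {b'0b11111111': 'binary spelling'}

print(len(b'0b11111111'), len(b'0xff'))  # 10 4
print(b'0xff' in keys)                   # False: a different byte string, so a different key
```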

Python values explicitness. (Explicit is better than implicit.) It does not assume that b'0b11111111' is necessarily the binary representation of an integer. It's just a string of bytes. How you choose to interpret it must be explicitly stated.

Answer by ondra.cifka

It seems that what you were trying to do is get a byte string representing the value 0b11111111 (or 255). That is not what b'0b11111111' does; it actually stands for a byte string representing the character (Unicode) string '0b11111111'.

What you want would be written as b'\xff'. You can check that it is actually one byte: len(b'\xff') == 1.

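A sketch of producing that single byte from an integer written in any notation, assuming Python 3: the 0b/0x spelling is resolved when the int literal is compiled, so bytes([...]) receives the same value either way. The dictionary name index is illustrative; each key holds one byte of data, although every bytes object still carries Python's usual per-object overhead.

```python
# The notation of an int literal is resolved at compile time: same int either way.
assert 0b11111111 == 0xff == 255

# A one-element bytes object built from that int is exactly b'\xff'.
key = bytes([0b11111111])
print(key == b'\xff', len(key))   # True 1

# Single-byte keys: whichever spelling built them, they match the same entry.
index = {bytes([0xff]): 'all ones'}
print(index[b'\xff'])             # all ones
```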

To convert a Python int to a binary representation, you can use the ctypes library. You need to choose one of the C integer types, e.g.:

>>> import ctypes
>>> bytes(ctypes.c_ubyte(255))
b'\xff'

>>> bytes(ctypes.c_ubyte(0xff))
b'\xff'

>>> bytes(ctypes.c_long(255))
b'\xff\x00\x00\x00\x00\x00\x00\x00'

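Note: Instead of c_ubyte and c_long, you can use the aliases c_uint8 (8-bit unsigned C integer) and c_int64 (64-bit signed C integer), respectively. c_int64 matches c_long only where a C long is 64 bits, such as 64-bit Linux and macOS; on Windows, c_long is 32 bits, so bytes(ctypes.c_long(255)) yields 4 bytes instead of the 8 shown above.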

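Because the size of c_long differs between platforms, a platform-independent alternative (a different technique from ctypes, sketched here for Python 3) is int.to_bytes or the struct module with an explicit byte-order prefix; both fix the length and byte order in the call itself:

```python
import struct

n = 255

# int.to_bytes: length and byte order are explicit, so output is identical everywhere.
print(n.to_bytes(1, 'big'))      # b'\xff'
print(n.to_bytes(8, 'little'))   # b'\xff\x00\x00\x00\x00\x00\x00\x00'

# struct with '<' or '>' uses standard sizes (q = 8-byte signed, B = 1-byte unsigned).
print(struct.pack('<q', n))      # b'\xff\x00\x00\x00\x00\x00\x00\x00'
print(struct.pack('>B', n))      # b'\xff'
```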

To convert back:

>>> ctypes.c_ubyte.from_buffer_copy(b'\xff').value
255
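Without ctypes, int.from_bytes performs the reverse conversion, again with the byte order stated explicitly. A short Python 3 sketch:

```python
raw = b'\xff'

print(int.from_bytes(raw, 'big'))                   # 255
print(int.from_bytes(b'\xff\x00', 'little'))        # 255: least significant byte first
print(int.from_bytes(raw, 'big', signed=True))      # -1: same bits read as two's complement
```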

Be careful about overflow:

>>> ctypes.c_ubyte(256)
c_ubyte(0)
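ctypes performs no overflow checking, so out-of-range values wrap modulo 256 for c_ubyte. If silent wrapping is unwanted, int.to_bytes raises OverflowError instead. A minimal Python 3 sketch contrasting the two (variable names are illustrative):

```python
import ctypes

# ctypes keeps only the low 8 bits, so out-of-range values wrap silently.
wrapped_256 = ctypes.c_ubyte(256).value   # 0   (256 % 256)
wrapped_300 = ctypes.c_ubyte(300).value   # 44  (300 % 256)

# int.to_bytes raises instead of wrapping when the value does not fit.
try:
    (256).to_bytes(1, 'big')
    overflow_detected = False
except OverflowError:
    overflow_detected = True

print(wrapped_256, wrapped_300, overflow_detected)   # 0 44 True
```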