What is the proper way to determine if an object is a bytes-like object in Python?

Question

Asked by A. Wilcox

I have code that expects str but will handle the case of being passed bytes in the following way:

if isinstance(data, bytes):
    data = data.decode()

Unfortunately, this does not work in the case of bytearray. Is there a more generic way to test whether an object is either bytes or bytearray, or should I just check for both? Is hasattr(data, 'decode') as bad as I feel it would be?

Answer 1

Accepted answer by Elizafox

There are a few approaches you could use here.

Duck typing

Since Python is duck typed, you could simply do as follows (which seems to be the way usually suggested):

try:
    data = data.decode()
except (UnicodeDecodeError, AttributeError):
    pass

You could use hasattr as you describe, however, and it would probably be fine. This is, of course, assuming the .decode() method of the given object returns a string and has no nasty side effects.

I personally recommend either the exception or the hasattr method, but whatever you use is up to you.
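As a rough illustration of the hasattr variant, here is a minimal sketch. The helper name to_text and its encoding parameter are hypothetical, not part of the answer, and the sketch assumes that any decode() method found on the object behaves like bytes.decode().

```python
def to_text(data, encoding="utf-8"):
    """Return data as str, decoding it first if it offers a decode() method."""
    # Hypothetical helper: str has no decode() in Python 3, so text passes through.
    if hasattr(data, "decode"):
        return data.decode(encoding)
    return data


print(to_text(b"abc"))
print(to_text(bytearray(b"abc")))
print(to_text("abc"))
```

An object with an unrelated decode() method would also take the first branch, which is the risk the answer mentions.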

Use str()

This approach is uncommon, but is possible:

data = str(data, "utf-8")

Other encodings are permissible, just like with .decode() on bytes and bytearray. You can also pass a third parameter to specify error handling.
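A small sketch of the third parameter, using made-up sample data. It also shows one caveat worth keeping in mind for code that may receive str: calling str() with an encoding on something that is already text raises TypeError.

```python
raw = bytearray(b"caf\xc3\xa9 \xff")  # valid UTF-8 followed by one invalid byte

# The third positional argument is the error handler, as with bytes.decode().
replaced = str(raw, "utf-8", "replace")
ignored = str(raw, "utf-8", "ignore")

# The default handler is "strict", which raises on the invalid byte.
strict_error = None
try:
    str(raw, "utf-8")
except UnicodeDecodeError as exc:
    strict_error = exc

# str() with an encoding refuses input that is already text.
rejected_str = False
try:
    str("already text", "utf-8")
except TypeError:
    rejected_str = True

print(repr(replaced), repr(ignored), rejected_str)
```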

Single-dispatch generic functions (Python 3.4+)

Python 3.4 and above include a nifty feature called single-dispatch generic functions, via functools.singledispatch. This is a bit more verbose, but it's also more explicit:

import functools

@functools.singledispatch
def func(data):
    # This is the generic implementation, used for any type
    # that has no registered handler
    data = data.decode()
    ...

@func.register(str)
def _(data):
    # data will already be a string
    ...

You could also make special handlers for bytearray and bytes objects if you so chose.
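One possible shape for such per-type handlers is sketched below. The function name describe and the returned strings are purely illustrative, and the fallback here raises instead of decoding, which is a different choice from the generic implementation in the answer.

```python
import functools


@functools.singledispatch
def describe(data):
    # Fallback for any type without a registered handler.
    raise TypeError("unsupported type: " + type(data).__name__)


@describe.register(str)
def _(data):
    return "str: " + data


@describe.register(bytes)
def _(data):
    return "bytes: " + data.decode("utf-8")


@describe.register(bytearray)
def _(data):
    return "bytearray: " + data.decode("utf-8")


print(describe("x"))
print(describe(b"x"))
print(describe(bytearray(b"x")))
```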

Beware: single-dispatch functions only work on the first argument! This is an intentional feature, see PEP 443.

Answer 2

Answered by zangw

You can use:

isinstance(data, (bytes, bytearray))

Both types have to be listed because they do not share a common base class other than object:

>>> bytes.__base__
<type 'basestring'>
>>> bytearray.__base__
<type 'object'>

To check bytes:

>>> by = bytes()
>>> isinstance(by, basestring)
True

However,

>>> buf = bytearray()
>>> isinstance(buf, basestring)
False

The code above was tested under Python 2.7.

Unfortunately, under Python 3.4 the base classes are the same:

>>> bytes.__base__
<class 'object'>
>>> bytearray.__base__
<class 'object'>
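For Python 3, the tuple form of isinstance can be wrapped in a small helper. This is only a sketch: the name ensure_text, the default encoding, and the choice to raise TypeError for anything else are assumptions rather than part of the answer.

```python
def ensure_text(data, encoding="utf-8"):
    """Decode bytes or bytearray to str; pass str through unchanged."""
    # isinstance accepts a tuple of types and also matches their subclasses.
    if isinstance(data, (bytes, bytearray)):
        return data.decode(encoding)
    if isinstance(data, str):
        return data
    raise TypeError("expected str, bytes or bytearray, got " + type(data).__name__)


print(ensure_text(b"abc"))
print(ensure_text(bytearray(b"abc")))
print(ensure_text("abc"))
```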

Answer 3

Answered by Kevin

This code is not correct unless you know something we don't:

if isinstance(data, bytes):
    data = data.decode()

You do not (appear to) know the encoding of data. You are assuming it's UTF-8, but that could very well be wrong. Since you do not know the encoding, you do not have text. You have bytes, which could have any meaning under the sun.

The good news is that most random sequences of bytes are not valid UTF-8, so when this breaks, it will break loudly (errors='strict' is the default) instead of silently doing the wrong thing. The even better news is that most of those random sequences that happen to be valid UTF-8 are also valid ASCII, which (nearly) everyone agrees on how to parse anyway.

The bad news is that there is no reasonable way to fix this. There is a standard way of providing encoding information: use str instead of bytes. If some third-party code handed you a bytes or bytearray object without any further context or information, the only correct action is to fail.
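To make the point concrete, here is a small sketch with invented sample bytes. The same payload yields different text depending on the encoding assumed, while bytes that are not valid UTF-8 fail loudly under the default strict handler.

```python
payload = b"caf\xc3\xa9"

# The same bytes are different text under different encodings.
as_utf8 = payload.decode("utf-8")
as_latin1 = payload.decode("latin-1")

# Bytes that are not valid UTF-8 raise instead of decoding to nonsense.
garbage = b"\xff\xfe\x00"
failed_loudly = False
try:
    garbage.decode("utf-8")
except UnicodeDecodeError:
    failed_loudly = True

print(repr(as_utf8), repr(as_latin1), failed_loudly)
```

Note that latin-1 never fails, since every byte value maps to a character, so a successful decode does not prove the encoding was the right one.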

Now, assuming you do know the encoding, you can use functools.singledispatch here:

import functools

@functools.singledispatch
def foo(data, *other_arguments):
    # Fallback for any type without a registered handler
    raise TypeError('Unknown type: ' + repr(type(data)))

@foo.register(str)
def _(data, *other_arguments):
    # data is a str
    ...

@foo.register(bytes)
@foo.register(bytearray)
def _(data, *other_arguments):
    # 'encoding' is a placeholder for the encoding you actually know
    data = data.decode('encoding')
    # explicit is better than implicit; don't leave the encoding out for UTF-8
    return foo(data, *other_arguments)

This doesn't work on methods (Python 3.8 later added functools.singledispatchmethod for that case), and data has to be the first argument. If those restrictions don't work for you, use one of the other answers instead.

Answer 4

Answered by pepr

It depends on what you want to solve. If you want the same code to convert both cases to a string, you can simply convert the object to bytes first and then decode. This way, it is a one-liner:

#!python3

b1 = b'123456'
b2 = bytearray(b'123456')

print(type(b1))
print(type(b2))

s1 = bytes(b1).decode('utf-8')
s2 = bytes(b2).decode('utf-8')

print(s1)
print(s2)

This way, the answer for you may be:

data = bytes(data).decode()

Anyway, I suggest passing 'utf-8' explicitly to decode, unless you really care about saving a few bytes of source. The reason is that the next time you or someone else reads the source code, the intent will be more apparent.
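One thing to keep in mind with this one-liner is that bytes() accepts more than bytes-like objects: an int produces that many zero bytes, and a str without an encoding raises TypeError. A guarded variant might look like the sketch below, where the name decode_any and the explicit int check are assumptions added for illustration.

```python
def decode_any(data, encoding="utf-8"):
    """Decode bytes-like data via bytes(); pass str through unchanged."""
    if isinstance(data, str):
        # bytes("text") would raise TypeError, so handle text first.
        return data
    if isinstance(data, int):
        # bytes(3) would silently produce b"\x00\x00\x00".
        raise TypeError("expected bytes-like data, got int")
    return bytes(data).decode(encoding)


print(decode_any(b"123456"))
print(decode_any(bytearray(b"123456")))
print(decode_any(memoryview(b"123456")))
print(decode_any("123456"))
```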

Answer 5

Answered by Hyman O'Connor

There are two questions here, and the answers to them are different.

The first question, the title of this post, is: What is the proper way to determine if an object is a bytes-like object in Python? This includes a number of built-in types (bytes, bytearray, array.array, memoryview, others?) and possibly also user-defined types. The best way I know of to check for these is to try to create a memoryview out of them:

>>> memoryview(b"foo")
<memory at 0x7f7c43a70888>
>>> memoryview(u"foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: memoryview: a bytes-like object is required, not 'str'

In the body of the original post, though, it sounds like the question is instead: How do I test whether an object supports decode()? @elizabeth-myers' answer above to this question is great. Note that not all bytes-like objects support decode().
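The memoryview check can be wrapped in a predicate, sketched here under the assumption that "bytes-like" means "supports the buffer protocol". The name is_bytes_like is hypothetical. The array.array case also illustrates the last remark: it is bytes-like but has no decode() method.

```python
import array


def is_bytes_like(obj):
    """Return True if a memoryview can be created from obj."""
    try:
        # The with block releases the view immediately, so a bytearray
        # is not left locked against resizing.
        with memoryview(obj):
            return True
    except TypeError:
        return False


numbers = array.array("B", b"abc")

print(is_bytes_like(b"foo"))
print(is_bytes_like(bytearray(b"foo")))
print(is_bytes_like(numbers))
print(is_bytes_like("foo"))
print(hasattr(numbers, "decode"))
```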

Answer 6

Answered by ZeroErr0r

>>> content = b"hello"
>>> text = "hello"
>>> type(content)
<class 'bytes'>
>>> type(text)
<class 'str'>
>>> type(text) is str
True
>>> type(content) is bytes
True
