What is the proper way to determine if an object is a bytes-like object in Python?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/34869889/
Asked by A. Wilcox
I have code that expects str but will handle the case of being passed bytes in the following way:
if isinstance(data, bytes):
    data = data.decode()
Unfortunately, this does not work in the case of bytearray. Is there a more generic way to test whether an object is either bytes or bytearray, or should I just check for both? Is hasattr('decode') as bad as I feel it would be?
Accepted answer by Elizafox
There are a few approaches you could use here.
Duck typing
Since Python is duck typed, you could simply do as follows (which seems to be the way usually suggested):
try:
    data = data.decode()
except (UnicodeDecodeError, AttributeError):
    pass
You could use hasattr as you describe, however, and it'd probably be fine. This is, of course, assuming the .decode() method for the given object returns a string, and has no nasty side effects.
I personally recommend either the exception or hasattr method, but whatever you use is up to you.
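A minimal sketch of the two duck-typing variants above, wrapped in helper functions. The names ensure_text_eafp and ensure_text_hasattr are hypothetical, and the UTF-8 default is an assumption made only for illustration:

```python
def ensure_text_eafp(data, encoding="utf-8"):
    """Try to decode; hand the object back unchanged if that fails."""
    try:
        return data.decode(encoding)
    except (UnicodeDecodeError, AttributeError):
        # Either data has no .decode() (for example, it is already a str),
        # or its bytes are not valid in the assumed encoding.
        return data


def ensure_text_hasattr(data, encoding="utf-8"):
    """Same idea, but test for a .decode attribute before calling it."""
    if hasattr(data, "decode"):
        return data.decode(encoding)
    return data


print(ensure_text_eafp(b"abc"))
print(ensure_text_eafp(bytearray(b"abc")))
print(ensure_text_hasattr("abc"))
```

One difference worth noting: the try/except variant swallows UnicodeDecodeError and returns the undecoded bytes, while the hasattr variant lets that error propagate.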
Use str()
This approach is uncommon, but is possible:
data = str(data, "utf-8")
Other encodings are permissible, just like with the .decode() method of bytes and bytearray. You can also pass a third parameter to specify error handling.
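As an illustration of the str() approach, here is a small sketch showing the encoding and error-handler arguments. The sample byte strings are made up for the example:

```python
raw = bytearray(b"caf\xc3\xa9")  # UTF-8 encoding of "café"

# str(obj, encoding) decodes bytes and bytearray alike.
decoded = str(raw, "utf-8")
print(decoded)

# The third argument selects the error handler, as with .decode().
broken = b"caf\xe9"  # Latin-1 bytes, not valid UTF-8
replaced = str(broken, "utf-8", "replace")
print(replaced)

# Passing an encoding together with a str raises TypeError,
# so this form does not accept text that is already decoded.
try:
    str("already text", "utf-8")
    str_accepted = True
except TypeError as exc:
    str_accepted = False
    print("TypeError:", exc)
```

Because a str argument is rejected, this form still needs a type check or a try/except around it when the input may already be text.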
Single-dispatch generic functions (Python 3.4+)
Python 3.4 and above include a nifty feature called single-dispatch generic functions, via functools.singledispatch. This is a bit more verbose, but it's also more explicit:
from functools import singledispatch

@singledispatch
def func(data):
    # This is the generic implementation
    data = data.decode()
    ...

@func.register(str)
def _(data):
    # data will already be a string
    ...
You could also make special handlers for bytearray and bytes objects if you so chose.
Beware: single-dispatch functions only work on the first argument! This is an intentional feature, see PEP 443.
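To make the first-argument rule concrete, here is a small sketch. The function name describe and its return strings are hypothetical:

```python
from functools import singledispatch


@singledispatch
def describe(first, second):
    # Fallback used when no handler is registered for type(first).
    return "generic"


@describe.register(bytes)
@describe.register(bytearray)
def _(first, second):
    return "bytes-like"


# Dispatch looks only at the type of the first positional argument.
print(describe(b"x", "y"))
print(describe("x", b"y"))
```

The second call falls through to the generic implementation even though its second argument is bytes.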
Answer by zangw
You can use:
isinstance(data, (bytes, bytearray))
Both types have to be listed because they have different base classes:
>>> bytes.__base__
<type 'basestring'>
>>> bytearray.__base__
<type 'object'>
Checking bytes against basestring works:
>>> by = bytes()
>>> isinstance(by, basestring)
True
However, the same check fails for bytearray:
>>> buf = bytearray()
>>> isinstance(buf, basestring)
False
The code above was tested under Python 2.7.
Unfortunately, under Python 3.4, both types have the same base class:
>>> bytes.__base__
<class 'object'>
>>> bytearray.__base__
<class 'object'>
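Since the base classes give no help on Python 3, the tuple form of isinstance shown at the top of this answer is the direct check there. A minimal sketch, with a hypothetical helper name:

```python
def is_bytes_or_bytearray(obj):
    # isinstance accepts a tuple of types and also matches subclasses.
    return isinstance(obj, (bytes, bytearray))


for candidate in (b"abc", bytearray(b"abc"), "abc", memoryview(b"abc")):
    print(type(candidate).__name__, is_bytes_or_bytearray(candidate))
```

Note that other bytes-like objects such as memoryview are deliberately not matched by this check.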
Answer by Kevin
This code is not correct unless you know something we don't:
if isinstance(data, bytes):
    data = data.decode()
You do not (appear to) know the encoding of data. You are assuming it's UTF-8, but that could very well be wrong. Since you do not know the encoding, you do not have text. You have bytes, which could have any meaning under the sun.
The good news is that most random sequences of bytes are not valid UTF-8, so when this breaks, it will break loudly (errors='strict' is the default) instead of silently doing the wrong thing. The even better news is that most of those random sequences that happen to be valid UTF-8 are also valid ASCII, which (nearly) everyone agrees on how to parse anyway.
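A short sketch of the point about loud versus silent failure. The sample character and the choice of Latin-1 as the "other" encoding are assumptions made for illustration:

```python
utf8_bytes = "é".encode("utf-8")      # two bytes: 0xC3 0xA9
latin1_bytes = "é".encode("latin-1")  # one byte: 0xE9

# Latin-1 bytes are not valid UTF-8, so the default errors='strict'
# makes the wrong guess fail loudly.
try:
    latin1_bytes.decode("utf-8")
    failed_loudly = False
except UnicodeDecodeError:
    failed_loudly = True

# The reverse mistake is silent: Latin-1 accepts every byte value,
# so UTF-8 bytes decode without error into the wrong text.
mojibake = utf8_bytes.decode("latin-1")

# Pure ASCII bytes decode to the same text under ASCII and UTF-8.
same_ascii = b"hello".decode("ascii") == b"hello".decode("utf-8")

print(failed_loudly, repr(mojibake), same_ascii)
```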
The bad news is that there is no reasonable way to fix this. There is a standard way of providing encoding information: use str instead of bytes. If some third-party code handed you a bytes or bytearray object without any further context or information, the only correct action is to fail.
Now, assuming you do know the encoding, you can use functools.singledispatch here:
import functools

@functools.singledispatch
def foo(data, *other_arguments):
    raise TypeError('Unknown type: ' + repr(type(data)))

@foo.register(str)
def _(data, *other_arguments):
    # data is a str
    ...

@foo.register(bytes)
@foo.register(bytearray)
def _(data, *other_arguments):
    # 'encoding' is a placeholder for the encoding you know the data uses
    data = data.decode('encoding')
    # explicit is better than implicit; don't leave the encoding out for UTF-8
    return foo(data, *other_arguments)
This doesn't work on methods, and data has to be the first argument. If those restrictions don't work for you, use one of the other answers instead.
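For the method restriction specifically, Python 3.8 and later provide functools.singledispatchmethod, which dispatches on the first argument after self. The sketch below is not part of the original answer; the Parser class, its feed method, and the encoding attribute are hypothetical:

```python
from functools import singledispatchmethod


class Parser:
    def __init__(self, encoding):
        # The caller is assumed to know the encoding of incoming bytes.
        self.encoding = encoding

    @singledispatchmethod
    def feed(self, data):
        raise TypeError("Unknown type: " + repr(type(data)))

    @feed.register(str)
    def _(self, data):
        # data is already text
        return data

    @feed.register(bytes)
    @feed.register(bytearray)
    def _(self, data):
        # Decode explicitly, then re-dispatch to the str handler.
        return self.feed(data.decode(self.encoding))


parser = Parser("utf-8")
print(parser.feed(bytearray(b"abc")))
```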
Answer by pepr
It depends on what you want to solve. If you want the same code to convert both cases to a string, you can simply convert the object to bytes first, and then decode. This way, it is a one-liner:
#!python3
b1 = b'123456'
b2 = bytearray(b'123456')
print(type(b1))
print(type(b2))
s1 = bytes(b1).decode('utf-8')
s2 = bytes(b2).decode('utf-8')
print(s1)
print(s2)
This way, the answer for you may be:
data = bytes(data).decode()
Anyway, I suggest writing 'utf-8' explicitly in the decode call, even though it costs a few extra characters. The reason is that the next time you or someone else reads the source code, the intent will be more apparent.
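A short sketch of how the bytes(...) one-liner behaves for inputs other than bytes and bytearray. The original question also passes str, which bytes() rejects when no encoding is given, so a type check or try/except is still needed around it. The variable names are illustrative only:

```python
# bytes() copies any bytes-like object, so these all decode the same way.
from_bytes = bytes(b"abc").decode("utf-8")
from_bytearray = bytes(bytearray(b"abc")).decode("utf-8")
from_memoryview = bytes(memoryview(b"abc")).decode("utf-8")

# A str is rejected: bytes() needs an encoding to convert text.
try:
    bytes("abc")
    str_rejected = False
except TypeError:
    str_rejected = True

# An int is accepted, but it means "that many zero bytes",
# so an accidental int would decode to NUL characters.
zeros = bytes(3)

print(from_bytes, from_bytearray, from_memoryview, str_rejected, zeros)
```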
Answer by Hyman O'Connor
There are two questions here, and the answers to them are different.
The first question, the title of this post, is: What is the proper way to determine if an object is a bytes-like object in Python? This includes a number of built-in types (bytes, bytearray, array.array, memoryview, others?) and possibly also user-defined types. The best way I know of to check for these is to try to create a memoryview out of them:
>>> memoryview(b"foo")
<memory at 0x7f7c43a70888>
>>> memoryview(u"foo")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: memoryview: a bytes-like object is required, not 'str'
In the body of the original post, though, it sounds like the question is instead: How do I test whether an object supports decode()? @elizabeth-myers' answer above to this question is great. Note that not all bytes-like objects support decode().
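The memoryview test can be wrapped in a small helper. This is a sketch under the assumption that "bytes-like" means "supports the buffer protocol"; the name is_bytes_like is hypothetical. The loop also shows that being bytes-like and having decode() are separate properties:

```python
import array


def is_bytes_like(obj):
    """Return True if a memoryview can be created from obj."""
    try:
        # The with block releases the view as soon as the check is done.
        with memoryview(obj):
            return True
    except TypeError:
        return False


samples = (b"foo", bytearray(b"foo"), array.array("B", [1, 2]), "foo", 42)
for candidate in samples:
    print(
        type(candidate).__name__,
        is_bytes_like(candidate),
        hasattr(candidate, "decode"),
    )
```

array.array passes the bytes-like check but has no decode() method, which is why the two questions need different tests.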
Answer by ZeroErr0r
>>> content = b"hello"
>>> text = "hello"
>>> type(content)
<class 'bytes'>
>>> type(text)
<class 'str'>
>>> type(text) is str
True
>>> type(content) is bytes
True