Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/23598299/

Date: 2020-08-19 03:13:40  Source: igfitidea

How to convert list of bytes (unicode) to Python string?

Tags: python, string, unicode

Asked by Bartosz Wójcik

I have a list of bytes (8-bit bytes; in C/C++ they would make up a wchar_t string) that together form a Unicode string, byte by byte. How can I convert those values into a Python string? I tried a few things, but none of them could join the 2 bytes into 1 character and build the entire string from them. Thank you.

Accepted answer by Lev Levitsky

Converting a sequence of bytes to a Unicode string is done by calling the decode() method on the str (in Python 2.x) or bytes (in Python 3.x) object.

If you actually have a list of bytes, then, to get this object, you can use ''.join(bytelist) (Python 2) or b''.join(bytelist) (Python 3).
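
As a minimal sketch for Python 3 (the names byte_chunks and byte_values are illustrative, not from the question), the bytes object can be built either from a list of one-byte bytes objects or from a list of integers:

```python
# Python 3 sketch; byte_chunks and byte_values are illustrative names.

# Case 1: a list of one-byte bytes objects -> join them.
byte_chunks = [b'\xd1', b'\x82', b'\xd0', b'\xb5']
joined = b''.join(byte_chunks)

# Case 2: a list of integers in range(256) -> pass the list to bytes().
byte_values = [0xd1, 0x82, 0xd0, 0xb5]
packed = bytes(byte_values)

# Both approaches produce the same bytes object, ready for decode().
print(joined == packed)
```

Which case applies depends on how the list was produced; the question does not say whether its elements are integers or one-byte strings.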

You need to specify the encoding that was used to encode the original Unicode string.
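
The question mentions wchar_t and two bytes per character, which suggests (but does not confirm) a UTF-16 encoding. The sketch below assumes little-endian UTF-16 and uses made-up sample values, to show that the codec passed to decode() has to match the one used to encode:

```python
# Python 3 sketch. Assumption: the data is UTF-16 little-endian
# (two bytes per character); the sample values are made up.
byte_values = [0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04]

raw = bytes(byte_values)
text = raw.decode('utf-16-le')
print(text)

# Decoding the same bytes with a different codec does not raise here,
# but it yields different (wrong) text.
wrong = raw.decode('latin-1')
print(text == wrong)
```

If the data came from a big-endian source, 'utf-16-be' would be the codec to try instead.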

However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type is a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.

Demo for Python 2:

In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'

In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']

In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'

In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест

In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
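
For comparison, a Python 3 counterpart of the session above might look like this (a sketch, not part of the original answer; it assumes the list holds one-byte bytes objects):

```python
# Python 3 counterpart of the Python 2 session (illustrative sketch).
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5', b'\xd1', b'\x81', b'\xd1', b'\x82']

# Join the pieces into one bytes object, then decode it to a str.
text = b''.join(bytelist).decode('utf-8')
print(text)

# Encoding the str again gives back the original bytes.
round_trip_ok = text.encode('utf-8') == b''.join(bytelist)
print(round_trip_ok)
```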

Answer by Umer

You can also convert the list of bytes into a list of strings by calling decode() on each element:

stringlist = [x.decode('utf-8') for x in bytelist]
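
Per-element decoding only works when every element is a complete encoded sequence. The Python 3 sketch below (with illustrative lists named words and single_bytes) contrasts that case with a list whose elements split a character across two entries, as in the accepted answer's demo:

```python
# Python 3 sketch; both lists are illustrative.

# Each element is a complete UTF-8 sequence, so per-element decoding works.
words = [b'\xd1\x82\xd0\xb5', b'\xd1\x81\xd1\x82']
stringlist = [x.decode('utf-8') for x in words]

# Here one character is split across two elements, so decoding one
# element at a time raises UnicodeDecodeError.
single_bytes = [b'\xd1', b'\x82']
try:
    [x.decode('utf-8') for x in single_bytes]
    per_element_ok = True
except UnicodeDecodeError:
    per_element_ok = False

# Joining first and decoding once handles the split character.
joined_text = b''.join(single_bytes).decode('utf-8')
print(stringlist, per_element_ok, joined_text)
```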