How to convert a list of bytes (unicode) to a Python string?
Original source: http://stackoverflow.com/questions/23598299/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Bartosz Wójcik
I have a list of bytes (8-bit bytes; in C/C++ they would form a wchar_t string). Taken byte by byte, they make up a Unicode string. How can I convert these values into a Python string? I tried a few things, but none of them could join each pair of bytes into one character and build the entire string from that. Thank you.
Accepted answer by Lev Levitsky
Converting a sequence of bytes to a Unicode string is done by calling the decode() method on that str (in Python 2.x) or bytes (Python 3.x) object.
If you actually have a list of bytes, then, to get this object, you can use ''.join(bytelist) or b''.join(bytelist).
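As a minimal Python 3 sketch of getting that object (the names byte_values and byte_chunks are illustrative, not from the original answer): a list of integers in the range 0 to 255 can be passed straight to bytes(), while a list of one-byte bytes objects is joined with b''.join().

```python
# Hypothetical input 1: the bytes arrive as integers in range(256).
byte_values = [0xD1, 0x82, 0xD0, 0xB5]
from_ints = bytes(byte_values)

# Hypothetical input 2: the bytes arrive as one-byte bytes objects.
byte_chunks = [b'\xd1', b'\x82', b'\xd0', b'\xb5']
from_chunks = b''.join(byte_chunks)

# Both routes yield the same bytes object, ready for .decode().
assert from_ints == from_chunks == b'\xd1\x82\xd0\xb5'
```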
You need to specify the encoding that was used to encode the original Unicode string.
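To illustrate why the encoding matters, here is a hedged Python 3 sketch. It assumes the question's two-bytes-per-character data is little-endian UTF-16, which is typical of wchar_t on Windows but is only an assumption here, since the question does not say. The list raw is made-up sample data.

```python
# Assumed sample data: four characters, two bytes each, UTF-16 little-endian.
raw = [0x42, 0x04, 0x35, 0x04, 0x41, 0x04, 0x42, 0x04]
data = bytes(raw)

# The matching codec pairs the bytes up into four characters.
text = data.decode('utf-16-le')
assert text == '\u0442\u0435\u0441\u0442'  # the word 'тест'

# The wrong codec does not fail on this input; it silently
# yields eight unrelated code points instead of four letters.
wrong = data.decode('utf-8')
assert wrong == 'B\x045\x04A\x04B\x04'
```

If the data begins with a byte order mark, the 'utf-16' codec (without the -le or -be suffix) reads the byte order from it instead.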
However, the term "Python string" is a bit ambiguous and also version-dependent. The Python str type stands for a byte string in Python 2.x and a Unicode string in Python 3.x. So, in Python 2, just doing ''.join(bytelist) will give you a str object.
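The version difference can be seen in a short Python 3 sketch (the two-element bytelist is sample data chosen for illustration): joining bytes items with a str separator is rejected, and the result of b''.join() is a bytes object that only becomes a str after decoding.

```python
bytelist = [b'\xd1', b'\x82']  # sample data: one UTF-8 character in two bytes

# In Python 3, str.join() refuses bytes items and raises TypeError.
try:
    ''.join(bytelist)
    str_join_worked = True
except TypeError:
    str_join_worked = False

joined = b''.join(bytelist)        # a bytes object
decoded = joined.decode('utf-8')   # a str (Unicode text) object
```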
Demo for Python 2:
In [1]: 'тест'
Out[1]: '\xd1\x82\xd0\xb5\xd1\x81\xd1\x82'
In [2]: bytelist = ['\xd1', '\x82', '\xd0', '\xb5', '\xd1', '\x81', '\xd1', '\x82']
In [3]: ''.join(bytelist).decode('utf-8')
Out[3]: u'\u0442\u0435\u0441\u0442'
In [4]: print ''.join(bytelist).decode('utf-8') # encodes to the terminal encoding
тест
In [5]: ''.join(bytelist) == 'тест'
Out[5]: True
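For comparison, a rough Python 3 port of the same demo might look like the sketch below. It is not part of the original answer, and it assumes the list holds one-byte bytes objects rather than Python 2 str items.

```python
bytelist = [b'\xd1', b'\x82', b'\xd0', b'\xb5',
            b'\xd1', b'\x81', b'\xd1', b'\x82']

# Join first, then decode the whole byte string in one step.
text = b''.join(bytelist).decode('utf-8')
assert text == '\u0442\u0435\u0441\u0442'   # the word 'тест'

# Unlike Python 2, text and bytes are distinct types in Python 3,
# so encode the text (or decode the bytes) before comparing them.
assert b''.join(bytelist) == text.encode('utf-8')
```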
Answered by Umer
You can also convert the byte list into a list of strings by calling decode() on each element:
stringlist = [x.decode('utf-8') for x in bytelist]
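One caveat worth sketching in Python 3 (with made-up sample lists): decoding element by element only works when every element is a complete encoded character by itself, as with ASCII. When a multi-byte character is split across elements, as in the question, each piece fails to decode alone and the list has to be joined first.

```python
# Each element is a complete character, so per-element decoding works.
ascii_list = [b'a', b'b', b'c']
strings = [x.decode('utf-8') for x in ascii_list]

# One two-byte UTF-8 character split across two list elements.
split_list = [b'\xd1', b'\x82']
try:
    [x.decode('utf-8') for x in split_list]
    per_element_ok = True
except UnicodeDecodeError:
    per_element_ok = False   # a lone lead byte is not valid UTF-8

# Joining before decoding recovers the character.
joined_text = b''.join(split_list).decode('utf-8')
```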