Python converting latin1 to UTF8
Note: this page reproduces a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors: StackOverflow
Original URL: http://stackoverflow.com/questions/14443760/
Asked by Eugene
In Python 2.7, how do you convert a latin1 string to UTF-8?
For example, I'm trying to convert é to utf-8.
>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
??
The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9).
Its UTF-8 byte encoding is: c3a9
Its Latin-1 byte encoding is: e9
How do I get the UTF-8 encoded version of a latin string? Could someone give an example of how to convert the é?
Accepted answer by Martijn Pieters
To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:
>>> '\xe9'.decode('latin1')
u'\xe9'
Python uses \xab escapes (two hex digits) for Unicode codepoints up to \u00ff, so the result above is the same string as u'\u00e9':
>>> '\xe9'.decode('latin1') == u'\u00e9'
True
The above Latin-1 character can be encoded to UTF-8 as:
>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
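The decode-then-encode chain above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name latin1_to_utf8 is hypothetical, and the byte literal is written with a b prefix so the snippet should run on both Python 2.7 and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: interpret the raw bytes as Latin-1 to get a Unicode string.
    text = data.decode('latin1')
    # Step 2: serialize that Unicode string as UTF-8 bytes.
    return text.encode('utf8')


converted = latin1_to_utf8(b'\xe9')
print(repr(converted))
```

Pure ASCII input passes through unchanged, because ASCII bytes have the same values in Latin-1 and UTF-8.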
Answer by John Kugelman
>>> u"é".encode('utf-8')
'\xc3\xa9'
You've got a UTF-8 encoded byte sequence. Don't try to print encoded bytes directly. To print them you need to decode the encoded bytes back into a Unicode string.
>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é
Notice that encoding and decoding are opposite operations that effectively cancel out. You end up with the original u"é" string, although Python displays its repr as the equivalent u'\xe9'.
>>> u"é" == u'\xe9'
True
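One likely reason the printed bytes look wrong is that the terminal interprets the two UTF-8 bytes with a single-byte encoding. The sketch below reproduces that effect explicitly, assuming a Latin-1 terminal; that is an assumption, since the question does not say which encoding the terminal used. The variable names are illustrative only.

```python
original = u'\xe9'                      # é as a Unicode string
utf8_bytes = original.encode('utf-8')   # two bytes: 0xC3 0xA9

# Decoding with the wrong codec (assumed Latin-1 here) turns each byte
# into its own character, which is what garbled output looks like.
garbled = utf8_bytes.decode('latin1')

# Decoding with the matching codec restores the original string.
restored = utf8_bytes.decode('utf-8')

print(restored == original)
```

Here garbled holds two characters (U+00C3 and U+00A9) instead of one, while restored is the single character the question started with.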
Answer by Shashank Agarwal
concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())
I do this. I am not sure if it is a good approach, but it works every time!
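A caveat worth checking before reusing this approach: when concept is a Unicode string, encode('ascii', 'ignore') silently drops every non-ASCII character, so the later latin1-to-utf8 step has nothing left to convert. A minimal sketch with a made-up sample value (the MySQLdb call is left out so the snippet stays self-contained):

```python
concept = u'caf\xe9 '   # hypothetical sample value: "café" plus a trailing space

# 'ignore' discards characters that ASCII cannot represent, here the é.
ascii_only = concept.encode('ascii', 'ignore')

# The remaining bytes are pure ASCII, so this round trip changes nothing
# apart from rstrip() removing the trailing space.
result = ascii_only.decode('latin1').encode('utf8').rstrip()

print(repr(result))
```

If the accented characters need to survive, the decode('latin1').encode('utf8') chain from the accepted answer, applied to the original Latin-1 bytes without the ASCII step, keeps them.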

