Python converting latin1 to UTF8

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute the content to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/14443760/

Time: 2020-08-18 11:31:15  Source: igfitidea

Tags: python, encoding, utf-8, python-2.7, latin1

Asked by Eugene

In Python 2.7, how do you convert a latin1 string to UTF-8?

For example, I'm trying to convert é to utf-8.

>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
??

The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9). The UTF-8 byte encoding for it is: c3a9
The latin1 byte encoding is: e9
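The character name and both byte encodings quoted above can be checked from Python itself. This is only a small sketch using the standard unicodedata and binascii modules; the variable names are illustrative, and the literals are written so that it should run on Python 2.7 as well as Python 3.

```python
import binascii
import unicodedata

# é written as an escape, so the source file encoding does not matter
char = u'\u00e9'

# Official Unicode name of the character
name = unicodedata.name(char)

# Hex dump of the same character in each encoding
utf8_hex = binascii.hexlify(char.encode('utf-8'))
latin1_hex = binascii.hexlify(char.encode('latin1'))

print(name)
print(utf8_hex)
print(latin1_hex)
```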

How do I get the UTF-8 encoded version of a latin string? Could someone give an example of how to convert the é?

Accepted answer by Martijn Pieters

To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:

>>> '\xe9'.decode('latin1')
u'\xe9'

Python uses \xab escapes for Unicode code points up to \u00ff.

>>> '\xe9'.decode('latin1') == u'\u00e9'
True
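The comparison above holds because Latin-1 maps every byte value to the Unicode code point with the same number. The following is a minimal sketch that checks this for all 256 byte values; the variable names are illustrative, and the byte string is built through bytearray so that the snippet should behave the same on Python 2.7 and Python 3.

```python
# Build all 256 possible byte values (works on Python 2.7 and 3)
all_bytes = bytes(bytearray(range(256)))

# Decode them as Latin-1 and collect the resulting code points
decoded = all_bytes.decode('latin1')
code_points = [ord(ch) for ch in decoded]

# Every byte value n decodes to the code point with the same number
assert code_points == list(range(256))
assert b'\xe9'.decode('latin1') == u'\u00e9'
```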

The above Latin-1 character can be encoded to UTF-8 as:

>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
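The decode-then-encode chain above can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name latin1_to_utf8 is hypothetical, and the b'' prefix is used so the same code should run on Python 2.7 (where it is optional) and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode Latin-1 bytes as UTF-8 bytes (hypothetical helper)."""
    # Step 1: bytes -> Unicode text, interpreting the bytes as Latin-1
    text = data.decode('latin1')
    # Step 2: Unicode text -> bytes, this time encoded as UTF-8
    return text.encode('utf-8')


converted = latin1_to_utf8(b'\xe9')
word = latin1_to_utf8(b'caf\xe9')
```

Pure ASCII input passes through unchanged, since ASCII bytes have the same representation in both encodings.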

Answer by John Kugelman

>>> u"é".encode('utf-8')
'\xc3\xa9'

You've got a UTF-8 encoded byte sequence. Don't try to print encoded bytes directly. To print them you need to decode the encoded bytes back into a Unicode string.

>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é

Notice that encoding and decoding are opposite operations which effectively cancel out. You end up with the original u"é" string back, although Python displays it as the equivalent u'\xe9'.

>>> u"é" == u'\xe9'
True
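The round trip described above can also be written as a few assertions. The sketch below uses illustrative variable names and additionally shows, as an aside not taken from the answer, what happens when the UTF-8 bytes are decoded with a codec other than the one used to encode them.

```python
text = u'\u00e9'                      # é as a Unicode string
encoded = text.encode('utf-8')        # two bytes: 0xC3 0xA9
round_trip = encoded.decode('utf-8')  # back to the original string

# Encoding and decoding with the same codec cancel out
assert round_trip == text
assert len(text) == 1 and len(encoded) == 2

# Decoding the UTF-8 bytes as Latin-1 does not raise an error;
# it silently produces two characters instead of one
mojibake = encoded.decode('latin1')
assert mojibake == u'\u00c3\u00a9'
```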

Answer by Shashank Agarwal

concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())

I do this. I am not sure if that is a good approach, but it works every time!
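One thing worth checking about this approach is what the first step does to accented characters. The sketch below assumes that concept starts out as a Unicode string (the sample value is hypothetical) and leaves out the MySQLdb.escape_string call so that it runs without a database driver.

```python
# Hypothetical input: "café" followed by a trailing space
concept = u'caf\u00e9 '

# The 'ignore' error handler drops characters that ASCII cannot represent
ascii_only = concept.encode('ascii', 'ignore')

# The later decode/encode steps cannot bring the dropped character back
lossy = ascii_only.decode('latin1').encode('utf8').rstrip()

# Encoding the Unicode string directly keeps the accented character
lossless = concept.rstrip().encode('utf8')
```

Under that assumption the snippet never raises an error, which may be why it appears to work every time, but the é is no longer in the result.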