Python: converting latin1 to UTF-8

Question

Asked by Eugene

In Python 2.7, how do you convert a latin1 string to UTF-8?

For example, I'm trying to convert é to utf-8.

>>> "é"
'\xe9'
>>> u"é"
u'\xe9'
>>> u"é".encode('utf-8')
'\xc3\xa9'
>>> print u"é".encode('utf-8')
??

The letter is é, which is LATIN SMALL LETTER E WITH ACUTE (U+00E9). Its UTF-8 byte encoding is c3a9.
Its latin1 byte encoding is e9.

How do I get the UTF-8 encoded version of a latin1 string? Could someone give an example of how to convert the é?

Answer 1

Accepted answer by Martijn Pieters

To decode a byte sequence from Latin-1 to Unicode, use the .decode() method:

>>> '\xe9'.decode('latin1')
u'\xe9'

Python uses \xab escapes for Unicode code points up to \u00ff, so the repr shows u'\xe9' rather than u'\u00e9'. The two spellings are the same string:

>>> '\xe9'.decode('latin1') == u'\u00e9'
True

The above Latin-1 character can be encoded to UTF-8 as:

>>> '\xe9'.decode('latin1').encode('utf8')
'\xc3\xa9'
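
The decode-then-encode steps above can be wrapped in a small helper. This is a minimal sketch and not part of the original answer: the name `latin1_to_utf8` is hypothetical, and `b'...'` literals are used on the assumption that the code should run unchanged on both Python 2.7 and Python 3.

```python
def latin1_to_utf8(data):
    """Re-encode a Latin-1 byte string as UTF-8 bytes (hypothetical helper)."""
    # Step 1: interpret the raw bytes as Latin-1 to get a Unicode string.
    text = data.decode('latin1')
    # Step 2: serialize that Unicode string as UTF-8 bytes.
    return text.encode('utf-8')


converted = latin1_to_utf8(b'\xe9')
print(converted == b'\xc3\xa9')  # True
```

Pure ASCII input passes through unchanged, because ASCII bytes have the same values in Latin-1 and UTF-8.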

Answer 2

Answer by John Kugelman

>>> u"é".encode('utf-8')
'\xc3\xa9'

You've got a UTF-8 encoded byte sequence. Don't try to print encoded bytes directly. To print them you need to decode the encoded bytes back into a Unicode string.

>>> u"é".encode('utf-8').decode('utf-8')
u'\xe9'
>>> print u"é".encode('utf-8').decode('utf-8')
é

Notice that encoding and decoding are opposite operations that effectively cancel out. You end up with the original u"é" string back, although Python displays its repr as the equivalent u'\xe9'.

>>> u"é" == u'\xe9'
True
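
As an illustration of why the codec matters (a sketch added here, not part of the original answer), the snippet below decodes the same UTF-8 bytes with the matching codec and with Latin-1. The variable names are hypothetical. A mismatch like this is a common reason printed bytes look garbled, though what a given terminal actually shows depends on its own encoding.

```python
original = u'\xe9'                  # é as a Unicode string
encoded = original.encode('utf-8')  # two bytes: 0xC3 0xA9

# Decoding with the matching codec restores the original string.
round_trip = encoded.decode('utf-8')

# Decoding with the wrong codec (here Latin-1) treats each byte as a
# separate character, so one letter turns into two.
misread = encoded.decode('latin1')

print(round_trip == original)  # True
print(len(misread))            # 2
```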

Answer 3

Answer by Shashank Agarwal

concept = concept.encode('ascii', 'ignore')
concept = MySQLdb.escape_string(concept.decode('latin1').encode('utf8').rstrip())

This is what I do. I am not sure it is a good approach, but it works every time!
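
One caveat may be worth checking before adopting this: the 'ignore' error handler drops every character that ASCII cannot represent, so accented letters may be lost before the UTF-8 step runs. The following is a minimal sketch, assuming `concept` starts out as a Unicode string; the sample value is hypothetical, and the MySQLdb call is left out so the example runs without a database driver.

```python
concept = u'caf\xe9'  # hypothetical input: "café" as a Unicode string

# 'ignore' silently drops every character that ASCII cannot represent.
ascii_only = concept.encode('ascii', 'ignore')

# Encoding straight to UTF-8 keeps the character instead.
utf8_bytes = concept.encode('utf-8')

print(ascii_only == b'caf')          # True
print(utf8_bytes == b'caf\xc3\xa9')  # True
```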
