Python UnicodeEncodeError: 'latin-1' codec can't encode character

Question

Asked by ensnare

What could be causing this error when I try to insert a foreign character into the database?

>>UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

And how do I resolve it?

Thanks!

Answer 1

Accepted answer by bobince

Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.

It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but which puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:

>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
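
The session above is Python 2. As a rough Python 3 equivalent (a minimal sketch; the variable names are illustrative and not from the original answer), the same difference between the two encodings can be seen like this:

```python
# Python 3: str is already Unicode, so no u'' prefix is needed.
text = 'He said \u201cHello\u201d'

# ISO-8859-1 has no slot for U+201C / U+201D, so encoding fails.
try:
    text.encode('iso-8859-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    print('latin-1 failed:', exc.reason)

# cp1252 places the curly quotes at 0x93 and 0x94.
cp1252_bytes = text.encode('cp1252')
print(cp1252_bytes)
```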

If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. But Unicode characters that are not present in cp1252 will still cause errors.

You can use encode(..., 'ignore') to suppress the errors by dropping the offending characters, but really, in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally you should also tell MySQL you are using UTF-8 strings (by setting the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
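
To make the trade-off concrete, here is a small Python 3 sketch (the sample string and variable names are assumptions chosen for illustration) comparing the 'ignore' and 'replace' error handlers with a plain UTF-8 round trip:

```python
# U+2603 SNOWMAN is in neither Latin-1 nor cp1252.
text = 'He said \u201cHello\u201d \u2603'

# 'ignore' silently drops what cp1252 cannot represent.
dropped = text.encode('cp1252', 'ignore')

# 'replace' substitutes a question mark instead.
replaced = text.encode('cp1252', 'replace')

# UTF-8 can represent every character, so nothing is lost.
utf8 = text.encode('utf-8')
round_trip = utf8.decode('utf-8')

print(dropped)
print(replaced)
print(round_trip == text)
```

Both error handlers lose data that cannot be recovered later, which is why storing UTF-8 is the safer default.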

Answer 2

Answered by jabley

You are trying to store the Unicode codepoint \u201c using an encoding (ISO-8859-1 / Latin-1) that can't describe that codepoint. Either alter the database to use utf-8 and store the string data using an appropriate encoding, or sanitise your inputs before storing the content, for example by following something like Sam Ruby's excellent i18n guide. It talks about the issues that windows-1252 can cause and suggests how to process it, with links to sample code!

Answer 3

Answered by msw

Latin-1 (aka ISO 8859-1) is a single-octet character encoding scheme, and you can't fit \u201c (“) into a byte.

Did you mean to use UTF-8 encoding?
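
The single-octet limit can be checked directly. The following is a minimal Python 3 sketch (variable names are illustrative): Latin-1 maps code points 0 to 255 straight onto byte values, while U+201C has a code point far above 255 and takes three bytes in UTF-8.

```python
quote = '\u201c'      # LEFT DOUBLE QUOTATION MARK
e_acute = '\u00e9'    # LATIN SMALL LETTER E WITH ACUTE

# Code points up to 255 map one-to-one onto Latin-1 bytes.
e_acute_latin1 = e_acute.encode('latin-1')

# U+201C is code point 8220, which cannot fit into one byte.
quote_code_point = ord(quote)

# UTF-8 is a variable-length encoding and uses several bytes here.
quote_utf8 = quote.encode('utf-8')

print(e_acute_latin1, quote_code_point, quote_utf8)
```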

Answer 4

Answered by knitti

I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.

Answer 5

Answered by Nick

I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:

Using UTF8 with Python MySQLdb

Edit: Quote from the above URL to satisfy the request in the first comment...

"UnicodeEncodeError:'latin-1' codec can't encode character ..."
This is because MySQLdb normally tries to encode everything to latin-1. This can be fixed by executing the following commands right after you've established the connection:

db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')

"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().

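
The quoted commands could be wrapped in a small helper so they always run right after connecting. This is only a sketch: the function name use_utf8 is hypothetical, and it assumes db is an open MySQLdb connection exposing set_character_set() and cursor() as in the quoted snippet.

```python
def use_utf8(db):
    """Switch an open MySQLdb connection to UTF-8 and return a cursor.

    `db` is assumed to be the result of MySQLdb.connect(); the statements
    are the ones quoted above, unchanged.
    """
    db.set_character_set('utf8')
    dbc = db.cursor()
    for statement in (
        'SET NAMES utf8;',
        'SET CHARACTER SET utf8;',
        'SET character_set_connection=utf8;',
    ):
        dbc.execute(statement)
    return dbc


# Typical use (requires a running MySQL server, so it is not executed here):
#   db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb")
#   dbc = use_utf8(db)
```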

Answer 6

Answered by nids

Python: You will need to add # -*- coding: UTF-8 -*- as the first line of the Python file, and then call .encode('ascii', 'xmlcharrefreplace') on the text to encode. This replaces every non-ASCII character with an XML numeric character reference (for example, &#8220; for U+201C), so the result is pure ASCII.
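
As a quick illustration of what that error handler produces, here is a minimal Python 3 sketch (the sample text is an assumption). Note that the stored value is markup rather than the original characters, so a reader of the data has to unescape it again:

```python
import html

text = '\u201cHello\u201d'

# Non-ASCII characters become XML numeric character references.
encoded = text.encode('ascii', 'xmlcharrefreplace')
print(encoded)

# The references can be turned back into characters when reading.
decoded = html.unescape(encoded.decode('ascii'))
print(decoded == text)
```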

Answer 7

Answered by Cheney

The best solution is:

Set MySQL's character set to UTF-8 (MySQL names it utf8).
Do as this comment suggests (add use_unicode=True and charset="utf8"):
db = MySQLdb.connect(host="localhost", user = "root", passwd = "", db = "testdb", use_unicode=True, charset="utf8") – KyungHoon Kim Mar 13 '14 at 17:04

For details, see the docstring of MySQLdb's Connection class:

class Connection(_mysql.connection):

    """MySQL Database Connection Object"""

    default_cursor = cursors.Cursor

    def __init__(self, *args, **kwargs):
        """

        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.

        host
          string, host to connect

        user
          string, user to connect as

        passwd
          string, password to use

        db
          string, database to use

        port
          integer, TCP/IP port to connect to

        unix_socket
          string, location of unix_socket to use

        conv
          conversion dictionary, see MySQLdb.converters

        connect_timeout
          number of seconds to wait before the connection attempt
          fails.

        compress
          if set, compression is enabled

        named_pipe
          if set, a named pipe is used to connect (Windows only)

        init_command
          command which is run once the connection is created

        read_default_file
          file from which default client values are read

        read_default_group
          configuration group to use from the default file

        cursorclass
          class object, used to create cursors (keyword only)

        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set.  Otherwise, text-like
          columns are returned as strings.  columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.

        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.

        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.

        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)

        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()).  If this is set, and the client does not
          support SSL, NotSupportedError will be raised.

        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables

        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.

        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.

        """

Answer 8

Answered by mgojohn

SQLAlchemy users can simply pass convert_unicode=True when declaring a string column.

Example: sqlalchemy.String(1000, convert_unicode=True)

SQLAlchemy will then accept unicode objects and return them, handling the encoding itself.

Docs

Answer 9

Answered by Uday Allu

Use the snippet below to strip accents from Latin-script text, reducing accented letters to their plain ASCII base letters:

import unicodedata
def strip_accents(text):
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents(u'áéíñóúü')

Output:

'aeinouu'
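
Note that stripping accents does not help with the character from the question: U+201C is punctuation, not an accented letter, so NFKD normalization leaves it untouched. One possible workaround, sketched here for Python 3 under the assumption that replacing typographic quotes with ASCII quotes is acceptable (the names PUNCTUATION and to_latin1_safe are hypothetical), is an explicit translation table:

```python
import unicodedata


def strip_accents(text):
    # Same approach as the snippet above: drop combining marks after NFKD.
    return "".join(ch for ch in unicodedata.normalize('NFKD', text)
                   if unicodedata.category(ch) != 'Mn')


# Hypothetical mapping from typographic quotes to ASCII quotes.
PUNCTUATION = str.maketrans({
    '\u2018': "'", '\u2019': "'",
    '\u201c': '"', '\u201d': '"',
})


def to_latin1_safe(text):
    """Replace curly quotes so the text can be encoded as Latin-1."""
    return text.translate(PUNCTUATION)


quoted = '\u201cHello\u201d'
unchanged = strip_accents(quoted)            # the curly quotes survive
safe_bytes = to_latin1_safe(quoted).encode('latin-1')
print(unchanged == quoted, safe_bytes)
```

This changes the stored text, so switching the database to UTF-8 remains the lossless option.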
