Python UnicodeEncodeError: 'latin-1' codec can't encode character
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow.
Original URL: http://stackoverflow.com/questions/3942888/
UnicodeEncodeError: 'latin-1' codec can't encode character
Asked by ensnare
What could be causing this error when I try to insert a foreign character into the database?
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
And how do I resolve it?
Thanks!
Accepted answer by bobince
Character U+201C Left Double Quotation Mark is not present in the Latin-1 (ISO-8859-1) encoding.
It is present in code page 1252 (Western European). This is a Windows-specific encoding that is based on ISO-8859-1 but puts extra characters into the range 0x80-0x9F. Code page 1252 is often confused with ISO-8859-1, and it's an annoying but now-standard web browser behaviour that if you serve your pages as ISO-8859-1, the browser will treat them as cp1252 instead. However, they really are two distinct encodings:
>>> u'He said \u201CHello\u201D'.encode('iso-8859-1')
UnicodeEncodeError
>>> u'He said \u201CHello\u201D'.encode('cp1252')
'He said \x93Hello\x94'
If you are using your database only as a byte store, you can use cp1252 to encode “ and the other characters present in the Windows Western code page. But Unicode characters that are not present in cp1252 will still cause errors.
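In Python 3, where str is already Unicode and encode() returns bytes, the same comparison might be sketched as follows. The helper name try_encode and the sample strings are illustrative and not part of the original answer.

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


quoted = 'He said \u201cHello\u201d'

# Curly quotes do not exist in Latin-1, so this attempt fails.
latin1_result = try_encode(quoted, 'iso-8859-1')
# cp1252 maps the curly quotes to the bytes 0x93 and 0x94.
cp1252_result = try_encode(quoted, 'cp1252')
# CJK characters are in neither encoding, so cp1252 fails here too.
cjk_result = try_encode('\u4e2d\u6587', 'cp1252')

print(latin1_result)
print(cp1252_result)
print(cjk_result)
```

The last case illustrates the caveat above: switching to cp1252 only moves the problem to the next character outside that code page.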
You can use encode(..., 'ignore') to suppress the errors by getting rid of the characters, but really in this century you should be using UTF-8 in both your database and your pages. This encoding allows any character to be used. Ideally, you should also tell MySQL that you are using UTF-8 strings (by setting the database connection and the collation on string columns), so it can get case-insensitive comparison and sorting right.
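To see what suppressing the error costs, here is a small Python 3 sketch of two standard error handlers, 'ignore' and 'replace'. The sample string is an assumption chosen for illustration.

```python
text = 'He said \u201cHello\u201d'

# 'ignore' silently drops every character the codec cannot represent.
dropped = text.encode('latin-1', 'ignore')
# 'replace' substitutes a question mark for each such character.
replaced = text.encode('latin-1', 'replace')

print(dropped)
print(replaced)

# Neither result can be decoded back to the original text.
print(dropped.decode('latin-1') == text)
```

Both handlers lose information, which is why the answer treats them as a last resort compared with storing UTF-8.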
Answer by jabley
You are trying to store the Unicode code point \u201c using an encoding, ISO-8859-1 / Latin-1, that can't represent that code point. Either alter the database to use UTF-8 and store the string data using an appropriate encoding, or sanitise your inputs before storing the content, for example with the help of Sam Ruby's excellent i18n guide. It covers the issues that windows-1252 can cause and suggests how to handle them, plus it links to sample code!
Answer by msw
Latin-1 (aka ISO 8859-1) is a single-octet character encoding scheme, and you can't fit \u201c (“) into a byte.
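The arithmetic behind that statement can be checked directly in Python 3. The helper name fits_latin1 is hypothetical; it relies on the fact that Latin-1 maps code points 0 to 255 one-to-one onto byte values.

```python
def fits_latin1(ch):
    """True if the character's code point fits in a single Latin-1 byte."""
    return ord(ch) <= 0xFF


left_quote = '\u201c'

print(hex(ord(left_quote)))        # code point of the left double quotation mark
print(fits_latin1(left_quote))     # far above 0xFF, so it does not fit
print(fits_latin1('\u00e9'))       # e with acute accent is within Latin-1
print(left_quote.encode('utf-8'))  # UTF-8 spends three bytes on it instead
```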
Did you mean to use UTF-8 encoding?
Answer by knitti
I hope your database is at least UTF-8. Then you will need to run yourstring.encode('utf-8') before you try putting it into the database.
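A minimal Python 3 sketch of that round trip, assuming the database column is treated as a plain byte store. The helper names to_db_bytes and from_db_bytes are hypothetical, and many drivers will do this conversion themselves once the connection character set is UTF-8.

```python
def to_db_bytes(text):
    """Encode text to UTF-8 bytes before handing it to a byte-oriented store."""
    return text.encode('utf-8')


def from_db_bytes(raw):
    """Decode UTF-8 bytes read back from the store into text."""
    return raw.decode('utf-8')


original = '\u201cHello\u201d, na\u00efve caf\u00e9'
stored = to_db_bytes(original)
restored = from_db_bytes(stored)

print(stored)
print(restored == original)
```

Unlike the Latin-1 or cp1252 routes, this round trip is lossless for any Unicode text.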
Answer by Nick
I ran into this same issue when using the Python MySQLdb module. Since MySQL will let you store just about any binary data you want in a text field regardless of character set, I found my solution here:
Using UTF8 with Python MySQLdb
Edit: Quote from the above URL to satisfy the request in the first comment...
"UnicodeEncodeError:'latin-1' codec can't encode character ..."
This is because MySQLdb normally tries to encode everything to latin-1. This can be fixed by executing the following commands right after you've established the connection:
db.set_character_set('utf8')
dbc.execute('SET NAMES utf8;')
dbc.execute('SET CHARACTER SET utf8;')
dbc.execute('SET character_set_connection=utf8;')
"db" is the result of MySQLdb.connect(), and "dbc" is the result of db.cursor().
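The quoted commands could be wrapped in one helper, sketched below. The function name configure_utf8 and the constant UTF8_SETUP_STATEMENTS are hypothetical; the helper only assumes that the object passed in behaves like a MySQLdb connection, with set_character_set() and cursor() methods.

```python
# The three statements quoted above, run in the same order.
UTF8_SETUP_STATEMENTS = (
    'SET NAMES utf8;',
    'SET CHARACTER SET utf8;',
    'SET character_set_connection=utf8;',
)


def configure_utf8(db):
    """Switch an open MySQLdb-style connection to UTF-8 and return a cursor."""
    db.set_character_set('utf8')
    dbc = db.cursor()
    for statement in UTF8_SETUP_STATEMENTS:
        dbc.execute(statement)
    return dbc


# Usage, assuming MySQLdb is installed; the credentials are placeholders:
# import MySQLdb
# db = MySQLdb.connect(host='localhost', user='root', passwd='', db='testdb')
# dbc = configure_utf8(db)
```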
Answer by nids
Python: add # -*- coding: UTF-8 -*- as the first line of the Python file, then call .encode('ascii', 'xmlcharrefreplace') on the text you want to encode. This replaces every non-ASCII character with its XML numeric character reference, which is plain ASCII.
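As a short Python 3 illustration of what that error handler produces (the sample string is an assumption), note that the result is only meaningful to a consumer that interprets character references, such as an HTML or XML renderer:

```python
import html

text = '\u201cHello\u201d caf\u00e9'

# Each non-ASCII character becomes a numeric character reference.
escaped = text.encode('ascii', 'xmlcharrefreplace')
print(escaped)

# html.unescape() turns the references back into the original characters.
print(html.unescape(escaped.decode('ascii')) == text)
```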
Answer by Cheney
The best solution is
- Set MySQL's charset to 'utf-8'.
- Do as this comment suggests (add use_unicode=True and charset="utf8"):
db = MySQLdb.connect(host="localhost", user="root", passwd="", db="testdb", use_unicode=True, charset="utf8")
(comment by KyungHoon Kim, Mar 13 '14 at 17:04)
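One way to keep those two settings from being forgotten is to build the keyword arguments in a single place. This is only a sketch: utf8_connect_kwargs and connect_utf8 are hypothetical names, and MySQLdb is imported lazily so the module loads even where the driver is not installed. The charset parameter is left adjustable because MySQL's utf8 character set stores at most three bytes per character, while utf8mb4 covers all of Unicode.

```python
def utf8_connect_kwargs(host, user, passwd, db, charset='utf8'):
    """Build MySQLdb.connect() keyword arguments with Unicode handling enabled."""
    return {
        'host': host,
        'user': user,
        'passwd': passwd,
        'db': db,
        'use_unicode': True,
        'charset': charset,
    }


def connect_utf8(**params):
    """Open a connection using the arguments above (requires MySQLdb)."""
    import MySQLdb  # imported here so the sketch loads without the driver
    return MySQLdb.connect(**utf8_connect_kwargs(**params))


# Placeholder credentials, matching the comment quoted above.
kwargs = utf8_connect_kwargs('localhost', 'root', '', 'testdb')
print(kwargs)
```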
For details, see the docstring of MySQLdb's Connection class:
class Connection(_mysql.connection):
    """MySQL Database Connection Object"""

    default_cursor = cursors.Cursor

    def __init__(self, *args, **kwargs):
        """
        Create a connection to the database. It is strongly recommended
        that you only use keyword parameters. Consult the MySQL C API
        documentation for more information.

        host
          string, host to connect

        user
          string, user to connect as

        passwd
          string, password to use

        db
          string, database to use

        port
          integer, TCP/IP port to connect to

        unix_socket
          string, location of unix_socket to use

        conv
          conversion dictionary, see MySQLdb.converters

        connect_timeout
          number of seconds to wait before the connection attempt
          fails.

        compress
          if set, compression is enabled

        named_pipe
          if set, a named pipe is used to connect (Windows only)

        init_command
          command which is run once the connection is created

        read_default_file
          file from which default client values are read

        read_default_group
          configuration group to use from the default file

        cursorclass
          class object, used to create cursors (keyword only)

        use_unicode
          If True, text-like columns are returned as unicode objects
          using the connection's character set. Otherwise, text-like
          columns are returned as strings. columns are returned as
          normal strings. Unicode objects will always be encoded to
          the connection's character set regardless of this setting.

        charset
          If supplied, the connection character set will be changed
          to this character set (MySQL-4.1 and newer). This implies
          use_unicode=True.

        sql_mode
          If supplied, the session SQL mode will be changed to this
          setting (MySQL-4.1 and newer). For more details and legal
          values, see the MySQL documentation.

        client_flag
          integer, flags to use or 0
          (see MySQL docs or constants/CLIENTS.py)

        ssl
          dictionary or mapping, contains SSL connection parameters;
          see the MySQL documentation for more details
          (mysql_ssl_set()). If this is set, and the client does not
          support SSL, NotSupportedError will be raised.

        local_infile
          integer, non-zero enables LOAD LOCAL INFILE; zero disables

        autocommit
          If False (default), autocommit is disabled.
          If True, autocommit is enabled.
          If None, autocommit isn't set and server default is used.

        There are a number of undocumented, non-standard methods. See the
        documentation for the MySQL C API for some hints on what they do.
        """
Answer by mgojohn
SQLAlchemy users can simply specify their field as convert_unicode=True.
Example:
sqlalchemy.String(1000, convert_unicode=True)
SQLAlchemy will simply accept unicode objects and return them back, handling the encoding itself.
Answer by Uday Allu
Use the snippet below to strip accents from text, turning accented Latin letters into their plain ASCII base letters:
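import unicodedata

def strip_accents(text):
    # NFKD splits each accented letter into a base letter plus combining marks;
    # dropping the marks (category 'Mn') leaves only the base letters.
    return "".join(char for char in
                   unicodedata.normalize('NFKD', text)
                   if unicodedata.category(char) != 'Mn')

strip_accents('áéíñóúü')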
Output:
'aeinouu'
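Accent stripping only helps with characters that decompose into a base letter plus combining marks. The Python 3 sketch below, with a hypothetical to_ascii helper and sample strings chosen for illustration, shows that the left double quotation mark from the question and letters such as ø pass through unchanged, so a further lossy step is still needed to get pure ASCII.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize('NFKD', text)
    return ''.join(ch for ch in decomposed
                   if unicodedata.category(ch) != 'Mn')


def to_ascii(text):
    """Strip accents, then discard whatever is still not ASCII."""
    return strip_accents(text).encode('ascii', 'ignore').decode('ascii')


accented = 'Cr\u00e8me br\u00fbl\u00e9e'
quoted = '\u201cSm\u00f8rrebr\u00f8d\u201d'

print(strip_accents(accented))          # accents removed, base letters kept
print(strip_accents(quoted) == quoted)  # quotes and the letter o-slash are untouched
print(to_ascii(quoted))                 # the untouched characters are dropped
```

This changes the stored text, so it suits search keys or slugs better than content that must be preserved as written.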

