Python UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
Original source: http://stackoverflow.com/questions/26641027/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by user3422637
I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I could find on Stack Overflow and other forums to handle the encoding errors my script is throwing.
My script reads data from all tables in a source MySQL database, writes it to a Python StringIO.StringIO object, and then loads that data from the StringIO object into a Postgres database using the psycopg2 library's copy_from command. The Postgres database is apparently UTF-8 encoded; I found this under Properties, Definition for the database in pgAdmin.
I found out that some tables in my source MySQL database use the latin1_swedish_ci collation while others use UTF-8 (I found this from TABLE_COLLATION in information_schema.tables).
Based on my research on the internet, I added all of the following code at the top of my Python script.
db_conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db, charset="utf8", init_command='SET NAMES UTF8', use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')
I still get the UnicodeEncodeError below from this line:
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")  # Remove unwanted characters from column value
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)
I wrote the following line to clean the cells of every table in the source MySQL database before writing them to the StringIO object:
cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value
Please help.
Accepted answer by Joran Beasley
str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 128, and u'\u2019' (a right single quotation mark) falls outside that range. What is cell?
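A minimal sketch of why the error occurs, assuming cell is a unicode string that contains U+2019. In Python 2, calling str() on a unicode object implicitly encodes it with the default ASCII codec; the explicit .encode("ascii") call below reproduces that step so the snippet behaves the same on Python 2 and Python 3. The sample text is made up for illustration.

```python
# Hypothetical cell value; u"\u2019" is the right single quotation mark.
cell = u"It\u2019s a test"

try:
    # Python 2's str(cell) does the equivalent of this for unicode objects.
    cell.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = cell.encode("utf-8")
print(repr(utf8_bytes))
```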
If cell is a unicode string, just call cell.encode("utf8"), which returns a byte string encoded as UTF-8.
Or, if I recall correctly, you may not need to encode at all: if you pass MySQL a unicode string, the database will automatically convert it to UTF-8.
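Since the type of cell is not known, the encode suggestion could be wrapped in a small helper that branches on the type. This is only a sketch: to_utf8 and text_type are hypothetical names, and it assumes that any byte string it receives is already UTF-8.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def to_utf8(cell):
    """Return cell as UTF-8 encoded bytes (hypothetical helper)."""
    if isinstance(cell, bytes):
        # Assumption: byte strings are already UTF-8, so pass them through.
        return cell
    if not isinstance(cell, text_type):
        # Numbers, dates and similar values: convert to text first.
        cell = text_type(cell)
    return cell.encode("utf-8")


encoded = to_utf8(u"It\u2019s")
```

Encoding explicitly like this avoids the implicit ASCII step that str(cell) performs on Python 2.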
You could also try:
cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
Or just use a third-party library. There is a good one that will fix text for you.
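To tie the unicode(cell) suggestion back to the pipeline in the question, here is a rough sketch that keeps every cell as text while cleaning it and encodes to UTF-8 only once, when the row is written to the buffer. The names clean_cell and rows_to_buffer and the sample rows are hypothetical, io.BytesIO is swapped in for the question's StringIO.StringIO so the snippet runs on both Python 2 and Python 3, and byte strings are assumed to be UTF-8.

```python
import io

text_type = type(u"")  # unicode on Python 2, str on Python 3


def clean_cell(cell):
    """Return cell as text with row-breaking characters removed."""
    if isinstance(cell, bytes):
        # Assumption: byte strings coming from the driver are UTF-8.
        text = cell.decode("utf-8")
    else:
        text = text_type(cell)
    return (text.replace(u"\r", u" ")
                .replace(u"\n", u" ")
                .replace(u"\t", u"")
                .replace(u'"', u""))


def rows_to_buffer(rows):
    """Write cleaned rows as tab-separated UTF-8 lines into a buffer."""
    buf = io.BytesIO()
    for row in rows:
        line = u"\t".join(clean_cell(c) for c in row) + u"\n"
        buf.write(line.encode("utf-8"))  # encode once, at the output boundary
    buf.seek(0)  # rewind so a reader starts at the first row
    return buf


# Made-up sample rows standing in for rows fetched from MySQL.
sample_rows = [(1, u"It\u2019s \"quoted\""), (2, u"two\r\nlines")]
data = rows_to_buffer(sample_rows).getvalue()
```

The resulting buffer is the kind of file-like object the question hands to psycopg2's copy_from; that call is left out here because it needs a live database connection.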

