Python UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

Question

Asked by user3422637

I am using Python 2.7 and MySQLdb 1.2.3. I have tried everything I could find on Stack Overflow and other forums to handle the encoding errors my script is throwing. My script reads data from all tables in a source MySQL database, writes it to a Python StringIO.StringIO object, and then loads that data from the StringIO object into a Postgres database using the psycopg2 library's copy_from command. The Postgres database is apparently UTF-8 encoded (I found this under Properties > Definition for the database in pgAdmin).

I found out that some tables in my source MySQL database use the latin1_swedish_ci collation while others are in a UTF-8 encoding (I found this from TABLE_COLLATION in information_schema.tables).

I wrote all of this code at the top of my Python script, based on my research on the internet.

db_conn = MySQLdb.connect(host=host, user=user, passwd=passwd, db=db, charset="utf8", init_command='SET NAMES UTF8', use_unicode=True)
db_conn.set_character_set('utf8')
db_conn_cursor = db_conn.cursor()
db_conn_cursor.execute('SET NAMES utf8;')
db_conn_cursor.execute('SET CHARACTER SET utf8;')
db_conn_cursor.execute('SET character_set_connection=utf8;')

I still get the UnicodeEncodeError below on this line: cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") # Remove unwanted characters from column value

UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 47: ordinal not in range(128)

I wrote the following line of code to clean cells in every table of source MySQL database when writing to StringIO object.

cell = str(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "") #Remove unwanted characters from column value

Please help.

Answer 1

Accepted answer by Joran Beasley

str(cell) is trying to convert cell to ASCII. ASCII only supports characters with ordinals less than 128, which is why the error says "ordinal not in range(128)". What is cell?
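
To see the failure in isolation, here is a minimal sketch. In Python 2, calling str() on a unicode object implicitly encodes it with the ascii codec; the explicit .encode("ascii") below does the same thing in a form that also runs on Python 3. The sample string is a made-up stand-in for a cell value, not data from the question.

```python
# Hypothetical cell value containing U+2019 (right single quotation mark).
cell = u"It\u2019s a test"

try:
    # Python 2's str(cell) performs this encode implicitly for unicode objects.
    cell.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

# The message names the ascii codec and the offending character's position.
print(error)
```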

If cell is a unicode string, just do cell.encode("utf8"), and that will return a byte string encoded as UTF-8.
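
As a small illustration of that call (again with a made-up sample value), encoding to UTF-8 succeeds where the ascii codec fails, and decoding with the same codec returns the original text:

```python
cell = u"It\u2019s a test"  # hypothetical unicode cell value

# Encoding explicitly as UTF-8 yields a byte string; U+2019 becomes three bytes.
encoded = cell.encode("utf8")

# Decoding with the same codec round-trips back to the original text.
decoded = encoded.decode("utf8")
```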

...or really, IIRC, if you pass MySQL a unicode string, the database will automagically convert it to UTF-8...

You could also try,

cell = unicode(cell).replace("\r", " ").replace("\n", " ").replace("\t", '').replace("\"", "")
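
One way to package the line above is a small helper that never relies on the default ascii codec. This is only a sketch: the name clean_cell, the text_type alias, and the source_encoding parameter are hypothetical, and the default of "utf8" for byte strings is an assumption about how the source data is encoded.

```python
import sys

# The text type: `unicode` on Python 2, `str` on Python 3.
text_type = unicode if sys.version_info[0] == 2 else str  # noqa: F821


def clean_cell(cell, source_encoding="utf8"):
    """Return the cell as text with CR/LF replaced by spaces and tabs/quotes removed.

    `source_encoding` is an assumption about how any byte strings were encoded.
    """
    if isinstance(cell, bytes):
        # Byte strings are decoded explicitly instead of via the ascii default.
        text = cell.decode(source_encoding)
    else:
        # Numbers, dates, None, and text convert without an implicit encode.
        text = text_type(cell)
    return (text.replace(u"\r", u" ")
                .replace(u"\n", u" ")
                .replace(u"\t", u"")
                .replace(u'"', u""))
```

If the buffer you write to expects bytes, encode the result explicitly, for example clean_cell(cell).encode("utf8").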

or just use a 3rd party library. There is a good one that will fix text for you.
