Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/45279863/

Time: 2020-09-14 04:04:37  Source: igfitidea

How to use `charset` and `encoding` in `create_engine` of SQLAlchemy (to create a pandas dataframe)?

Tags: mysql, pandas, sqlalchemy, connection-string

Asked by toto_tico

I am very confused by the way charset and encoding work in SQLAlchemy. I understand (and have read about) the difference between charsets and encodings, and I have a good picture of the history of encodings.

I have a table in MySQL that uses latin1_swedish_ci (why? possibly because of this). I need to create a pandas dataframe in which I get the proper characters (and not weird symbols). Initially, this was in the code:

connect_engine = create_engine('mysql://user:[email protected]/db')
sql_query = "select * from table1"
df = pandas.read_sql(sql_query, connect_engine)

We started having trouble with the Š character (which corresponds to the Unicode u'\u0160', but instead we get '\x8a'). I expected this to work:

connect_engine = create_engine('mysql://user:[email protected]/db', encoding='utf8') 

but I keep getting '\x8a', which, I realized, makes sense given that the default of the encoding parameter is utf8. So I then tried encoding='latin1' to tackle the problem:

connect_engine = create_engine('mysql://user:[email protected]/db', encoding='latin1')

but I still get the same '\x8a'. To be clear, in both cases (encoding='utf8' and encoding='latin1'), I can do mystring.decode('latin1') but not mystring.decode('utf8').
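To make the symptom concrete, here is a minimal sketch of what the byte '\x8a' means under different codecs. It is written for Python 3 (the question uses Python 2 strings), the variable names are illustrative, and it assumes that MySQL's latin1 behaves like Windows-1252 for this byte, which is how MySQL documents that character set.

```python
# Python 3 sketch; the byte value is taken from the question, the names are illustrative.
raw = b"\x8a"  # the undecoded byte that comes back instead of u'\u0160'

# MySQL's latin1 is essentially Windows-1252, where 0x8A is "Š" (U+0160).
as_cp1252 = raw.decode("cp1252")

# Python's own "latin1" codec is ISO-8859-1, which maps 0x8A to the control character U+008A.
as_iso_8859_1 = raw.decode("latin1")

# A lone 0x8A is not valid UTF-8 (it is a continuation byte), so this decode fails.
try:
    raw.decode("utf8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(as_cp1252), repr(as_iso_8859_1), utf8_ok)
```

This is consistent with the observation that decoding as latin1 succeeds while decoding as utf8 does not: the value is a single-byte latin1 character that nothing has decoded yet.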

And then I rediscovered the charset parameter in the connection string, i.e. 'mysql://user:[email protected]/db?charset=latin1'. After trying all possible combinations of charset and encoding, I found that this one works:

connect_engine = create_engine('mysql://user:[email protected]/db?charset=utf8')

I would appreciate it if somebody could explain to me how to correctly use the charset in the connection string and the encoding parameter in create_engine.

Accepted answer by univerio

encoding is the codec used for encoding/decoding within SQLAlchemy. From the documentation:

For those scenarios where the DBAPI is detected as not supporting a Python unicode object, this encoding is used to determine the source/destination encoding. It is not used for those cases where the DBAPI handles unicode directly.

[...]

To properly configure a system to accommodate Python unicode objects, the DBAPI should be configured to handle unicode to the greatest degree as is appropriate [...]

mysql-python handles unicode directly, so there's no need to use this setting.

charset is a setting specific to the mysql-python driver. From the documentation:

This charset is the client character set for the connection.

This setting controls three variables on the server, among them character_set_results, which is the one you are interested in. When it is set, strings are returned as unicode objects.

Note that this applies only if you have latin1-encoded data in the database. If you've stored utf-8 bytes as latin1, you may have better luck using encoding instead.
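The second case in that note, UTF-8 bytes stored in a latin1 column, can be illustrated at the byte level. The following is only a sketch under assumptions: it uses Python 3 and the cp1252 codec as a stand-in for MySQL's latin1, and it shows what the data looks like and how it round-trips, not what SQLAlchemy's encoding parameter does internally.

```python
# Python 3 sketch; cp1252 stands in for MySQL's latin1, and the names are illustrative.
original = "Š"

# A client that speaks UTF-8 sends two bytes for this character.
utf8_bytes = original.encode("utf-8")

# If those bytes are treated as latin1 text, each byte becomes its own character.
misread = utf8_bytes.decode("cp1252")

# Reversing the wrong step recovers the original text.
repaired = misread.encode("cp1252").decode("utf-8")

print(utf8_bytes, repr(misread), repaired)
```

If the data looks like the misread value when queried with charset=latin1, it was probably stored this way, and a plain charset setting alone will not fix it.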

Answer by IT man

The encoding parameter does not work correctly.

So, as @doru said in this link, you should add ?charset=utf8mb4 at the end of the connection string, like this:

connect_string = 'mysql+pymysql://{}:{}@{}:{}/{}?charset=utf8mb4'.format(DB_USER, DB_PASS, DB_HOST, DB_PORT, DATABASE)
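One caveat with formatting the URL by hand is that a password containing characters such as @ or / would break the connection string. A minimal sketch of a safer variant follows; build_mysql_url is a hypothetical helper name, the credentials are made-up placeholders, and it only relies on urllib.parse.quote_plus from the standard library.

```python
from urllib.parse import quote_plus


def build_mysql_url(user, password, host, port, database, charset="utf8mb4"):
    """Build a mysql+pymysql URL with a charset query parameter (hypothetical helper)."""
    return "mysql+pymysql://{}:{}@{}:{}/{}?charset={}".format(
        quote_plus(user),      # escape special characters in the user name
        quote_plus(password),  # escape special characters such as '@' and '/'
        host,
        port,
        database,
        charset,
    )


# Placeholder credentials, chosen only to show the escaping.
url = build_mysql_url("user", "p@ss/word", "127.0.0.1", 3306, "db")
print(url)
```

The resulting string can be passed to create_engine in the same way as the hand-formatted one above.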

Answer by Günay Gültekin

I had the same problem. I just added ?charset=utf8mb4 at the end of the URL.

Here is mine:

Before

SQL_ENGINE = sqlalchemy.create_engine('mysql+pymysql://'+MySQL.USER+':'+MySQL.PASSWORD+'@'+MySQL.HOST+':'+str(MySQL.PORT)+'/'+MySQL.DB_NAME)

After

SQL_ENGINE = sqlalchemy.create_engine('mysql+pymysql://'+MySQL.USER+':'+MySQL.PASSWORD+'@'+MySQL.HOST+':'+str(MySQL.PORT)+'/'+MySQL.DB_NAME + "?charset=utf8mb4")

Answer by W.Perrin

This works for me.

from sqlalchemy import create_engine
from sqlalchemy.engine.url import URL

db_url = {
    'database': "dbname",
    'drivername': 'mysql',
    'username': 'myname',
    'password': 'mypassword',
    'host': '127.0.0.1',
    'query': {'charset': 'utf8'},  # the key-point setting
}

engine = create_engine(URL(**db_url), encoding="utf8")