Python Pandas write to sql with NaN values

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/23353732/

Tags: python, mysql, sql, pandas

Asked by user3221876

I'm trying to read a few hundred tables from ASCII files and then write them to MySQL. It seems easy to do with Pandas, but I hit an error that doesn't make sense to me:

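For context, a hypothetical way one such table might be read from an ASCII file (the filename and separator are assumptions, not from the question):

import pandas as pd

# Read one whitespace-delimited ASCII table into a DataFrame
# (filename and separator are illustrative placeholders).
metricDF = pd.read_csv('metrics_001.txt', sep=r'\s+')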

I have a data frame of 8 columns. Here is the column list/index:


metricDF.columns

Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)

I then use to_sql to append the data to MySQL:


metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')

I get a strange error about a column being "nan":


OperationalError: (1054, "Unknown column 'nan' in 'field list'")

As you can see, all my columns have names. I realize that MySQL/SQL write support in pandas is still under development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.

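A minimal sketch of the kind of setup that triggers this error (the connection parameters and table name are placeholders, and the table is assumed to already exist):

import MySQLdb
import numpy as np
import pandas as pd

# Placeholder credentials; on pandas <= 0.14 with the legacy flavor-based
# API, appending any DataFrame containing NaN reproduces the error.
con = MySQLdb.connect(host='localhost', user='user', passwd='secret', db='mydb')
df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})
df.to_sql(con=con, name='metrics', if_exists='append', flavor='mysql')
# OperationalError: (1054, "Unknown column 'nan' in 'field list'")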

Accepted answer by joris

Update: starting with pandas 0.15, to_sql supports writing NaN values (they will be written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208).
Pandas 0.15 will be released in the coming October; the feature is already merged in the development version.

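For illustration, a sketch of how this looks once 0.15 is available (the SQLAlchemy connection string and table name are assumptions):

import numpy as np
import pandas as pd
from sqlalchemy import create_engine

# Illustrative connection string; pandas >= 0.15 prefers SQLAlchemy engines.
engine = create_engine('mysql://user:secret@localhost/mydb')
df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})
df.to_sql('metrics', engine, if_exists='append', index=False)
# The NaN in VALUE_010 arrives in MySQL as NULL.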



This is probably due to NaN values in your table; it is a known shortcoming at the moment that the pandas sql functions don't handle NaN values well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199). The likely mechanism: the MySQL driver renders a float NaN as the bare, unquoted token nan in the INSERT statement, which MySQL then parses as a column name, hence the "Unknown column 'nan'" error.


As a workaround at this moment (for pandas versions 0.14.1 and lower), you can manually convert the NaN values to None with:


df2 = df.astype(object).where(pd.notnull(df), None)

and then write the dataframe to sql. This however converts all columns to object dtype (illustrated in the sketch after the snippet below). Because of this, you have to create the database table based on the original dataframe, so that the columns get the correct types. E.g. if your first row does not contain any NaN values (otherwise pick a NaN-free row or slice):

然后将数据帧写入sql。然而,这会将所有列转换为对象 dtype。因此,您必须基于原始数据框创建数据库表。例如,如果您的第一行不包含NaNs:

df[:1].to_sql('table_name', con)                       # create the table with the original (float) dtypes
df2[1:].to_sql('table_name', con, if_exists='append')  # append the rest; None values become NULL
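For illustration, a small sketch (with made-up column names) of what the conversion does to the frame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})
df2 = df.astype(object).where(pd.notnull(df), None)

print(df2['VALUE_010'].tolist())  # [3.5, None] -- NaN replaced by None
print(df2.dtypes)                 # every column is now object dtype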

Answered by Amine Kerkeni

Using the previous solution will change the column dtype from float64 to object_.


I have found a better solution; just add the following _write_mysql function:


import numpy as np
from pandas.io import sql

def _write_mysql(frame, table, names, cur):
    # Quote column names with backticks for MySQL.
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    wildcards = ','.join([r'%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)

    # Replace float NaN values with None so the driver sends NULL
    # (isinstance also catches numpy.float64, which subclasses float).
    data = [[None if isinstance(y, float) and np.isnan(y) else y for y in x]
            for x in frame.values]

    cur.executemany(insert_query, data)

Then override the pandas implementation with it, as below:


sql._write_mysql = _write_mysql

With this code, NaN values will be saved as NULL in the database without altering the column types.

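For example, a hypothetical call after applying the patch (assuming a legacy pandas version in which to_sql routes through sql._write_mysql for flavor='mysql'; the table name is a placeholder):

metricDF.to_sql(con=con, name='metrics', if_exists='append', flavor='mysql')
# Rows containing NaN now insert as NULL instead of raising OperationalError.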