Python Pandas write to sql with NaN values
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same CC BY-SA terms and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23353732/
Asked by user3221876
I'm trying to read a few hundred tables from ASCII and then write them to MySQL. It seems easy to do with Pandas, but I hit an error that doesn't make sense to me:
I have a data frame of 8 columns. Here is the column list/index:
metricDF.columns
Index([u'FID', u'TYPE', u'CO', u'CITY', u'LINENO', u'SUBLINE', u'VALUE_010', u'VALUE2_015'], dtype=object)
I then use to_sql to append the data to MySQL:
metricDF.to_sql(con=con, name=seqFile, if_exists='append', flavor='mysql')
I get a strange error about a column being "nan":
OperationalError: (1054, "Unknown column 'nan' in 'field list'")
As you can see, all my columns have names. I realize that MySQL/SQL write support in pandas is still under development, so perhaps that's the reason? If so, is there a workaround? Any suggestions would be greatly appreciated.
Accepted answer by joris
Update: starting with pandas 0.15, to_sql supports writing NaN values (they will be written as NULL in the database), so the workaround described below should no longer be needed (see https://github.com/pydata/pandas/pull/8208). Pandas 0.15 will be released this coming October, and the feature is already merged into the development version.
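On pandas 0.15 or later, the built-in behavior can be checked with a minimal, self-contained sketch. SQLite is used here only so the example runs without a server; the question itself targets MySQL, and the column names are made up for illustration:

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical frame with a NaN in a float column
df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

con = sqlite3.connect(':memory:')
df.to_sql('metrics', con, index=False)

# NaN arrives in the database as SQL NULL
rows = con.execute('SELECT FID, VALUE_010 FROM metrics').fetchall()
print(rows)  # [(1, 3.5), (2, None)]
con.close()
```

The same to_sql call against a MySQL connection would produce a NULL in place of the NaN.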
This is probably due to NaN values in your table; it is a known shortcoming at the moment that the pandas sql functions don't handle NaNs well (https://github.com/pydata/pandas/issues/2754, https://github.com/pydata/pandas/issues/4199).
As a workaround at this moment (for pandas versions 0.14.1 and lower), you can manually convert the nan values to None with:
df2 = df.astype(object).where(pd.notnull(df), None)
and then write the dataframe to sql. This, however, converts all columns to object dtype. Because of this, you have to create the database table based on the original dataframe. E.g. if your first row does not contain NaNs:
df[:1].to_sql('table_name', con)
df2[1:].to_sql('table_name', con, if_exists='append')
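The effect of the conversion step can be seen in a small standalone sketch (the DataFrame and its column names are hypothetical, chosen only to mirror the question):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a NaN in a float column
df = pd.DataFrame({'FID': [1, 2], 'VALUE_010': [3.5, np.nan]})

# The workaround: cast to object, then mask NaNs with None
df2 = df.astype(object).where(pd.notnull(df), None)

print(df2['VALUE_010'].iloc[1])  # None instead of nan
print(df2.dtypes)                # every column is now object dtype
```

This is why the table schema has to come from the original df (whose float columns still carry their real dtype) rather than from df2.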
Answered by Amine Kerkeni
Using the previous solution will change the column dtype from float64 to object_.
I have found a better solution; just add the following _write_mysql function:
from pandas.io import sql
import numpy as np

def _write_mysql(frame, table, names, cur):
    bracketed_names = ['`' + column + '`' for column in names]
    col_names = ','.join(bracketed_names)
    wildcards = ','.join([r'%s'] * len(names))
    insert_query = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, col_names, wildcards)
    # Replace float NaNs with None so the DB driver sends NULL
    data = [[None if type(y) == float and np.isnan(y) else y
             for y in x] for x in frame.values]
    cur.executemany(insert_query, data)
And then override its implementation in pandas as below:
sql._write_mysql = _write_mysql
With this code, nan values will be saved correctly in the database without altering the column type.
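The heart of this answer is the list comprehension that rewrites frame.values row by row before executemany. That conversion can be checked on its own, without any database connection (the frame below is made up for illustration):

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', 'y']})

# Same conversion as in _write_mysql: NaN floats become None,
# everything else passes through untouched
data = [[None if type(y) == float and np.isnan(y) else y for y in x]
        for x in frame.values]

print(data)  # [[1.0, 'x'], [None, 'y']]
```

Because only the values handed to the cursor are rewritten, the DataFrame itself keeps its float64 column, which is what avoids the dtype change of the earlier workaround.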