Python: Replacing Pandas or NumPy NaN with None for Use with MysqlDB
Original source: http://stackoverflow.com/questions/14162723/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Replacing Pandas or Numpy Nan with a None to use with MysqlDB
Asked by Rishi
I am trying to write a Pandas dataframe (or a numpy array) to a MySQL database using MysqlDB. MysqlDB doesn't seem to understand 'nan', and my database throws an error saying nan is not in the field list. I need to find a way to convert the 'nan' into a NoneType.
Any ideas?
Accepted answer by Andy Hayden
@bogatron has it right: you can use where. It's worth noting that you can do this natively in pandas:
df1 = df.where(pd.notnull(df), None)
Note: this changes the dtype of all columns to object.
Example:
In [1]: df = pd.DataFrame([1, np.nan])
In [2]: df
Out[2]:
     0
0    1
1  NaN
In [3]: df1 = df.where(pd.notnull(df), None)
In [4]: df1
Out[4]:
      0
0     1
1  None
Note: what you cannot do is recast the DataFrame's dtype to allow all datatypes, using astype, and then use the DataFrame fillna method:
df1 = df.astype(object).replace(np.nan, 'None')
Unfortunately neither this, nor using replace, works with None; see this (closed) issue.
As an aside, it's worth noting that for most use cases you don't need to replace NaN with None, see this question about the difference between NaN and None in pandas.
However, in this specific case it seems you do (at least at the time of this answer).
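To make the aside about NaN versus None concrete, here is a minimal sketch, assuming pandas and NumPy are installed (the variable names are illustrative and not from the original answer). It shows that pandas treats both values as missing, even though NaN is an ordinary float and None is a separate Python object, which is what a DB-API driver maps to SQL NULL.

```python
import numpy as np
import pandas as pd

# pandas treats both values as "missing".
both_missing = pd.isna(np.nan) and pd.isna(None)

# NaN is a float that is not equal to itself; None is a singleton object.
nan_is_float = isinstance(np.nan, float)
nan_equals_itself = np.nan == np.nan

print(both_missing, nan_is_float, nan_equals_itself)
```

Inside pandas the two are largely interchangeable, so the conversion only matters at the boundary where values are handed to the database driver.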
Answer by bogatron
You can replace nan with None in your numpy array:
>>> import numpy as np
>>> x = np.array([1, np.nan, 3])
>>> y = np.where(np.isnan(x), None, x)
>>> print(y)
[1.0 None 3.0]
>>> print(type(y[1]))
<class 'NoneType'>
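As a possible follow-up to the snippet above, here is a small sketch of turning a 2-D array into per-row tuples after the NaN-to-None swap. The sample data and the name rows are assumptions made for illustration; no database call is made.

```python
import numpy as np

# Hypothetical 2-D data; each row would become one database record.
x = np.array([[1.0, np.nan], [3.0, 4.0]])

# np.where with a None branch yields an object-dtype array.
y = np.where(np.isnan(x), None, x)

# Plain tuples are the usual shape for DB-API parameter sequences.
rows = [tuple(r) for r in y.tolist()]
print(rows)
```

A list of tuples like this is the shape that a DB-API cursor's executemany generally expects for its parameters.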
Answer by Robin Nemeth
Quite old, yet I stumbled upon the very same issue. Try doing this:
df['col_replaced'] = df['col_with_npnans'].apply(lambda x: None if np.isnan(x) else x)
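One caveat with the lambda above is that np.isnan raises a TypeError for non-numeric values such as strings or None. A minimal sketch of a more tolerant helper, using only the standard library, might look like this (nan_to_none is a hypothetical name, not part of pandas or NumPy):

```python
import math


def nan_to_none(value):
    """Return None for a float NaN, and every other value unchanged."""
    # np.float64 subclasses the built-in float, so NumPy NaNs are caught too.
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


mixed = [1.5, float("nan"), "text", None, 7]
cleaned = [nan_to_none(v) for v in mixed]
print(cleaned)  # [1.5, None, 'text', None, 7]
```

The same helper can be applied to each value of a row just before the row is handed to the database driver.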
Answer by rodney cox
After stumbling around, this worked for me:
df = df.astype(object).where(pd.notnull(df),None)
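A minimal sketch of how this one-liner might be used before an insert, assuming a small made-up DataFrame (the column names and the variable rows are illustrative only). Casting to object first gives the columns a dtype that can actually hold None.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": ["x", None]})

# Cast to object first so that None can be stored without being coerced to NaN.
df_obj = df.astype(object).where(pd.notnull(df), None)

# Plain tuples, one per row, suitable as parameters for cursor.executemany().
rows = list(df_obj.itertuples(index=False, name=None))
print(rows)
```

Here itertuples(index=False, name=None) returns ordinary tuples rather than namedtuples, which keeps the rows in the form DB-API drivers usually accept.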
Answer by EliadL
df = df.replace({np.nan: None})
Credit goes to this guy here on this GitHub issue.
Answer by YaOzI
Just an addition to @Andy Hayden's answer:
Since DataFrame.mask is the opposite twin of DataFrame.where, they have exactly the same signature but opposite meanings:
DataFrame.where is useful for replacing values where the condition is False.
DataFrame.mask is used for replacing values where the condition is True.
So in this question, using df.mask(df.isna(), other=None, inplace=True) might be more intuitive.
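The where/mask symmetry can be sketched as follows. The cast to object is an addition that is not in this answer: it is borrowed from the astype(object) answer in this thread, because on a float column some pandas versions may coerce None back to NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0]})
obj = df.astype(object)

# where keeps values where the condition is True and replaces the rest.
via_where = obj.where(df.notna(), None)

# mask replaces values where the condition is True and keeps the rest.
via_mask = obj.mask(df.isna(), None)

print(via_where["a"].tolist())
print(via_mask["a"].tolist())
```

Both calls express the same replacement; mask simply lets the condition name the values to be replaced rather than the values to be kept.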
Answer by gaatjeniksaan
Another addition: be careful when replacing multiple values and converting the type of the column back from object to float. If you want to be certain that your Nones won't flip back to np.NaNs, apply @andy-hayden's suggestion of using pd.where.
Illustration of how replace can still go 'wrong':
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({"a": [1, np.nan, np.inf]})
In [4]: df
Out[4]:
     a
0  1.0
1  NaN
2  inf
In [5]: df.replace({np.nan: None})
Out[5]:
      a
0     1
1  None
2   inf
In [6]: df.replace({np.nan: None, np.inf: None})
Out[6]:
     a
0  1.0
1  NaN
2  NaN
In [7]: df.where((pd.notnull(df)), None).replace({np.inf: None})
Out[7]:
     a
0  1.0
1  NaN
2  NaN
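One way to sidestep the behaviour illustrated above is to do the conversion in two steps: fold the infinities into NaN while the column is still float, then cast to object and swap in None last. This is a sketch under that assumption, not the answer author's method, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, np.inf]})

# Step 1: fold +/-inf into NaN; the column stays float here.
no_inf = df.replace([np.inf, -np.inf], np.nan)

# Step 2: cast to object, then swap every missing value for None.
result = no_inf.astype(object).where(no_inf.notna(), None)
print(result["a"].tolist())
```

Keeping the None substitution as the very last step means no later operation gets a chance to infer a float dtype and turn the Nones back into NaN.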

