How to set a cell to NaN in a pandas dataframe
Original question: http://stackoverflow.com/questions/34794067/
License: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by Mark Morrisson
I'd like to replace bad values in a column of a dataframe with NaN.
import numpy as np
import pandas as pd

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)
df[df.y == 'N/A']['y'] = np.nan
However, the last line fails and raises a warning because it operates on a copy of df. What is the correct way to handle this? I've seen many solutions that use iloc or ix, but here I need to use a boolean condition.
Accepted answer by EdChum
Just use replace:
In [106]:
df.replace('N/A', np.nan)
Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
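Note that the call above displays a result but does not change df. A minimal sketch of that behaviour, assuming the mydata dictionary from the question (the name cleaned is only illustrative):

```python
import numpy as np
import pandas as pd

mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

# replace returns a new DataFrame; df itself is left untouched
cleaned = df.replace('N/A', np.nan)

print(df['y'].isna().sum())       # count of missing values in the original
print(cleaned['y'].isna().sum())  # count of missing values in the result
```

To keep the change, assign the result back (df = df.replace('N/A', np.nan)) or work with the returned frame.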
What you're trying is called chained indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
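To see why the chained form leaves df unchanged, it can help to split it into its two steps. This is only a sketch using the question's data; the names mask, subset, and still_bad are illustrative, and whether a warning is emitted depends on the pandas version, so it is silenced here.

```python
import warnings

import numpy as np
import pandas as pd

mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)
mask = df['y'] == 'N/A'

# df[mask]['y'] = np.nan is two separate operations:
subset = df[mask]  # step 1: boolean indexing builds a new DataFrame (a copy)
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # hide the copy warning for this demo
    subset['y'] = np.nan             # step 2: the assignment lands on the copy

# The original frame still contains the 'N/A' strings
still_bad = int((df['y'] == 'N/A').sum())
print(still_bad)
```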
You can use loc to ensure you operate on the original df:
In [108]:
df.loc[df['y'] == 'N/A', 'y'] = np.nan
df
Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
Answer by jmorrison
You can use replace:
df['y'] = df['y'].replace({'N/A': np.nan})
Also be aware of the inplace parameter for replace. You can do something like:
df.replace({'N/A': np.nan}, inplace=True)
This replaces every occurrence in df in place, so there is no need to assign the result back.
Similarly, if you run into other kinds of unknown values, such as an empty string or None:
df['y'] = df['y'].replace({'': np.nan})
df['y'] = df['y'].replace({None: np.nan})
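Several placeholder strings can also be handled in one call by passing a list to replace. The sketch below uses made-up sample data and an illustrative sentinels list; it also relies on the fact that isna() already treats None as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing valid strings with several "unknown" markers
df = pd.DataFrame({'y': ['12', 'N/A', '', None, '15']})

sentinels = ['N/A', '']  # placeholder strings to treat as missing
df['y'] = df['y'].replace(sentinels, np.nan)

# None is reported as missing by isna() alongside the replaced values
missing = df['y'].isna()
print(missing.tolist())
```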
Reference: Pandas Latest - Replace
Answer by Severin Pappadeux
While using replace seems to solve the problem, I would like to propose an alternative. When a column mixes numeric values with a few strings, the real fix is not to replace the strings with np.nan but to give the whole column a proper type. I would bet that the original column is most likely of object type:
Name: y, dtype: object
What you really need is to make it a numeric column (it will have a proper type and operations on it will be considerably faster), with all non-numeric values replaced by NaN.
Thus, good conversion code would be
pd.to_numeric(df['y'], errors='coerce')
Specify errors='coerce' to force strings that can't be parsed to a numeric value to become NaN. Column type would be
Name: y, dtype: float64
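Since pd.to_numeric returns a new Series, the result has to be assigned back to keep it. A short sketch with the question's data, under the assumption that every value other than 'N/A' parses as a number:

```python
import pandas as pd

mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

# Unparseable strings such as 'N/A' become NaN; the rest become numbers
df['y'] = pd.to_numeric(df['y'], errors='coerce')

print(df['y'].dtype)  # the column is now a float type
print(df['y'].sum())  # numeric operations work and skip NaN by default
```

One trade-off of this approach is that any unparseable value, not only 'N/A', silently becomes NaN, so it is worth checking the count of missing values afterwards.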
Answer by rolandpeng
You can try these snippets.
In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
In [17]: df = pd.DataFrame(mydata)
In [18]: df.y[df.y == "N/A"] = np.nan
In [19]: df
Out[19]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
Answer by jeremie benichou
df.loc[df.y == 'N/A',['y']] = np.nan
This solves your problem. With the double [], you are working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.
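The single-call pattern can be wrapped in a small helper when several bad markers need to be cleared. This is only a sketch: set_nan_where is a hypothetical name, not a pandas function, and it combines loc with isin to build the boolean condition.

```python
import numpy as np
import pandas as pd


def set_nan_where(frame, column, bad_values):
    """Hypothetical helper: set frame[column] to NaN where it holds a bad value."""
    mask = frame[column].isin(bad_values)  # boolean condition over the rows
    frame.loc[mask, column] = np.nan       # rows and column selected in one call
    return frame


mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)
set_nan_where(df, 'y', ['N/A'])
print(df)
```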