Python: How to set a cell to NaN in a pandas DataFrame

Question

Asked by Mark Morrisson

I'd like to replace bad values in a column of a DataFrame with NaN.

import numpy as np
import pandas as pd

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

df[df.y == 'N/A']['y'] = np.nan

However, the last line fails and raises a warning because it operates on a copy of df. What is the correct way to handle this? I've seen many solutions using iloc or ix, but here I need to use a boolean condition.

Answer 1

Accepted answer by EdChum

Just use replace:

In [106]:
df.replace('N/A', np.nan)

Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

What you're trying is called chained indexing: df[df.y == 'N/A'] returns a copy, so the assignment to ['y'] never reaches df. See http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

You can use loc to ensure you operate on the original df:

In [108]:
df.loc[df['y'] == 'N/A','y'] = np.nan
df

Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN
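
For reference, the loc approach above can be run end to end as the following minimal sketch, using the sample data from the question. The variable name mask is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [10, 50, 18, 32, 47, 20],
    'y': ['12', '11', 'N/A', '13', '15', 'N/A'],
})

# Boolean condition selecting the rows to blank out ("mask" is an illustrative name)
mask = df['y'] == 'N/A'

# Row selector and column label go into a single .loc call,
# so the assignment is applied to df itself rather than to a copy
df.loc[mask, 'y'] = np.nan

print(df)
```

Any boolean Series aligned with df can take the place of mask, which is what makes this form suitable when the rows are chosen by a condition rather than by position.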

Answer 2

Answer by jmorrison

You can use replace:

df['y'] = df['y'].replace({'N/A': np.nan})

Also be aware of the inplace parameter for replace. You can do something like:

df.replace({'N/A': np.nan}, inplace=True)

This will replace all instances in df, modifying it in place instead of returning a new DataFrame.

Similarly, if you run into other types of unknown values such as empty string or None value:

df['y'] = df['y'].replace({'': np.nan})

df['y'] = df['y'].replace({None: np.nan})
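
If several placeholder strings occur in the same column, they can probably be handled in one call by passing a list to replace. The snippet below is a small sketch on made-up data; the column contents and the name cleaned are assumptions for illustration, not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing two kinds of placeholder: 'N/A' and the empty string
df = pd.DataFrame({'y': ['12', '', 'N/A', '13', 'N/A']})

# A list of values to replace, all mapped to the same replacement (NaN)
cleaned = df['y'].replace(['N/A', ''], np.nan)

print(cleaned.isna().sum())
```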

Reference: Pandas Latest - Replace

Answer 3

Answer by Severin Pappadeux

While using replace seems to solve the problem, I would like to propose an alternative. When a column mixes numeric values with some strings, the real fix is not to replace the strings with np.nan but to make the whole column properly numeric. I would bet that the original column is most likely of object type:

Name: y, dtype: object

What you really need is to make it a numeric column (it will have the proper type and will be considerably faster), with all non-numeric values replaced by NaN.

Thus, good conversion code would be

pd.to_numeric(df['y'], errors='coerce')

Specify errors='coerce' to force strings that can't be parsed as a numeric value to become NaN. The column type would then be

Name: y, dtype: float64
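
As a minimal sketch of how this could look on the data from the question: pd.to_numeric returns a new Series, so the result is assigned back to the column here. Once the column is float64, numeric operations such as sum work directly and skip the NaN entries by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [10, 50, 18, 32, 47, 20],
    'y': ['12', '11', 'N/A', '13', '15', 'N/A'],
})

# Strings that cannot be parsed as numbers ('N/A') become NaN;
# the parsed result replaces the original object column
df['y'] = pd.to_numeric(df['y'], errors='coerce')

print(df['y'].dtype)
print(df['y'].sum())  # NaN entries are skipped by default
```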

Answer 4

Answer by rolandpeng

You can try these snippets.

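In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
In [17]: df = pd.DataFrame(mydata)

# Note: this is chained assignment; newer pandas versions may warn about it,
# and with Copy-on-Write enabled it leaves df unchanged (df.loc is safer)
In [18]: df.y[df.y == "N/A"] = np.nan

In [19]: df
Out[19]: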
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

Answer 5

Answer by jeremie benichou

df.loc[df.y == 'N/A',['y']] = np.nan

This solves your problem. With the double [] (as in df[...]['y']), you are working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.
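
The difference can be sketched as follows. Boolean indexing with df[...] produces a separate object, which this example makes explicit with .copy() (that call, and the names mask, subset, unchanged and changed, are illustrative additions). Assigning to that object leaves df alone, whereas a single loc call writes to df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'x': [10, 50, 18, 32, 47, 20],
    'y': ['12', '11', 'N/A', '13', '15', 'N/A'],
})
mask = df['y'] == 'N/A'

# First half of the chained form: the filtered rows are a separate object.
# .copy() makes that explicit for the purposes of this sketch.
subset = df[mask].copy()
subset['y'] = np.nan

# The assignment above only touched the copy, so df has no NaN yet
unchanged = int(df['y'].isna().sum())

# One .loc call names both the rows and the column on df itself
df.loc[mask, ['y']] = np.nan
changed = int(df['y'].isna().sum())

print(unchanged, changed)
```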
