Python Pandas - Fill NaN in categorical data

Question

Asked by deega

I am trying to fill missing values (NaN) using the code below:

NAN_SUBSTITUTION_VALUE = 1
g = g.fillna(NAN_SUBSTITUTION_VALUE)

but I am getting the following error:

ValueError: fill value must be in categories.

Could anybody please shed some light on this error?

Answer 1

Answered by pacholik

Once you have created categorical data, you can only insert values that are already among its categories.

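>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
>>> df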
    ID  value
0    0     20
1    1     43
2    2     45

>>> df["cat"] = df["value"].astype("category")
>>> df
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45

>>> df.loc[1, "cat"] = np.nan
>>> df
    ID  value    cat
0    0     20     20
1    1     43    NaN
2    2     45     45

>>> df.fillna(1)
ValueError: fill value must be in categories
>>> df.fillna(43)
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45
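The same session as a self-contained script, assuming the three-row DataFrame shown above. The exact exception for a new category depends on the pandas version (older releases raise the ValueError quoted in the question, newer ones may raise a TypeError instead), so this sketch catches both rather than relying on one message.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
df["cat"] = df["value"].astype("category")  # categories are 20, 43, 45
df.loc[1, "cat"] = np.nan                    # NaN is allowed: it means "no category"

# 43 is already a category, so filling with it succeeds (fillna returns a copy)
filled = df.fillna(43)

# 1 is not a category, so pandas refuses to use it as a fill value
try:
    df.fillna(1)
    new_category_rejected = False
except (ValueError, TypeError):  # exception type varies across pandas versions
    new_category_rejected = True

print(filled["cat"].tolist())  # [20, 43, 45]
print(new_category_rejected)   # True
```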

Answer 2

Answered by Gunnar Cheng

Add the category before you fill:

g = g.cat.add_categories([1])
g = g.fillna(1)  # fillna returns a new object, so assign the result
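The snippet above assumes g is a categorical Series; the .cat accessor does not exist on a DataFrame. If g is a DataFrame, one way to apply the same idea is to add the fill value as a category to every categorical column before calling fillna. The helper fill_nan_with_new_category and the sample columns are hypothetical, for illustration only; they are not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_nan_with_new_category(frame: pd.DataFrame, value) -> pd.DataFrame:
    """Hypothetical helper: return a copy of frame with NaN replaced by value.

    For each categorical column that does not yet have value as a category,
    the category is added first so that fillna accepts it.
    """
    result = frame.copy()
    for col in result.columns:
        if isinstance(result[col].dtype, pd.CategoricalDtype):
            if value not in result[col].cat.categories:
                result[col] = result[col].cat.add_categories([value])
    return result.fillna(value)


g = pd.DataFrame({
    "color": pd.Categorical(["red", np.nan, "blue"]),
    "size": pd.Categorical([np.nan, "S", "M"]),
})
NAN_SUBSTITUTION_VALUE = 1
g = fill_nan_with_new_category(g, NAN_SUBSTITUTION_VALUE)
print(g["color"].tolist())  # ['red', 1, 'blue']
```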

Answer 3

Answered by bluenote10

Your question leaves out an important point: what g is, and in particular that it has a categorical dtype. I assume it is something like this:

import numpy as np
import pandas as pd

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

The problem you are experiencing is that fillna requires a value that already exists as a category. For instance, g.fillna("A") would work, but g.fillna("D") fails. To fill the series with a new value you can do:

g_without_nan = g.cat.add_categories("D").fillna("D")
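Running that line end to end shows what add_categories changes: "D" is appended as the last category, and only then does fillna accept it. The original g is left untouched because both calls return new objects.

```python
import numpy as np
import pandas as pd

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

# "D" is not yet a category, so it has to be added before filling
g_without_nan = g.cat.add_categories("D").fillna("D")

print(list(g.cat.categories))              # ['A', 'B', 'C']
print(list(g_without_nan.cat.categories))  # ['A', 'B', 'C', 'D']
print(g_without_nan.tolist())              # ['A', 'B', 'C', 'D']
```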

Answer 4

Answered by Victor Zuanazzi

Sometimes you may want to replace the NaN values with values already present in your dataset. In that case you can use:

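import numpy as np
import pandas as pd

# df is your DataFrame and field the name of the column to fill.
# Create a random permutation of the observed (non-missing) values;
# dtype=object keeps the "" comparison below elementwise for any value type.
permutation = np.random.permutation(df[field].dropna().to_numpy(dtype=object))

# also drop empty strings, if the column uses them as placeholders
permutation = permutation[permutation != ""]

# replace every missing value of df[field] with a randomly picked observed value;
# astype(object) makes apply() visit the NaN entries of a categorical column too
end = len(permutation)
df[field] = (
    df[field]
    .astype(object)
    .apply(lambda x: permutation[np.random.randint(end)] if pd.isnull(x) else x)
    .astype(df[field].dtype)
)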

It works quite efficiently.
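A vectorized variant of the same idea, offered as a sketch rather than the answerer's code: it swaps the per-element apply for NumPy's Generator.choice, drawing all replacement values at once and assigning them through a boolean mask. The sample DataFrame, the column name "color", and the seed are illustrative assumptions. Because every drawn value is an existing category, the assignment is accepted without calling add_categories.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only to make the example reproducible

# hypothetical sample data; replace with your own df and field
df = pd.DataFrame({"color": pd.Categorical(["red", np.nan, "blue", np.nan, "red"])})
field = "color"

# values actually observed in the column (NaN excluded, duplicates kept)
observed = df[field].dropna().to_numpy()

# boolean mask of the positions that need filling
missing = df[field].isna()

# one draw per missing entry; frequent values are drawn proportionally more often
df.loc[missing, field] = rng.choice(observed, size=missing.sum())

print(df[field].isna().sum())  # 0
```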