Python Pandas - Filling NaNs in categorical data

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must likewise follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/32718639/

Date: 2020-08-19 12:07:15  Source: igfitidea

python pandas

Asked by deega

I am trying to fill missing values (NaN) using the code below:

NAN_SUBSTITUTION_VALUE = 1
g = g.fillna(NAN_SUBSTITUTION_VALUE)

but I get the following error:

ValueError: fill value must be in categories.

Could anybody shed some light on this error?
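A minimal reproduction, assuming g is a categorical Series (the question does not say so, but the error implies it). The exception type and wording depend on the pandas version: older releases raise the ValueError quoted above, while newer ones raise a TypeError along the lines of "Cannot setitem on a Categorical with a new category", so this sketch catches both.

```python
import numpy as np
import pandas as pd

# Assumption: g is a categorical Series that contains a missing value
g = pd.Series(["A", "B", np.nan], dtype="category")
NAN_SUBSTITUTION_VALUE = 1

error = None
try:
    g = g.fillna(NAN_SUBSTITUTION_VALUE)
except (ValueError, TypeError) as err:
    # 1 is not one of the categories ("A", "B"), so pandas refuses the fill
    error = err
    print(type(err).__name__, err)
```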

Answer by pacholik

Once you create categorical data, you can only insert values that are already among its categories.

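>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
>>> df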
    ID  value
0    0     20
1    1     43
2    2     45

>>> df["cat"] = df["value"].astype("category")
>>> df
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45

>>> df.loc[1, "cat"] = np.nan
>>> df
    ID  value    cat
0    0     20     20
1    1     43    NaN
2    2     45     45

>>> df.fillna(1)
ValueError: fill value must be in categories
>>> df.fillna(43)
    ID  value    cat
0    0     20     20
1    1     43     43
2    2     45     45
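To check in advance whether a fill value is allowed, you can inspect the column's categories through the .cat.categories accessor. A small sketch that rebuilds the example above; constructing the column directly as a pd.Categorical with an explicit category list is my shortcut, used instead of the .loc assignment:

```python
import numpy as np
import pandas as pd

# Same data as above: categories 20, 43, 45 with the second value missing
df = pd.DataFrame({"ID": [0, 1, 2], "value": [20, 43, 45]})
df["cat"] = pd.Categorical([20, np.nan, 45], categories=[20, 43, 45])

categories = df["cat"].cat.categories
print(1 in categories)   # False -> fillna(1) would be rejected
print(43 in categories)  # True  -> fillna(43) is allowed

# Fill only the categorical column, with a value that is already a category
df["cat"] = df["cat"].fillna(43)
print(df["cat"].tolist())  # [20, 43, 45]
```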

Answer by Gunnar Cheng

Add the category before you fill:

g = g.cat.add_categories([1])  # g must be a categorical Series
g = g.fillna(1)                # fillna returns a new Series, so assign it back
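The .cat accessor exists only on a Series, and both add_categories and fillna return a new object rather than changing g in place. A sketch of the same idea applied to one column of a DataFrame (the column name cat and the data are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": [0, 1, 2],
    "cat": pd.Categorical([20, np.nan, 45], categories=[20, 43, 45]),
})

# Register 1 as a category first, then fill; assign the result back to the column
df["cat"] = df["cat"].cat.add_categories([1]).fillna(1)

print(df["cat"].tolist())                 # [20, 1, 45]
print(df["cat"].cat.categories.tolist())  # [20, 43, 45, 1]
```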

Answer by bluenote10

Your question is missing an important detail: what g is, and in particular that it has a categorical dtype. I assume it is something like this:

import numpy as np
import pandas as pd

g = pd.Series(["A", "B", "C", np.nan], dtype="category")

The problem you are experiencing is that fillna requires a value that already exists as a category. For instance, g.fillna("A") would work, but g.fillna("D") fails. To fill the series with a new value, you can do:

g_without_nan = g.cat.add_categories("D").fillna("D")
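Calling add_categories with a value that is already a category raises a ValueError, so a reusable fill needs to check first. A minimal sketch; fill_categorical is a hypothetical helper name, not a pandas function:

```python
import numpy as np
import pandas as pd


def fill_categorical(s: pd.Series, value) -> pd.Series:
    """Hypothetical helper: fill NaNs in a categorical Series, adding `value`
    as a new category only when it is not one already."""
    if value not in s.cat.categories:
        s = s.cat.add_categories([value])
    return s.fillna(value)


g = pd.Series(["A", "B", "C", np.nan], dtype="category")
print(fill_categorical(g, "A").tolist())  # ['A', 'B', 'C', 'A']
print(fill_categorical(g, "D").tolist())  # ['A', 'B', 'C', 'D']
```

The original g is left untouched, because both add_categories and fillna return new objects.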

Answer by Victor Zuanazzi

Sometimes you may want to replace the NaNs with values already present in your dataset. In that case you can use:

import numpy as np
import pandas as pd

# df is your DataFrame and field the name of the column to fill

# pool of existing values: drop NaNs and empty strings
pool = df[field].dropna()
pool = pool[pool != ""].to_numpy()

# replace each NaN in df[field] with a value drawn at random from the pool
# (a boolean mask is used instead of Series.apply, because on a categorical
# column apply maps over the categories and never sees the NaNs)
missing = df[field].isna()
df.loc[missing, field] = np.random.choice(pool, size=missing.sum())

It works quite efficiently.
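A self-contained sketch of the same idea as a function, using a seeded NumPy Generator so the result is reproducible. fill_with_random_existing is a hypothetical name; values are drawn with replacement from the non-missing entries, so more frequent categories are more likely to be picked.

```python
import numpy as np
import pandas as pd


def fill_with_random_existing(s: pd.Series, seed=None) -> pd.Series:
    """Hypothetical helper: replace each NaN in s with a value drawn at random
    (with replacement, weighted by frequency) from the non-missing values of s."""
    rng = np.random.default_rng(seed)
    pool = s.dropna().to_numpy()  # existing values, duplicates included
    missing = s.isna()
    out = s.copy()
    if missing.any():
        # drawn values are existing categories, so the categorical accepts them
        out[missing] = rng.choice(pool, size=int(missing.sum()))
    return out


g = pd.Series(["A", "B", "B", np.nan, np.nan], dtype="category")
filled = fill_with_random_existing(g, seed=0)
print(filled.isna().sum())  # 0
```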
