Filling NAN data with mode() doesn't work - Pandas

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/38223579/

Time: 2020-09-14 01:31:37  Source: igfitidea

python pandas machine-learning nan

Asked by Gabdu Gunnu

I have a data set with a Series called Outlet_Size whose values are one of {'Medium', nan, 'High', 'Small'}. Around 2566 records are missing, so I decided to fill them with the mode() value and wrote something like this:

  train['Outlet_Size']=train['Outlet_Size'].fillna(train['Outlet_Size'].dropna().mode())

But when I count the remaining NaN records with the command

  sum(train['Outlet_Size'].isnull()) 

it still shows 2566 NaN records. Why is that?

Thank you for your answers.

Answered by EdChum

The problem here is that mode returns a Series, and this causes fillna to fail. Consider a simple example:

In [194]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium']})
df

Out[194]:
        a
0     low
1     low
2     NaN
3  medium
4  medium
5  medium
6  medium

In [195]:    
df['a'].fillna(df['a'].mode())

Out[195]:
0       low
1       low
2       NaN
3    medium
4    medium
5    medium
6    medium
Name: a, dtype: object

You can see that the NaN in row 2 is left unfilled. Look at what mode returns:

In [196]:    
df['a'].mode()

Out[196]:
0    medium
dtype: object

It is a Series, albeit with a single row. When you pass a Series to fillna, the values are matched on index labels, so only a NaN at index 0 would be filled. What you want is the scalar value, which you get by indexing into the Series:

In [197]:    
df['a'].fillna(df['a'].mode()[0])

Out[197]:
0       low
1       low
2    medium
3    medium
4    medium
5    medium
6    medium
Name: a, dtype: object
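
To connect this back to the question, here is a minimal sketch of the same behaviour on an Outlet_Size column. The small `train` frame is a hypothetical stand-in for the asker's data, and the variable names are illustrative only. It uses `.iloc[0]` rather than `[0]` as a precaution, so that the selection is by position and does not rely on the index labels of the mode Series.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's train['Outlet_Size'] column.
train = pd.DataFrame(
    {"Outlet_Size": [np.nan, "Medium", np.nan, "Small", "Medium", "High", np.nan]}
)

# mode() returns a Series; here it holds one value at index label 0.
mode_series = train["Outlet_Size"].mode()

# Passing the Series: fillna matches on index labels, so only the NaN at
# label 0 is filled. The NaNs at labels 2 and 6 have no match and remain.
aligned = train["Outlet_Size"].fillna(mode_series)
remaining_after_series = int(aligned.isna().sum())

# Passing the scalar: every NaN is replaced with the most frequent value.
filled = train["Outlet_Size"].fillna(mode_series.iloc[0])
remaining_after_scalar = int(filled.isna().sum())

print(remaining_after_series, remaining_after_scalar)
```

In the asker's case, none of the missing rows apparently sat at index label 0, which would explain why the NaN count did not change at all.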

EDIT

Regarding whether dropna is required: no, it isn't:

In [204]:
df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium',np.nan,np.nan,np.nan,np.nan]})
df['a'].mode()

Out[204]:
0    medium
dtype: object

You can see that NaN is ignored.
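
As a small illustration of the same point, the sketch below uses the `dropna` parameter of `Series.mode`, which defaults to `True` in recent pandas versions. The example Series is made up for illustration. It also shows a case the answer does not cover: when two values tie, mode returns more than one row, which is another reason to pick a single scalar before calling fillna.

```python
import numpy as np
import pandas as pd

# Made-up data: 'low' and 'medium' tie with two occurrences each,
# and NaN appears three times.
s = pd.Series(["low", "low", "medium", "medium", np.nan, np.nan, np.nan])

# Default behaviour (dropna=True): NaN is not counted, and the tie
# produces two rows, returned in sorted order.
modes = s.mode()
tied_modes = list(modes)

# With dropna=False, NaN is counted like any other value.
modes_with_nan = s.mode(dropna=False)

# Picking one scalar by position for use with fillna.
first_mode = modes.iloc[0]
filled = s.fillna(first_mode)

print(tied_modes, first_mode, int(filled.isna().sum()))
```

Taking the first of several tied modes is an arbitrary choice; whether it is appropriate depends on the data.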