Filling NaN data with mode() doesn't work - Pandas
Warning: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/38223579/
Asked by Gabdu Gunnu
I have a data set with a series named Outlet_Size
whose values are any of {'Medium', nan, 'High', 'Small'}.
Around 2566 records are missing, so I decided to fill them with the mode() value and wrote something like this:
train['Outlet_Size']=train['Outlet_Size'].fillna(train['Outlet_Size'].dropna().mode())
But when I count the missing records with the command
sum(train['Outlet_Size'].isnull())
it still shows 2566 NaN records. Why is that?
Thank you for your answers.
Answered by EdChum
The problem here is that mode returns a Series, and this causes fillna to fail. Consider a simple example:
In [194]:
df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium']})
df
Out[194]:
a
0 low
1 low
2 NaN
3 medium
4 medium
5 medium
6 medium
In [195]:
df['a'].fillna(df['a'].mode())
Out[195]:
0 low
1 low
2 NaN
3 medium
4 medium
5 medium
6 medium
Name: a, dtype: object
You can see that the fill fails above. If we look at what mode returns:
In [196]:
df['a'].mode()
Out[196]:
0 medium
dtype: object
It's a Series, albeit with a single row. When you pass it to fillna, the values are aligned on the index, so only a NaN at index label 0 would be filled. What you want is the scalar value, which you get by indexing into the Series:
In [197]:
df['a'].fillna(df['a'].mode()[0])
Out[197]:
0 low
1 low
2 medium
3 medium
4 medium
5 medium
6 medium
Name: a, dtype: object
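To make the index alignment visible, here is a minimal sketch. The Series `s` and its values are made up for illustration and are not from the question's data set. It places a NaN at index label 0 and another at label 2, then compares filling with the Series returned by mode against filling with the scalar taken from it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: NaN at index labels 0 and 2.
s = pd.Series([np.nan, "low", np.nan, "medium", "medium"])

mode_series = s.mode()                  # a Series with one row, index label 0
aligned = s.fillna(mode_series)         # aligned on index: only label 0 is filled
scalar = s.fillna(mode_series.iloc[0])  # scalar value: every NaN is filled

print(aligned.isna().sum())  # one NaN is left, at label 2
print(scalar.isna().sum())   # no NaN is left
```

Using `.iloc[0]` instead of `[0]` is a small design choice: it selects the first row by position, so it does not depend on the index labels of the result.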
EDIT
Regarding whether dropna is required: no, it isn't:
In [204]:
df = pd.DataFrame({'a':['low','low',np.nan,'medium','medium','medium','medium',np.nan,np.nan,np.nan,np.nan]})
df['a'].mode()
Out[204]:
0 medium
dtype: object
You can see that NaN is ignored.
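As a small follow-up sketch (the data below is invented for illustration), the `dropna` parameter of `Series.mode` controls this behaviour: it defaults to True, which is why NaN is not counted. The example also shows a case worth keeping in mind when indexing into the result: if several values tie for most frequent, mode returns more than one row, and taking the first row picks just one of them.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'low' and 'medium' tie, and NaN is the most frequent entry.
s = pd.Series(["low", "low", "medium", "medium", np.nan, np.nan, np.nan])

default_modes = s.mode()              # NaN is not counted (dropna=True by default)
with_nan = s.mode(dropna=False)       # NaN is counted and wins here

print(default_modes.tolist())         # both tied values are returned
print(with_nan.isna().all())          # the only mode is NaN

# Taking the first row selects one of the tied values for the fill.
filled = s.fillna(default_modes.iloc[0])
print(filled.isna().sum())
```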