Python Pandas Fillna Median not working
Original question: http://stackoverflow.com/questions/49127897/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by danielo
I am trying to fill all the NaNs in a dataframe containing multiple columns and several rows. I am using it to train a multivariate ML model, so I want to fill the NaNs in each column with that column's median. Just to test the median function I did this:
training_df.loc[[0]] = np.nan # Sets first row to nan
print(training_df.isnull().values.any()) # Prints true because we just inserted nans
test = training_df.fillna(training_df.median()) # Fillna with median
print(test.isnull().values.any()) # Check afterwards
But when I do this nothing happens: the last print still shows True. If I instead fill in place like this:
training_df.fillna(training_df.median(), inplace=True)
Nothing happens either. If I do this:
training_df = training_df.fillna(training_df.median(), inplace=True)
training_df becomes None. How can I solve this?
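The None result in the last attempt comes from mixing two calling styles. The sketch below uses a small hypothetical frame in place of training_df (the column names and values are assumptions) and shows the two usual patterns: keep the returned frame, or fill in place and ignore the return value.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame standing in for training_df.
training_df = pd.DataFrame({"a": [np.nan, 1.0, 3.0], "b": [np.nan, 4.0, 8.0]})

# Pattern 1: fillna returns a new DataFrame by default, so keep the result.
filled = training_df.fillna(training_df.median())
print(filled.isnull().values.any())

# Pattern 2: fill in place and do NOT assign the return value back.
# In the question, the in-place call returned None, which is why
# `training_df = training_df.fillna(..., inplace=True)` lost the frame.
training_df.fillna(training_df.median(), inplace=True)
print(training_df.isnull().values.any())
```

If either pattern still leaves NaNs behind on real data, the column dtypes are the next thing to check.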
Answered by jpp
As @thesilkworm suggested, convert your series to numeric first. Below is a minimal example:
import pandas as pd, numpy as np
df = pd.DataFrame([[np.nan, np.nan, np.nan],
                   [5, 1, 2, 'hello'],
                   [1, 4, 3, 4],
                   [9, 8, 7, 6]], dtype=object)
df = df.fillna(df.median()) # fails
df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')
df = df.fillna(df.median()) # works
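To check whether a frame has this problem before filling, it can help to look at the column dtypes. The following is a minimal sketch along the lines of the example above; the column names "a" to "d" are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "d" holds a stray string, and dtype=object
# mimics a frame whose numeric columns were never parsed as numbers.
df = pd.DataFrame(
    [[np.nan, np.nan, np.nan, np.nan],
     [5, 1, 2, "hello"],
     [1, 4, 3, 4],
     [9, 8, 7, 6]],
    columns=["a", "b", "c", "d"],
    dtype=object,
)

print(df.dtypes)  # every column reports object, not a numeric dtype

# Coerce each column; values that cannot be parsed (like "hello") become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")
print(numeric.dtypes)

# With numeric columns, the per-column medians exist and fillna can use them.
filled = numeric.fillna(numeric.median())
print(filled)
```

Note that errors='coerce' silently turns unparseable values into NaN, so those cells end up filled with the median as well.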
Answered by ZHERLOCK
df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')
df = df.fillna(df.median())
Finally, this works! Thanks a lot!
My dataframe includes one column whose data are all str, so if I run the code as it is, that whole column becomes NaN.
Luckily it is at the third column, so I can skip it by converting only a slice of the columns:
df[df.columns[:3]] = df[df.columns[:3]].apply(pd.to_numeric, errors='coerce')
Anyway, it works!
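When the text column is not conveniently at one end of the frame, selecting the columns to convert by name is one alternative to slicing by position. This is only a sketch: the frame, the column names, and the numeric_cols list are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "label" is a genuine text column to leave alone.
df = pd.DataFrame(
    {"x": [np.nan, "5", "1"], "y": [np.nan, 1, 4], "label": [np.nan, "a", "b"]},
    dtype=object,
)

numeric_cols = ["x", "y"]  # assumed list of columns that should be numeric

# Convert only the chosen columns; unparseable values become NaN.
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# The median Series only has entries for numeric_cols, so fillna
# leaves "label" untouched, including its missing value.
df = df.fillna(df[numeric_cols].median())
print(df)
```

Because the fill values are keyed by column name, the text column keeps its strings regardless of where it sits in the frame.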