Python Pandas Fillna Median not working

Disclaimer: this page is a translated copy of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/49127897/

Date: 2020-09-14 05:16:47  Source: igfitidea

Tags: python, python-3.x, python-2.7, pandas, dataframe

Asked by danielo

I am trying to fill all the NaNs in a dataframe containing multiple columns and several rows. I am using it to train a multivariate ML model, so I want to fill the NaNs in each column with that column's median. Just to test the median function I did this:

training_df.loc[[0]] = np.nan # Sets first row to nan
print(training_df.isnull().values.any()) # Prints true because we just inserted nans
test = training_df.fillna(training_df.median()) # Fillna with median
print(test.isnull().values.any()) # Check afterwards

But when I do this nothing happens: the last print still returns True. If I instead call fillna in place, like this:

training_df.fillna(training_df.median(), inplace=True)

Nothing happens either. If I do this:

training_df = training_df.fillna(training_df.median(), inplace=True)

training_df becomes None. How can I solve this?
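Editor's note on the None result: in the pandas releases this question was written against, a call with inplace=True modifies the frame and returns None, so assigning the return value back replaces the frame. The sketch below illustrates this on a small made-up frame; `demo_df` and `other_df` are hypothetical names, not the asker's data, and the exact return value may depend on the installed pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric frame, for illustration only.
demo_df = pd.DataFrame({"a": [np.nan, 1.0, 3.0], "b": [np.nan, 4.0, 8.0]})

# With inplace=True the frame itself is modified. Do not rely on the
# return value: in the pandas versions of this era it is None.
result = demo_df.fillna(demo_df.median(), inplace=True)
print(result)
print(demo_df.isnull().values.any())

# Without inplace, a new filled frame is returned and can be assigned.
other_df = pd.DataFrame({"a": [np.nan, 1.0, 3.0]})
filled = other_df.fillna(other_df.median())
print(filled.isnull().values.any())
```

In short, use either the assignment form or the inplace form, not both in the same statement.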

Answered by jpp

As @thesilkworm suggested, convert your series to numeric first. Below is a minimal example:

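import numpy as np
import pandas as pd

# Every column is stored as object dtype, and the last one contains a string.
df = pd.DataFrame([[np.nan, np.nan, np.nan, np.nan],
                   [5, 1, 2, 'hello'],
                   [1, 4, 3, 4],
                   [9, 8, 7, 6]], dtype=object)

# Fails: median() cannot handle these object columns. Depending on the pandas
# version, this either leaves the NaNs unfilled or raises TypeError.
df = df.fillna(df.median())  # fails

# Convert every column to numeric; unparseable values such as 'hello' become NaN.
df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')

df = df.fillna(df.median())  # works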
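One way to check whether dtype is really the cause is to print `df.dtypes` before and after the conversion. The sketch below uses a made-up all-numeric frame stored as object; the names `numeric_df` and `filled` are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Made-up data: numbers stored in object-dtype columns.
df = pd.DataFrame([[np.nan, np.nan, np.nan],
                   [5, 1, 2],
                   [1, 4, 3],
                   [9, 8, 7]], dtype=object)

print(df.dtypes)  # every column reports object

# Coerce each column to a numeric dtype, then inspect the dtypes again.
numeric_df = df.apply(pd.to_numeric, errors="coerce")
print(numeric_df.dtypes)

# With numeric columns, the per-column medians can fill the missing row.
filled = numeric_df.fillna(numeric_df.median())
print(filled.isnull().values.any())
```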

Answered by ZHERLOCK

df[df.columns] = df[df.columns].apply(pd.to_numeric, errors='coerce')

df = df.fillna(df.median())

Finally, this works! Thanks a lot!
My dataframe also includes one column in which every value is a str, so running the code as written turns that whole column into NaN.
Luckily it is not among the first three columns, so I can skip it with:
df[df.columns[:3]] = df[df.columns[:3]].apply(pd.to_numeric, errors='coerce')
Anyway, it works!
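If the string column's position is not convenient, a variant is to name the columns to convert explicitly. This is only a sketch on made-up data: the column names `x`, `y`, `label` and the list `numeric_cols` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up frame: two numeric-looking columns stored as strings, one text column.
df = pd.DataFrame({"x": [np.nan, "5", "1", "9"],
                   "y": [np.nan, "1", "4", "8"],
                   "label": ["a", "b", "c", "d"]}, dtype=object)

# Hypothetical list of the columns that should be numeric.
numeric_cols = ["x", "y"]

# Convert only those columns, so the text column is never coerced to NaN.
for col in numeric_cols:
    df[col] = pd.to_numeric(df[col], errors="coerce")

# fillna with a Series fills only the columns named in its index.
medians = df[numeric_cols].median()
df = df.fillna(medians)
print(df)
```

Because the medians Series only contains `x` and `y`, the `label` column is left untouched by the fill.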
