pandas: impute missing data by randomly selecting from the non-missing values in a pandas DataFrame

Question

Asked by Donald Gedeon

I have a pandas DataFrame with several missing values. I noticed that the non-missing values are close to each other, so I would like to impute each missing value by randomly choosing one of the non-missing values.

For instance:

import pandas as pd
import random
import numpy as np

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B':[np.nan, 4, 2, np.nan, 5]})
foo
    A   B
0   2 NaN
1   3   4
2 NaN   2   
3   5 NaN
4 NaN   5

For instance, I would like foo['A'][2] = 2 and foo['A'][4] = 3. The shape of my actual pandas DataFrame is (6940, 154). I tried this:

foo['A'] = foo['A'].fillna(random.choice(foo['A'].values.tolist()))

But it is not working. Could you help me achieve that? Best regards.
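A likely reason the attempt above does not behave as intended: random.choice is evaluated once, before fillna runs, so every gap receives the same scalar, and that scalar can itself be NaN because the candidate list still contains the missing entries. The sketch below illustrates this on the example frame; the seed is an arbitrary assumption used only to make the run repeatable.

```python
import random

import numpy as np
import pandas as pd

foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B': [np.nan, 4, 2, np.nan, 5]})

random.seed(0)  # arbitrary seed, chosen only so the sketch is repeatable
# random.choice runs once, before fillna is called, so fillna receives a single scalar.
value = random.choice(foo['A'].values.tolist())
filled = foo['A'].fillna(value)

# Every gap gets the same value; if the draw happened to be NaN, nothing is filled at all.
gaps = filled[foo['A'].isna()]
print(value, gaps.tolist())
```

To get a different value per gap, the sampling has to happen once per missing cell, and only from the non-missing values.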

Answer 1

Answered by bamdan

You can combine the pandas fillna method with random sampling to fill the missing values of a particular column with values drawn at random from that column's non-missing entries.

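import random
import pandas as pd

# fillna does not call a function per gap, and `!= np.nan` is always True,
# so draw one candidate per row and let fillna align them on the index.
observed = df["column"].dropna().tolist()
fill = pd.Series(random.choices(observed, k=len(df)), index=df.index)
df["column"] = df["column"].fillna(fill)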

Here "column" is the name of the column whose missing values you want to fill with randomly chosen non-NaN values.

Answer 2

Answered by Esptheitroad Murhabazi

Here is another approach to this question. It improves on the first answer and uses np.isnan to check whether a NumPy value is NaN, as described in the NumPy documentation.

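# Draws from the whole-number range between the column's min and max (inclusive), not only from observed values.
foo['A'] = foo['A'].apply(lambda x: np.random.choice(range(int(foo['A'].min()), int(foo['A'].max()) + 1)) if np.isnan(x) else x)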

Answer 3

Answered by Karolis

This works well for me on a pandas DataFrame:

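import random

def randomiseMissingData(df2):
    "randomise missing data for DataFrame (within a column)"
    df = df2.copy()
    for col in df.columns:
        mask = df[col].isnull()
        # Skip columns with nothing to fill or nothing to sample from.
        if not mask.any() or mask.all():
            continue
        samples = random.choices(df.loc[~mask, col].tolist(), k=int(mask.sum()))
        # Assign through .loc so the copy itself is modified, not a temporary Series.
        df.loc[mask, col] = samples
    return df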

Answer 4

Answered by peralmq

Here is another pandas DataFrame approach, which fills a single column:

import numpy as np
def fill_with_random(df2, column):
    '''Fill `df2`'s column with name `column` with random data based on non-NaN data from `column`'''
    df = df2.copy()
    df[column] = df[column].apply(lambda x: np.random.choice(df[column].dropna().values) if np.isnan(x) else x)
    return df
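For a frame as wide as the one in the question (154 columns), the per-column idea from the answers above can be applied to the whole frame with one vectorised draw per column. This is only a sketch, not taken from any of the answers: the function name fill_all_with_random and the seed parameter are hypothetical, and it assumes NumPy's Generator API (np.random.default_rng) is available.

```python
import numpy as np
import pandas as pd


def fill_all_with_random(df, seed=None):
    """Return a copy of df with each NaN replaced by a random observed value from the same column.

    Hypothetical helper for illustration; `seed` only makes the draws repeatable.
    """
    rng = np.random.default_rng(seed)
    out = df.copy()
    for col in out.columns:
        mask = out[col].isna()
        observed = out.loc[~mask, col].to_numpy()
        # Nothing to fill, or nothing to sample from: leave the column as it is.
        if not mask.any() or observed.size == 0:
            continue
        # One independent draw (with replacement) per missing cell.
        out.loc[mask, col] = rng.choice(observed, size=int(mask.sum()))
    return out


foo = pd.DataFrame({'A': [2, 3, np.nan, 5, np.nan], 'B': [np.nan, 4, 2, np.nan, 5]})
filled = fill_all_with_random(foo, seed=0)
print(filled)
```

Sampling with replacement keeps each column's observed distribution roughly intact, and drawing all of a column's replacements in one call avoids recomputing the non-missing values for every row.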
