pandas: Drop columns in a pandas DataFrame based on the percentage of null values

Question

Asked by user2656075

I have a dataframe with around 60 columns and 2 million rows. Some of the columns are mostly empty. I calculated the % of null values in each column using this function.

import pandas as pd

def missing_values_table(df):
    # Count of missing values per column
    mis_val = df.isnull().sum()
    # The same counts as a percentage of the number of rows
    mis_val_percent = 100 * mis_val / len(df)
    # Put both Series side by side; concat labels the new columns 0 and 1
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    return mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})

Now I want to drop the columns that have more than, say, 80% of their values missing. I tried the following code, but it does not seem to work.

df = df.drop(df.columns[df.apply(lambda col: col.isnull().sum()/len(df) > 0.80)], axis=1)

Thank you in advance. Hope I'm not missing something very basic

I receive this error:

TypeError: ("'generator' object is not callable", u'occurred at index Unique_Key')

Answer 1

Answered by Vaishali

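You can use dropna() with the thresh parameter. thresh is the minimum number of non-null values a column must contain to be kept, so setting it to 20% of the row count drops every column that is more than 80% empty.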

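# A column must have at least 20% non-null values to be kept
thresh = len(df) * 0.2
# axis=1 drops columns; inplace=True modifies df directly
df.dropna(thresh=thresh, axis=1, inplace=True)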
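
To make the meaning of thresh concrete, here is a minimal sketch on a made-up five-row frame. The column names `a`, `b`, `c` and the `max_missing` variable are hypothetical, not from the question. The threshold is rounded up with math.ceil as an assumption on my part, because thresh is documented as an integer count of non-null values.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical toy data: 'a' is complete, 'b' is exactly 80% missing,
# 'c' is 100% missing.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [1.0, np.nan, np.nan, np.nan, np.nan],
    "c": [np.nan] * 5,
})

max_missing = 0.8  # drop columns with MORE than 80% missing values

# thresh counts non-null values, so convert "at most 80% missing"
# into "at least 20% present" (here: at least 1 of 5 rows).
thresh = math.ceil(len(df) * (1 - max_missing))

# Without inplace=True, dropna returns a new DataFrame.
result = df.dropna(axis=1, thresh=thresh)
print(list(result.columns))
```

Here `b` survives because it is exactly 80% missing, not more, while the fully empty `c` is dropped.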

Answer 2

Answered by Frederico Guerra

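import pandas as pd

def missing_values(df, percentage):
    # Percentage of missing values in each column
    percent_missing = df.isnull().sum() * 100 / len(df)
    missing_value_df = pd.DataFrame({'column_name': df.columns,
                                     'percent_missing': percent_missing})

    # Names of the columns whose missing share exceeds the given percentage
    missing_drop = list(missing_value_df[missing_value_df.percent_missing > percentage].column_name)
    # drop() returns a new DataFrame; the caller's df is left unchanged
    return df.drop(missing_drop, axis=1)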
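
The same idea can be written more compactly with a boolean mask, which is roughly what the one-liner in the question was aiming for. This is only a sketch: the helper name `drop_sparse_columns` and the toy column names are hypothetical, and it relies on `isnull().mean()` giving the fraction of missing values per column.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing_pct):
    """Return a new DataFrame without the columns whose share of
    missing values (0-100) is greater than max_missing_pct."""
    # The mean of a boolean mask is the fraction of True values,
    # i.e. the fraction of missing entries in each column.
    pct_missing = df.isnull().mean() * 100
    # Keep only the columns at or below the allowed percentage.
    return df.loc[:, pct_missing <= max_missing_pct]

# Hypothetical toy data: 0%, 50% and 100% missing.
df = pd.DataFrame({
    "full": [1, 2, 3, 4],
    "half": [1.0, np.nan, 2.0, np.nan],
    "empty": [np.nan] * 4,
})

trimmed = drop_sparse_columns(df, 80)
print(list(trimmed.columns))
```

Only `empty` exceeds the 80% limit, so `full` and `half` remain. The original frame is not modified, so assign the result back (`df = drop_sparse_columns(df, 80)`) if you want to replace it.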
