Original question: http://stackoverflow.com/questions/46939314/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Drop columns in a pandas dataframe based on the % of null values
Asked by user2656075
I have a dataframe with around 60 columns and 2 million rows. Some of the columns are mostly empty. I calculated the % of null values in each column using this function.
def missing_values_table(df):
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    return mis_val_table_ren_columns
Now I want to drop the columns that have more than, for example, 80% of their values missing. I tried the following code, but it does not seem to work.
df = df.drop(df.columns[df.apply(lambda col: col.isnull().sum()/len(df) > 0.80)], axis=1)
Thank you in advance. I hope I'm not missing something very basic.
I receive this error:
TypeError: ("'generator' object is not callable", u'occurred at index Unique_Key')
Answered by Vaishali
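You can use dropna() with the thresh parameter. thresh is the minimum number of non-null values a column must have to be kept, so to drop columns that are more than 80% null, require at least 20% non-null values: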
thresh = len(df) * 0.2
df.dropna(thresh=thresh, axis=1, inplace=True)
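A minimal sketch of this approach on a made-up frame may help show which columns survive. The column names and values are illustrative assumptions, not data from the question, and the threshold is cast to an integer here because the pandas documentation describes thresh as an int.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 10 rows, three columns with different amounts of NaN
df = pd.DataFrame({
    "full": range(10),                    # 0% missing
    "half": [1.0] * 5 + [np.nan] * 5,     # 50% missing
    "sparse": [1.0] + [np.nan] * 9,       # 90% missing
})

# Keep columns with at least 20% non-null values (here: at least 2 of 10 rows)
min_non_null = int(len(df) * 0.2)
kept = df.dropna(axis=1, thresh=min_non_null)

print(list(kept.columns))  # ['full', 'half']
```

Because the comparison is "at least thresh non-null values", a column that is exactly 80% null in this 10-row frame would be kept, which matches the goal of dropping only columns with more than 80% missing.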
Answered by Frederico Guerra
def missing_values(df, percentage):
    columns = df.columns
    percent_missing = df.isnull().sum() * 100 / len(df)
    missing_value_df = pd.DataFrame({'column_name': columns,
                                     'percent_missing': percent_missing})
    missing_drop = list(missing_value_df[missing_value_df.percent_missing > percentage].column_name)
    df = df.drop(missing_drop, axis=1)
    return df
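The same idea can probably be written more compactly with a boolean column mask. The sketch below is not the answerer's function but an alternative under stated assumptions: the helper name drop_sparse_columns and the toy data are hypothetical, and it relies on the fact that the mean of a boolean column is the fraction of True values.

```python
import numpy as np
import pandas as pd

def drop_sparse_columns(df, max_missing_pct):  # hypothetical helper name
    """Return df without the columns whose share of nulls exceeds max_missing_pct."""
    # isnull() yields booleans; their column-wise mean is the fraction of nulls
    percent_missing = df.isnull().mean() * 100
    # Select only the columns at or below the allowed percentage
    return df.loc[:, percent_missing <= max_missing_pct]

# Hypothetical toy data, not from the question
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],                      # 0% missing
    "b": [np.nan, np.nan, np.nan, 1.0, 2.0],   # 60% missing
    "c": [np.nan] * 5,                         # 100% missing
})

result = drop_sparse_columns(df, 80)
print(list(result.columns))  # ['a', 'b']
```

Like the function above, this returns a new frame rather than modifying df in place, so the result has to be assigned back if the original variable should change.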