pandas: an efficient way to find null values in a dataframe

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/39421433/

Date: 2020-09-14 01:59:26  Source: igfitidea

Efficient way to find null values in a dataframe

Tags: python, pandas, numpy

Asked by user2578013

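import numpy as np
import pandas as pd

df = pd.read_csv('file', low_memory=False)

# Boolean frame: True wherever a value is missing
df_null = df.isnull()
# Row (i) and column (j) positions of every missing value
i, j = np.where(df_null.to_numpy())
# Pair each affected column name with that row's value in Column1
print(list(zip(df_null.columns[j], df['Column1'].iloc[i])))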

This is what I currently have. Essentially, I build a second dataframe of null flags, take the row and column positions of each null value, and use the row position to pick the corresponding value from Column1.
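
To make the intent concrete, here is a minimal sketch of the same approach on a small frame. The data and the column names x and y are hypothetical stand-ins for the CSV file; only Column1 comes from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the CSV file
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, 6.0],
})

# Row (i) and column (j) positions of every missing value
i, j = np.where(df.isnull().to_numpy())

# Pair each column name with the Column1 value of the affected row
pairs = list(zip(df.columns[j], df["Column1"].iloc[i]))
print(pairs)
# [('y', 'a'), ('x', 'b'), ('y', 'b')]
```

Each tuple names a column holding a null and the Column1 value of the row where it occurs.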

Is there a more efficient, faster way to do this with DataFrames? I admit I don't know them very well.

Answered by shawnheide

A routine that I normally use in pandas to identify null counts by columns is the following:

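import pandas as pd

df = pd.read_csv("test.csv")

# Count the null values in each column
null_counts = df.isnull().sum()
# Show only columns with at least one null, most nulls first
print(null_counts[null_counts > 0].sort_values(ascending=False))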

This prints only the columns that contain null values, sorted in descending order by the number of nulls in each.

Example output:

PoolQC          1453
MiscFeature     1406
Alley           1369
Fence           1179
FireplaceQu      690
LotFrontage      259
GarageYrBlt       81
GarageType        81
GarageFinish      81
GarageQual        81
GarageCond        81
BsmtFinType2      38
BsmtExposure      38
BsmtFinType1      37
BsmtCond          37
BsmtQual          37
MasVnrArea         8
MasVnrType         8
Electrical         1
dtype: int64
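
The counts above show which columns contain nulls, but not which rows. One possible way to recover the (Column1 value, column name) pairs the question asks for, staying within pandas, is to stack the boolean mask. This is only a sketch under assumptions: the toy data and the columns x and y are hypothetical, and it has not been benchmarked against the np.where version.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "Column1" is the key column from the question
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, 6.0],
})

# Per-column null counts, as in the routine above
null_counts = df.isnull().sum()
print(null_counts[null_counts > 0].sort_values(ascending=False))

# Index by Column1, then stack the boolean mask into one long Series
# whose MultiIndex holds (Column1 value, column name) for every cell
mask = df.set_index("Column1").isnull().stack()

# Keep only the True entries, i.e. the cells that are null
null_cells = mask[mask].index.tolist()
print(null_cells)
# [('a', 'y'), ('b', 'x'), ('b', 'y')]
```

Note that the tuple order here is (Column1 value, column name), the reverse of the question's output.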