Original URL: http://stackoverflow.com/questions/39421433/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Efficient way to find null values in a dataframe
Asked by user2578013
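import pandas as pd
import numpy as np

df = pd.read_csv('file', low_memory=False)
# Boolean DataFrame: True wherever a cell is null (NaN/None)
df_null = df.isnull()
# Row and column positions of every null cell
i, j = np.where(df_null.to_numpy())
# Pair each null cell's column name with the 'Column1' value in the same row (positional lookup)
print(list(zip(df_null.columns[j], df['Column1'].iloc[i])))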
This is what I currently have. Essentially, I've created two DataFrames and then, using the position of each null value, picked the corresponding value in Column A.
My question is whether there is a more efficient, faster way to do this with DataFrames, which, I admit, I don't know very well.
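The question's idea (find the positions of null cells, then look up the matching row's value in another column) can be packaged as a small reusable function that returns a tidy table instead of a list of tuples. This is a minimal sketch, not code from the thread: `null_locations`, `key_column`, and the sample data are hypothetical, and it assumes the key column can be addressed by name.

```python
import numpy as np
import pandas as pd


def null_locations(df: pd.DataFrame, key_column: str) -> pd.DataFrame:
    """Hypothetical helper: one output row per null cell, giving its row label,
    its column name, and the value of `key_column` in that same row."""
    # Positional (row, col) indices of every null cell, found in one vectorized pass
    rows, cols = np.nonzero(df.isnull().to_numpy())
    return pd.DataFrame({
        "row": df.index[rows],        # row labels (works for non-default indexes too)
        "column": df.columns[cols],   # column names of the null cells
        key_column: df[key_column].to_numpy()[rows],  # positional lookup, no label ambiguity
    })


# Hypothetical sample data standing in for the question's file
df = pd.DataFrame({
    "Column1": ["a", "b", "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [np.nan, np.nan, 6.0],
})
print(null_locations(df, "Column1"))
```

Using `to_numpy()` and positional indexing keeps the lookup correct even when the DataFrame's index is not the default 0..n-1 range.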
Answer by shawnheide
A routine I normally use in pandas to count the null values in each column is the following:
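import pandas as pd

df = pd.read_csv("test.csv")
# Count the nulls in each column (each True sums as 1)
null_counts = df.isnull().sum()
# Show only columns that contain nulls, largest count first
print(null_counts[null_counts > 0].sort_values(ascending=False))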
This prints only the columns that contain null values, together with each column's null count, sorted from the most nulls to the fewest.
Example output:
PoolQC 1453
MiscFeature 1406
Alley 1369
Fence 1179
FireplaceQu 690
LotFrontage 259
GarageYrBlt 81
GarageType 81
GarageFinish 81
GarageQual 81
GarageCond 81
BsmtFinType2 38
BsmtExposure 38
BsmtFinType1 37
BsmtCond 37
BsmtQual 37
MasVnrArea 8
MasVnrType 8
Electrical 1
dtype: int64
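To try the routine above without `test.csv`, here is a self-contained sketch that builds a small in-memory DataFrame (the column names and values are made up for illustration) and applies the same `isnull().sum()` / filter / `sort_values` steps.

```python
import numpy as np
import pandas as pd

# Hypothetical in-memory data standing in for test.csv
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0],
    "b": [np.nan, 2.0, 3.0, 4.0],
    "c": [1, 2, 3, 4],
})

# isnull() yields a boolean frame; sum() counts the True values per column
null_counts = df.isnull().sum()
# Keep only columns with at least one null, largest count first
missing = null_counts[null_counts > 0].sort_values(ascending=False)
print(missing)
```

Column `c` has no nulls, so the `null_counts > 0` filter drops it from the result.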