Python: how to count the NaN values in a column of a pandas DataFrame

Notice: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/26266362/

Time: 2020-08-19 00:17:46  Source: igfitidea

How to count the NaN values in a column in pandas DataFrame

python pandas dataframe

Asked by user3799307

I have data in which I want to find the number of NaN values, so that if it is less than some threshold, I will drop those columns. I looked, but wasn't able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I want the count of NaN only.

Answer by elyase

You could subtract the count of non-NaN values from the total length:

count_nan = len(df) - df.count()

You should time it on your data. For a small Series I got a 3x speedup compared with the isnull solution.
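As a quick check of the arithmetic in this answer, the sketch below compares `len(df) - df.count()` with the `isnull`-based count. The frame `df` and its values are made up for illustration and are not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, for illustration only
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# count() returns the number of non-NaN values per column,
# so subtracting it from the row count leaves the NaN count
count_nan = len(df) - df.count()

# The isnull-based count should agree column by column
same = bool((count_nan == df.isnull().sum()).all())
print(count_nan)
print(same)
```

Both expressions give one value per column, so either can be compared against a threshold.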

Answer by joris

You can use the isna() method (or its alias isnull(), which is also compatible with older pandas versions < 0.21.0) and then sum to count the NaN values. For one column:

In [1]: s = pd.Series([1,2,3, np.nan, np.nan])

In [4]: s.isna().sum()   # or s.isnull().sum() for older pandas versions
Out[4]: 2

For several columns, it also works:


In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

In [6]: df.isna().sum()
Out[6]:
a    1
b    2
dtype: int64
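The question mentions dropping columns based on a NaN threshold. One way to sketch that with the per-column counts is shown below. The data, the name `max_nan`, and the direction of the comparison (keeping columns with at most `max_nan` NaNs) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data and threshold, for illustration only
df = pd.DataFrame({
    'a': [1, 2, np.nan],
    'b': [np.nan, 1, np.nan],
    'c': [1, 2, 3],
})
max_nan = 1  # assumed cutoff: keep columns with at most this many NaNs

# One NaN count per column
nan_counts = df.isna().sum()

# Boolean mask over the columns; .loc keeps the columns where it is True
kept = df.loc[:, nan_counts <= max_nan]
print(list(kept.columns))
```

Flipping the comparison operator selects the opposite set of columns if that is what the threshold rule calls for.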

Answer by K.-Michael Aye

Since pandas 0.14.1, my suggestion to add a keyword argument to the value_counts method has been implemented:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
for col in df:
    # dropna=False keeps NaN as its own entry in the counts
    print(df[col].value_counts(dropna=False))

2     1
 1     1
NaN    1
dtype: int64
NaN    2
 1     1
dtype: int64

Answer by Manoj Kumar

If you are using Jupyter Notebook, how about:

 %%timeit
 df.isnull().any().any()

or


 %timeit df.isnull().values.sum()

Or, are there NaNs anywhere in the data, and if so, in which columns?

 df.isnull().any()
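The three expressions in this answer return different things. The sketch below runs them without the timing magics on a small made-up frame; the data and variable names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, for illustration only
df = pd.DataFrame({'a': [1, 2, 3], 'b': [np.nan, 1, np.nan]})

# Single boolean: does the frame contain any NaN at all?
has_any_nan = bool(df.isnull().any().any())

# Single integer: total number of NaN cells across all columns
total_nan = int(df.isnull().values.sum())

# One boolean per column: which columns contain at least one NaN?
nan_by_column = df.isnull().any()

print(has_any_nan, total_nan)
print(nan_by_column)
```

Only `df.isnull().values.sum()` yields a count; the other two answer yes/no questions, for the whole frame and per column respectively.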

Answer by Nikos Tavoularis

Based on the most voted answer we can easily define a function that gives us a dataframe to preview the missing values and the % of missing values in each column:


import pandas as pd

def missing_values_table(df):
    # Number of missing values per column
    mis_val = df.isnull().sum()
    # Percentage of missing values per column
    mis_val_percent = 100 * mis_val / len(df)
    # Combine both into one table and give the columns descriptive names
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    # Keep only columns that have missing values, sorted by percentage
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns
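A more compact variant of the same idea is sketched below. It relies on the fact that the mean of a boolean column is the fraction of True values. The helper name `missing_summary` and the example data are hypothetical, and this is not the answer author's code.

```python
import numpy as np
import pandas as pd

def missing_summary(df):
    """Hypothetical helper: per-column NaN count and percentage."""
    is_missing = df.isna()
    summary = pd.DataFrame({
        'Missing Values': is_missing.sum(),
        # The mean of a boolean column is the fraction of True values
        '% of Total Values': (100 * is_missing.mean()).round(1),
    })
    # Keep only columns that contain at least one NaN, largest share first
    summary = summary[summary['Missing Values'] > 0]
    return summary.sort_values('% of Total Values', ascending=False)

# Hypothetical example data, for illustration only
df = pd.DataFrame({
    'a': [1, 2, np.nan, 4],
    'b': [np.nan, 1, np.nan, 2],
    'c': [1, 2, 3, 4],
})
summary = missing_summary(df)
print(summary)
```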

Answer by sushmit

If you just need to count the NaN values in a pandas column, here is a quick way:

import pandas as pd
## df1 as an example data frame 
## col1 name of column for which you want to calculate the nan values
sum(pd.isnull(df1['col1']))

Answer by vsdaking

I used the solution proposed by @sushmit in my code.

A possible variation of the same is:

colNullCnt = []
for col in df1.columns:
    # store [column name, number of NaN values in that column]
    colNullCnt.append([col, sum(pd.isnull(df1[col]))])

The advantage of this is that it returns the result for each of the columns in the df.
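The same list of name/count pairs can also be built without an explicit per-column `sum` call. The sketch below reuses the names `df1` and `colNullCnt` from this answer; the example data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example data, for illustration only
df1 = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# isnull().sum() yields a Series indexed by column name;
# items() iterates over (column name, NaN count) pairs
colNullCnt = [[col, int(n)] for col, n in df1.isnull().sum().items()]
print(colNullCnt)
```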

Answer by Itachi

You can use the value_counts method and look up the count for np.nan:

s.value_counts(dropna = False)[np.nan]
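Looking up the `np.nan` label is expected to fail with a `KeyError` when the Series contains no NaN at all, because the label is then absent from the result. The sketch below is one possible way to get 0 in that case; the helper name `nan_count_from_value_counts` and the sample Series are hypothetical.

```python
import numpy as np
import pandas as pd

def nan_count_from_value_counts(s):
    """Hypothetical helper: NaN count via value_counts, 0 if there are none."""
    counts = s.value_counts(dropna=False)
    # Index.isna() marks the NaN label if present; an empty selection sums to 0
    return int(counts[counts.index.isna()].sum())

# Hypothetical example data, for illustration only
with_nan = nan_count_from_value_counts(pd.Series([1, 2, 3, np.nan, np.nan]))
without_nan = nan_count_from_value_counts(pd.Series([1, 2, 3]))
print(with_nan, without_nan)
```

If only the NaN count is needed, `s.isna().sum()` from the earlier answer avoids counting every distinct value first.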

Answer by Esptheitroad Murhabazi

Based on the answers given, with some improvements, this is my approach:

def PercentageMissin(Dataset):
    """Return the percentage of missing values in each column of a DataFrame."""
    if isinstance(Dataset,pd.DataFrame):
        adict={} # maps each column name to the percentage of missing values in that column
        for col in Dataset.columns:
            adict[col]=(np.count_nonzero(Dataset[col].isnull())*100)/len(Dataset[col])
        return pd.DataFrame(adict,index=['% of missing'],columns=adict.keys())
    else:
        raise TypeError("can only be used with a pandas DataFrame")

Answer by Naveen Bharadwaj

df1.isnull().sum()

This will do the trick.
