Python: How to count the NaN values in a column of a pandas DataFrame
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/26266362/
How to count the NaN values in a column in pandas DataFrame
Asked by user3799307
I have data in which I want to find the number of NaN values, so that if it is less than some threshold, I will drop the column. I looked, but wasn't able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I want the count of NaN only.
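The thresholding step the question asks about is not spelled out explicitly in the answers below, so here is a rough sketch; the DataFrame and the threshold value are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

threshold = 1                              # maximum number of NaNs tolerated per column
nan_counts = df.isna().sum()               # NaN count per column
keep = nan_counts[nan_counts <= threshold].index
df_filtered = df[keep]
print(list(df_filtered.columns))           # ['a']
```

pandas also ships a built-in for this: df.dropna(axis=1, thresh=n), where thresh is the minimum number of non-NaN values a column must contain to be kept.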
Answered by joris
You can use the isna() method (or its alias isnull(), which is also compatible with older pandas versions < 0.21.0) and then sum to count the NaN values. For one column:
In [1]: s = pd.Series([1,2,3, np.nan, np.nan])
In [4]: s.isna().sum() # or s.isnull().sum() for older pandas versions
Out[4]: 2
For several columns, it also works:
In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
In [6]: df.isna().sum()
Out[6]:
a 1
b 2
dtype: int64
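Not part of the original answer, but the same idea extends to the whole frame: chaining .sum() once more collapses the per-column counts into a single total. A minimal sketch with the same example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# First .sum() counts NaNs per column, second .sum() totals across columns
total_nans = df.isna().sum().sum()
print(total_nans)  # 3
```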
Answered by K.-Michael Aye
Since pandas 0.14.1, my suggestion here to have a keyword argument in the value_counts method has been implemented:
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})
for col in df:
    print(df[col].value_counts(dropna=False))
2 1
1 1
NaN 1
dtype: int64
NaN 2
1 1
dtype: int64
Answered by Manoj Kumar
If you are using a Jupyter Notebook, how about:
%%timeit
df.isnull().any().any()
or
%timeit df.isnull().values.sum()
Or, to check whether there are any NaNs in the data, and if so, where:
df.isnull().any()
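Going one step further than the answer above, here is a sketch (with a hypothetical df) of locating the exact (row, column) positions of the NaNs via NumPy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Boolean mask of missing cells, then the (row label, column label) coordinates
mask = df.isnull()
rows, cols = np.where(mask)
positions = list(zip(df.index[rows], df.columns[cols]))
print(positions)  # [(0, 'b'), (2, 'a'), (2, 'b')]
```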
Answered by Nikos Tavoularis
Based on the most voted answer, we can easily define a function that gives us a DataFrame previewing the missing values and the percentage of missing values in each column:
def missing_values_table(df):
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns
Answered by sushmit
If it's just counting NaN values in a pandas column, here is a quick way:
import pandas as pd
## df1 as an example data frame
## col1 name of column for which you want to calculate the nan values
sum(pd.isnull(df1['col1']))
Answered by vsdaking
Used the solution proposed by @sushmit in my code.
A possible variation of the same can also be:
colNullCnt = []
for z in range(len(df1.columns)):
    colNullCnt.append([df1.columns[z], sum(pd.isnull(df1[df1.columns[z]]))])
The advantage of this is that it returns the result for each of the columns in the df.
Answered by Itachi
You can use the value_counts method and look up the count for np.nan:
s.value_counts(dropna=False)[np.nan]
(Note that this raises a KeyError if the series contains no NaN values.)
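A quick check of this approach, using a small hypothetical example series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan])

# value_counts(dropna=False) keeps NaN as its own index entry,
# so it can be looked up directly
nan_count = s.value_counts(dropna=False)[np.nan]
print(nan_count)  # 2
```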
Answered by Esptheitroad Murhabazi
Based on the answer that was given and some improvements, this is my approach:
def PercentageMissin(Dataset):
    """Return the percentage of missing values in each column of a dataset."""
    if isinstance(Dataset, pd.DataFrame):
        adict = {}  # dictionary whose keys are column names and whose values are the percentage of missing values in that column
        for col in Dataset.columns:
            adict[col] = (np.count_nonzero(Dataset[col].isnull()) * 100) / len(Dataset[col])
        return pd.DataFrame(adict, index=['% of missing'], columns=adict.keys())
    else:
        raise TypeError("can only be used with a pandas DataFrame")
Answered by Naveen Bharadwaj
df1.isnull().sum()
This will do the trick.