How to count NaN values in a pandas DataFrame?

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/34537048/

Tags: python, python-3.x, pandas, dataframe, nan

Asked by SpeedCoder5

What is the best way to count NaN (not a number) values in a pandas DataFrame?

The following code:

import numpy as np
import pandas as pd
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()
print("nan: %d" % dfv[np.nan].sum())
print("1: %d" % dfv[1].sum())
print("3: %d" % dfv[3].sum())
print("total: %d" % dfv[:].sum())

Outputs:

nan: 0
1: 1
3: 3
total: 4

While the desired output is:

nan: 2
1: 1
3: 3
total: 6

I am using pandas 0.17 with Python 3.5.0 on Anaconda 2.4.0.

Accepted answer by Alex Riley

If you want to count only the NaN values in column 'a' of a DataFrame df, use:

len(df) - df['a'].count()

Here count() tells us the number of non-NaN values, and this is subtracted from the total number of values (given by len(df)).

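For example, on the dfd frame from the question this gives the expected count of 2 (a quick sketch):

import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# count() excludes NaN values, so the difference is the NaN count.
print(len(dfd) - dfd['a'].count())  # 2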

To count NaN values in every column of df, use:

len(df) - df.count()
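
Applied to a frame with several columns, this returns a Series with one NaN count per column (a sketch using a small hypothetical frame):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# A Series indexed by column name: 'a' has 1 NaN, 'b' has 2.
print(len(df) - df.count())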


If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in pandas 0.14.1):

dfv = dfd['a'].value_counts(dropna=False)

This allows the missing values in the column to be counted too:

 3     3
NaN    2
 1     1
Name: a, dtype: int64

The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).

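Putting it together, the question's script works once dropna=False is passed (a sketch of the corrected code, with the redundant sum calls dropped):

import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd['a'].value_counts(dropna=False).sort_index()
print("nan: %d" % dfv[np.nan])  # nan: 2
print("1: %d" % dfv[1])         # 1: 1
print("3: %d" % dfv[3])         # 3: 3
print("total: %d" % dfv.sum())  # total: 6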

Answer by ilyas patanam

To count just null values, you can use isnull():

In [11]:
dfd.isnull().sum()

Out[11]:
a    2
dtype: int64

Here a is the column name, and there are 2 occurrences of the null value in the column.

Answer by Thom Ives

A good, clean way to count all NaNs in all columns of your DataFrame would be:

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})
print(df.isna().sum().sum())

The first sum gives the count of NaNs for each column; the second sum then adds those column totals together.

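Making the two steps explicit (a self-contained sketch of the same frame):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

per_column = df.isna().sum()  # first sum: NaNs per column (a: 1, b: 2)
print(per_column)
print(per_column.sum())       # second sum: grand total, 3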

Answer by shuishoudage

If you only want a per-column summary of null values, use df.isnull().sum(). If you want to know how many null values there are in the whole DataFrame, use df.isnull().sum().sum() to calculate the total.

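A minimal sketch of both calls (using a small frame built here for illustration):

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1, np.nan, 3], 'y': [np.nan, np.nan, 6]})

print(df.isnull().sum())        # per-column summary: x -> 1, y -> 2
print(df.isnull().sum().sum())  # total across the frame: 3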

Answer by Mr_and_Mrs_D

Yet another way to count all the NaNs in a df:

num_nans = df.size - df.count().sum()

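Here df.size is the total number of cells (rows × columns) and df.count().sum() is the total number of non-NaN cells, so the difference is the overall NaN count. A quick sketch (using a small hypothetical frame):

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

print(df.size)                     # 6 cells (3 rows x 2 columns)
print(df.count().sum())            # 3 non-NaN cells
print(df.size - df.count().sum())  # 3 NaNs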

Timings:

import timeit

import numpy as np
import pandas as pd

df_scale = 100000
df = pd.DataFrame(
    [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
     [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
    columns=['group', 'value', 'value2', 'dummy'])

repeat = 3
numbers = 100

setup = """import pandas as pd
from __main__ import df
"""

def timer(statement, _setup=None):
    # Report the best (minimum) of `repeat` runs, each executing the
    # statement `numbers` times.
    print(min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))

timer('df.size - df.count().sum()')
timer('df.isna().sum().sum()')
timer('df.isnull().sum().sum()')

prints:

3.998805362999999
3.7503365439999996
3.689461442999999

So the three approaches are pretty much equivalent in speed.
