How to count NaN values in a pandas DataFrame?
Original question: http://stackoverflow.com/questions/34537048/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by SpeedCoder5
What is the best way to account for NaN (not a number) values in a pandas DataFrame?
The following code:
import numpy as np
import pandas as pd
dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])
dfv = dfd.a.value_counts().sort_index()
print("nan: %d" % dfv[np.nan].sum())
print("1: %d" % dfv[1].sum())
print("3: %d" % dfv[3].sum())
print("total: %d" % dfv[:].sum())
Outputs:
nan: 0
1: 1
3: 3
total: 4
While the desired output is:
nan: 2
1: 1
3: 3
total: 6
I am using pandas 0.17 with Python 3.5.0 and Anaconda 2.4.0.
Accepted answer by Alex Riley
If you want to count only the NaN values in column 'a' of a DataFrame df, use:
len(df) - df['a'].count()
Here count() tells us the number of non-NaN values, and this is subtracted from the total number of values (given by len(df)).
To count NaN values in every column of df, use:
len(df) - df.count()
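As a minimal sketch of the two expressions above: the sample DataFrame below is hypothetical (column 'a' reuses the values from the question, and column 'b' is added only for illustration), and the variable names are not from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column 'a' matches the question, 'b' is illustrative
df = pd.DataFrame({'a': [1, np.nan, 3, 3, 3, np.nan],
                   'b': [np.nan, 2, 3, 4, 5, 6]})

# count() returns the number of non-NaN values, so subtracting it from
# the number of rows leaves the number of NaN values
nan_in_a = len(df) - df['a'].count()

# The same subtraction against df.count() yields one NaN count per column
nan_per_column = len(df) - df.count()

print(nan_in_a)
print(nan_per_column)
```

Here nan_in_a is a single number, while nan_per_column is a Series indexed by column name.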
If you want to use value_counts, tell it not to drop NaN values by setting dropna=False (added in 0.14.1):
dfv = dfd['a'].value_counts(dropna=False)
This allows the missing values in the column to be counted too:
3 3
NaN 2
1 1
Name: a, dtype: int64
The rest of your code should then work as you expect (note that it's not necessary to call sum; just print("nan: %d" % dfv[np.nan]) suffices).
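The effect of dropna=False can be checked with a short sketch on the question's data. The names default_counts, full_counts and nan_count are illustrative, and selecting the NaN entry through a boolean mask on the index is an assumption made here to avoid relying on how a particular pandas version looks up np.nan as a label.

```python
import numpy as np
import pandas as pd

dfd = pd.DataFrame([1, np.nan, 3, 3, 3, np.nan], columns=['a'])

# Default behaviour: NaN values are dropped before counting
default_counts = dfd['a'].value_counts()

# dropna=False keeps NaN as its own entry in the result
full_counts = dfd['a'].value_counts(dropna=False)

# Select the NaN entry with a boolean mask on the index
# (an alternative to looking up the np.nan label directly)
nan_count = int(full_counts[full_counts.index.isna()].sum())

print("nan: %d" % nan_count)
print("1: %d" % full_counts.loc[1.0])
print("3: %d" % full_counts.loc[3.0])
print("total: %d" % full_counts.sum())
```

With dropna=False the counts add up to the full length of the column, which is the total the question was after.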
Answer by ilyas patanam
Answer by Thom Ives
A good, clean way to count all NaNs in all columns of your dataframe would be:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
print(df.isna().sum().sum())
Using a single sum, you get the count of NaNs for each column. The second sum adds up those column sums.
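To make the two steps visible, here is a small sketch that stores each intermediate result, using the same DataFrame as in this answer. The names mask, per_column and total are illustrative, and it assumes a pandas version that provides isna (older releases only have the isnull spelling).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Boolean DataFrame: True wherever a value is missing
mask = df.isna()

# First sum: True counts as 1, so this gives the NaN count per column
per_column = mask.sum()

# Second sum: adds the per-column counts into one total
total = per_column.sum()

print(per_column)
print(total)
```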
Answer by shuishoudage
If you only want a summary of the null values in each column, use the following code:
df.isnull().sum()
If you want to know how many null values there are in the whole data frame, use the following code:
df.isnull().sum().sum() # calculate total
Answer by Mr_and_Mrs_D
Yet another way to count all the NaNs in a df:
num_nans = df.size - df.count().sum()
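A brief sketch of why this works: df.size is the total number of cells (rows times columns), and df.count().sum() is the number of cells that are not NaN, so the difference is the number of missing cells. The small DataFrame and the names total_cells and non_nan are assumptions chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame: 3 rows, 2 columns
df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

total_cells = df.size          # rows * columns
non_nan = df.count().sum()     # non-NaN cells, summed over all columns
num_nans = total_cells - non_nan

# Cross-check against the isna-based approach
assert num_nans == df.isna().sum().sum()
print(num_nans)
```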
Timings:
import timeit
import numpy as np
import pandas as pd
df_scale = 100000
df = pd.DataFrame(
    [[1, np.nan, 100, 63], [2, np.nan, 101, 63], [2, 12, 102, 63],
     [2, 14, 102, 63], [2, 14, 102, 64], [1, np.nan, 200, 63]] * df_scale,
    columns=['group', 'value', 'value2', 'dummy'])
repeat = 3
numbers = 100
setup = """import pandas as pd
from __main__ import df
"""
def timer(statement, _setup=None):
    print(min(
        timeit.Timer(statement, setup=_setup or setup).repeat(
            repeat, numbers)))
timer('df.size - df.count().sum()')
timer('df.isna().sum().sum()')
timer('df.isnull().sum().sum()')
prints:
3.998805362999999
3.7503365439999996
3.689461442999999
So the three approaches are pretty much equivalent in speed.

