Python: How to count the NaN values in a column of a pandas DataFrame
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/26266362/
How to count the NaN values in a column in pandas DataFrame
Asked by user3799307
I have data in which I want to find the number of NaN values, so that if it is less than some threshold, I will drop the column. I looked, but wasn't able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I want the count of NaN only.
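The thresholding step the question asks about is not spelled out explicitly in the answers below, so here is a rough sketch; the DataFrame and the threshold value are hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

threshold = 1                              # maximum number of NaNs tolerated per column
nan_counts = df.isna().sum()               # NaN count per column
keep = nan_counts[nan_counts <= threshold].index
df_filtered = df[keep]
print(list(df_filtered.columns))           # ['a']
```

pandas also ships a built-in for this: df.dropna(axis=1, thresh=n), where thresh is the minimum number of non-NaN values a column must contain to be kept.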
Answered by joris
You can use the isna() method (or its alias isnull(), which is also compatible with older pandas versions < 0.21.0) and then sum to count the NaN values. For one column:
In [1]: s = pd.Series([1,2,3, np.nan, np.nan])
In [4]: s.isna().sum() # or s.isnull().sum() for older pandas versions
Out[4]: 2
For several columns, it also works:
In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
In [6]: df.isna().sum()
Out[6]:
a 1
b 2
dtype: int64
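Not part of the original answer, but the same idea extends to the whole frame: chaining .sum() once more collapses the per-column counts into a single total. A minimal sketch with the same example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# First .sum() counts NaNs per column, second .sum() totals across columns
total_nans = df.isna().sum().sum()
print(total_nans)  # 3
```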
Answered by K.-Michael Aye
Since pandas 0.14.1, my suggestion here to have a keyword argument in the value_counts method has been implemented:
import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})
for col in df:
    print(df[col].value_counts(dropna=False))
2 1
1 1
NaN 1
dtype: int64
NaN 2
1 1
dtype: int64
Answered by Manoj Kumar
If you are using a Jupyter Notebook, how about:
%%timeit
df.isnull().any().any()
or
%timeit df.isnull().values.sum()
Or, to check whether there are any NaNs in the data, and if so, where:
df.isnull().any()
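Going one step further than the answer above, here is a sketch (with a hypothetical df) of locating the exact (row, column) positions of the NaNs via NumPy:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [np.nan, 1, np.nan]})

# Boolean mask of missing cells, then the (row label, column label) coordinates
mask = df.isnull()
rows, cols = np.where(mask)
positions = list(zip(df.index[rows], df.columns[cols]))
print(positions)  # [(0, 'b'), (2, 'a'), (2, 'b')]
```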
Answered by Nikos Tavoularis
Based on the most voted answer, we can easily define a function that gives us a DataFrame previewing the missing values and the percentage of missing values in each column:
def missing_values_table(df):
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * df.isnull().sum() / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table_ren_columns = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    mis_val_table_ren_columns = mis_val_table_ren_columns[
        mis_val_table_ren_columns.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table_ren_columns.shape[0]) +
          " columns that have missing values.")
    return mis_val_table_ren_columns
Answered by sushmit
If it's just counting NaN values in a pandas column, here is a quick way:
import pandas as pd
## df1 as an example data frame
## col1 name of column for which you want to calculate the nan values
sum(pd.isnull(df1['col1']))
Answered by vsdaking
Used the solution proposed by @sushmit in my code.
A possible variation of the same can also be:
colNullCnt = []
for z in range(len(df1.columns)):
    colNullCnt.append([df1.columns[z], sum(pd.isnull(df1[df1.columns[z]]))])
The advantage of this is that it returns the result for each of the columns in the df.
Answered by Itachi
You can use the value_counts method and look up the count for np.nan:
s.value_counts(dropna=False)[np.nan]
(Note that this raises a KeyError if the series contains no NaN values.)
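A quick check of this approach, using a small hypothetical example series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, np.nan])

# value_counts(dropna=False) keeps NaN as its own index entry,
# so it can be looked up directly
nan_count = s.value_counts(dropna=False)[np.nan]
print(nan_count)  # 2
```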
Answered by Esptheitroad Murhabazi
Based on the answer that was given and some improvements, this is my approach:
def PercentageMissin(Dataset):
    """Return the percentage of missing values in each column of a dataset."""
    if isinstance(Dataset, pd.DataFrame):
        adict = {}  # dictionary whose keys are column names and whose values are the percentage of missing values in that column
        for col in Dataset.columns:
            adict[col] = (np.count_nonzero(Dataset[col].isnull()) * 100) / len(Dataset[col])
        return pd.DataFrame(adict, index=['% of missing'], columns=adict.keys())
    else:
        raise TypeError("can only be used with a pandas DataFrame")
Answered by Naveen Bharadwaj
df1.isnull().sum()
This will do the trick.