How do I get a summary count of missing/NaN data by column in pandas?

Question

Asked by orome

In R I can quickly see a count of missing data using the summary command, but the equivalent pandas DataFrame method, describe, does not report these values.

I gather I can do something like

len(mydata.index) - mydata.count()

to compute the number of missing values for each column, but I wonder if there's a better idiom (or if my approach is even right).

Answer 1

Answered by Jeff

Both describe and info report the count of non-missing values.

In [1]: df = DataFrame(np.random.randn(10,2))

In [2]: df.iloc[3:6,0] = np.nan

In [3]: df
Out[3]: 
          0         1
0 -0.560342  1.862640
1 -1.237742  0.596384
2  0.603539 -1.561594
3       NaN  3.018954
4       NaN -0.046759
5       NaN  0.480158
6  0.113200 -0.911159
7  0.990895  0.612990
8  0.668534 -0.701769
9 -0.607247 -0.489427

[10 rows x 2 columns]

In [4]: df.describe()
Out[4]: 
              0          1
count  7.000000  10.000000
mean  -0.004166   0.286042
std    0.818586   1.363422
min   -1.237742  -1.561594
25%   -0.583795  -0.648684
50%    0.113200   0.216699
75%    0.636036   0.608839
max    0.990895   3.018954

[8 rows x 2 columns]


In [5]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 2 columns):
0    7 non-null float64
1    10 non-null float64
dtypes: float64(2)

To get a count of missing values, your solution is correct:

In [20]: len(df.index)-df.count()
Out[20]: 
0    3
1    0
dtype: int64

You could also do this:

In [23]: df.isnull().sum()
Out[23]: 
0    3
1    0
dtype: int64
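
The two idioms above can be checked against each other on a small frame. This is only a minimal sketch; the column names a and b and the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "a" has three missing values, column "b" has none.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, 4.0, np.nan],
    "b": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Idiom 1: total number of rows minus the non-missing count per column.
by_count = len(df.index) - df.count()

# Idiom 2: sum the boolean mask of missing values per column.
by_isnull = df.isnull().sum()

print(by_isnull)
```

Both produce one integer per column, so either can be used wherever a per-column missing count is needed.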

Answer 2

Answered by Ricky McMaster

As a tiny addition, to get percentage missing by DataFrame column, combining @Jeff and @userS's answers above gets you:

df.isnull().sum()/len(df)*100
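
An equivalent way to get the same percentages is to take the mean of the boolean mask, since the mean of True/False values is the fraction that are True. The sketch below compares the two on made-up data; the frame and its column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: half of column "a" is missing, none of column "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [1.0, 2.0, 3.0, 4.0],
})

# Percentage missing per column: count of missing divided by number of rows.
pct_by_sum = df.isnull().sum() / len(df) * 100

# Same result via the mean of the boolean mask.
pct_by_mean = df.isnull().mean() * 100

print(pct_by_mean)
```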

Answer 3

Answered by userS

This isn't quite a full summary, but it will give you a quick sense of your column-level data:

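def getPctMissing(series):
    # Percentage of values in the Series that are missing (NaN/None).
    num = series.isnull().sum()
    den = len(series)  # total rows; series.count() would exclude the missing ones
    return 100 * num / den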
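
A per-Series helper of this kind can be run over every column with DataFrame.apply. The sketch below uses a hypothetical helper, pct_missing, that divides by the total number of rows (len(series)), so the result is the share of all values that are missing; the data is made up.

```python
import numpy as np
import pandas as pd


def pct_missing(series):
    """Hypothetical helper: percent of values in `series` that are missing."""
    return 100 * series.isnull().sum() / len(series)


# Hypothetical data: 2 of 4 values missing in "a", 1 of 4 in "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": [np.nan, 2.0, 3.0, 4.0],
})

# apply() calls the helper once per column and collects the results in a Series.
result = df.apply(pct_missing)
print(result)
```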

Answer 4

Answered by Drafter250

I can't comment yet, but to add to Jeff's answer: if you don't care which columns have NaNs and just want an overall check, add a second .sum() to get a single value.

result = df.isnull().sum().sum()
result > 0

A Series would need only one .sum(), and a Panel would need three.
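
If only a yes/no answer is needed, another option is to ask whether any value in the mask is True rather than counting. This is a minimal sketch on made-up data, showing the whole-frame count, the boolean check, and the single .sum() for a Series.

```python
import numpy as np
import pandas as pd

# Hypothetical data with exactly one missing value.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [3.0, 4.0]})

# Single count for the whole frame: sum per column, then sum those sums.
total_missing = df.isnull().sum().sum()

# True if at least one value anywhere in the frame is missing.
has_missing = df.isnull().values.any()

# A Series needs only one .sum().
series_missing = df["a"].isnull().sum()

print(total_missing, has_missing, series_missing)
```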

Answer 5

Answered by Kshitij

The following will do the trick and return the count of nulls for every column:

df.isnull().sum(axis=0)

df.isnull() returns a DataFrame of True / False values.
sum(axis=0) sums those values down the rows, giving one total per column.
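
To make the role of the axis argument concrete, the sketch below contrasts axis=0 (one count per column) with axis=1 (one count per row). The frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each column has two missing values, spread over three rows.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 3.0],
})

mask = df.isnull()             # DataFrame of True / False values

per_column = mask.sum(axis=0)  # collapse the rows: one count per column
per_row = mask.sum(axis=1)     # collapse the columns: one count per row

print(per_column)
print(per_row)
```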
