pandas: drop DataFrame columns by number of NaN values

Question

Asked by pyan

I have a DataFrame in which some columns contain NaN. I'd like to drop the columns that contain at least a given number of NaN values. For example, in the following code I'd like to drop any column with 2 or more NaN. In this case, column 'C' would be dropped and only 'A' and 'B' would be kept. How can I do this?

import numpy as np
import pandas as pd

dff = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
dff.iloc[3, 0] = np.nan    # one NaN in column A
dff.iloc[6, 1] = np.nan    # one NaN in column B
dff.iloc[5:8, 2] = np.nan  # three NaN in column C

print(dff)

Answer 1

Answered by EdChum

dropna has a thresh parameter, which is the minimum number of non-NaN values a column needs in order to be kept. You just need to pass the length of your df minus the number of NaN values you want as your threshold:

In [13]:

dff.dropna(thresh=len(dff) - 2, axis=1)
Out[13]:
          A         B
0  0.517199 -0.806304
1 -0.643074  0.229602
2  0.656728  0.535155
3       NaN -0.162345
4 -0.309663 -0.783539
5  1.244725 -0.274514
6 -0.254232       NaN
7 -1.242430  0.228660
8 -0.311874 -0.448886
9 -0.984453 -0.755416

So the above drops any column that has fewer than len(dff) - 2 non-NaN values, where len(dff) is the number of rows. Note that this keeps a column with exactly 2 NaN; to drop every column with 2 or more NaN, as the question asks, pass thresh=len(dff) - 1.
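To make the threshold arithmetic concrete, here is a minimal sketch on a small deterministic frame. The helper name drop_columns_with_nans and its max_nans parameter are hypothetical, introduced only for illustration; dropna, axis and thresh are the actual pandas names.

```python
import numpy as np
import pandas as pd


def drop_columns_with_nans(df, max_nans):
    """Keep only columns with at most `max_nans` NaN values (hypothetical helper)."""
    # thresh is the minimum number of non-NaN values a column needs to survive
    return df.dropna(axis=1, thresh=len(df) - max_nans)


# Deterministic example: A has 1 NaN, B has 2, C has 3
df = pd.DataFrame(
    {
        "A": [1.0, np.nan, 3.0, 4.0, 5.0],
        "B": [np.nan, np.nan, 3.0, 4.0, 5.0],
        "C": [np.nan, np.nan, np.nan, 4.0, 5.0],
    }
)

at_most_one = drop_columns_with_nans(df, max_nans=1)  # drops columns with 2+ NaN
at_most_two = drop_columns_with_nans(df, max_nans=2)  # drops columns with 3+ NaN
print(list(at_most_one.columns))  # ['A']
print(list(at_most_two.columns))  # ['A', 'B']
```

Column B, with exactly 2 NaN, is the one that changes sides between the two calls, which is why the choice between len(df) - 1 and len(df) - 2 matters.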

Answer 2

Answered by Alexander

You can use a conditional list comprehension:

>>> dff[[c for c in dff if dff[c].isnull().sum() < 2]]
          A         B
0 -0.819004  0.919190
1  0.922164  0.088111
2  0.188150  0.847099
3       NaN -0.053563
4  1.327250 -0.376076
5  3.724980  0.292757
6 -0.319342       NaN
7 -1.051529  0.389843
8 -0.805542 -0.018347
9 -0.816261 -1.627026
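The same selection can probably be written without a Python-level loop by passing a boolean mask to loc. This is a sketch under the assumption that a vectorized form is acceptable; the small frame below is a deterministic stand-in for the random dff in the question.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for dff: A and B have 1 NaN each, C has 3
dff = pd.DataFrame(
    {
        "A": [0.1, 0.2, 0.3, np.nan, 0.5, 0.6],
        "B": [1.1, 1.2, 1.3, 1.4, np.nan, 1.6],
        "C": [2.1, np.nan, np.nan, np.nan, 2.5, 2.6],
    }
)

nan_counts = dff.isnull().sum()    # Series indexed by column: NaN count per column
kept = dff.loc[:, nan_counts < 2]  # boolean mask selects the columns to keep
print(nan_counts.to_dict())        # {'A': 1, 'B': 1, 'C': 3}
print(list(kept.columns))          # ['A', 'B']
```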

Answer 3

Answered by stellasia

Here is a possible solution:

s = dff.isnull().sum()  # count the number of NaN in each column
print(s)
   A    1
   B    1
   C    3
   dtype: int64

# iterate over a copy of the labels, since columns are deleted inside the loop
for col in list(dff.columns):
    if s[col] >= 2:
        del dff[col]

Or

for c in list(dff.columns):
    if dff[c].isnull().sum() >= 2:
        dff.drop(c, axis=1, inplace=True)
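If modifying dff in place is not desired, one possible variant is to collect the labels first and drop them in a single call. This is only a sketch; the names to_drop and result are illustrative, and the frame is a deterministic stand-in for the one in the question.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in: A and B have 1 NaN each, C has 2
dff = pd.DataFrame(
    {
        "A": [1.0, 2.0, np.nan, 4.0],
        "B": [1.0, np.nan, 3.0, 4.0],
        "C": [np.nan, np.nan, 3.0, 4.0],
    }
)

nan_counts = dff.isnull().sum()
# Collect the labels first, so the frame is not modified while it is iterated
to_drop = [col for col in dff.columns if nan_counts[col] >= 2]
result = dff.drop(columns=to_drop)  # returns a new frame; dff itself is unchanged
print(to_drop)                      # ['C']
print(list(result.columns))         # ['A', 'B']
```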

Answer 4

Answered by NAGARAJ

I recommend the drop method. This is an alternative solution:

dff.drop(columns=dff.columns[dff.isnull().sum() >= 2])

Answer 5

Answered by Ashish

Say you have to drop columns having more than 70% null values.

data.drop(columns=data.columns[data.isnull().mean() * 100 > 70])
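A percentage rule like this can be wrapped in a small function. The sketch below assumes a hypothetical helper drop_mostly_null_columns with a max_null_fraction parameter; neither name comes from pandas or from the answer, and the column names are made up for the example.

```python
import numpy as np
import pandas as pd


def drop_mostly_null_columns(df, max_null_fraction=0.7):
    """Drop columns whose share of null values exceeds `max_null_fraction`."""
    null_fraction = df.isnull().mean()  # mean of booleans = fraction of nulls
    return df.loc[:, null_fraction <= max_null_fraction]


data = pd.DataFrame(
    {
        "mostly_full": [1.0, 2.0, 3.0, 4.0, np.nan],            # 20% null
        "half_empty": [1.0, np.nan, 3.0, np.nan, 5.0],          # 40% null
        "mostly_empty": [np.nan, np.nan, np.nan, np.nan, 5.0],  # 80% null
    }
)

cleaned = drop_mostly_null_columns(data)
print(list(cleaned.columns))  # ['mostly_full', 'half_empty']
```

Working with a fraction rather than a count keeps the rule independent of the number of rows.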
