Pandas - Delete rows with only NaN values

Question

Asked by Slavatron

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically, 7 or more.

I tried using the dropna function in several ways, but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple:

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

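# Loop over each row and drop it if it has too many NaN values
for idx, row in df.iterrows():
    m = len(row) - row.count()  # NaN count = total cells - non-null cells
    if m > 7:
        df = df.drop(idx)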

I am still new to Pandas, so I'm very open to other ways of solving this problem, whether they're simpler or more complex.

Answer 1

Accepted answer by EdChum

Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values, and drop the rows that don't meet this criterion:

df.dropna(thresh=len(df.columns) - 7)

See the docs for DataFrame.dropna.
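To make the effect of thresh concrete, here is a minimal sketch on a made-up frame (the 3-row, 10-column df and its column names are assumptions for illustration, not data from the question). Note that thresh counts non-NaN values, so subtracting 7 from the column count keeps rows with up to 7 NaN values; if rows with exactly 7 NaN values should also go, as "7 or more" suggests, subtracting 6 appears to be the right bound.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 10 columns, all ones to start with.
df = pd.DataFrame(np.ones((3, 10)), columns=list("abcdefghij"))
df.iloc[1, :8] = np.nan  # row 1 now has 8 NaN values (2 non-NaN)
df.iloc[2, :7] = np.nan  # row 2 now has 7 NaN values (3 non-NaN)

n_cols = df.shape[1]

# Keep rows with at least (n_cols - 7) non-NaN values,
# i.e. rows with at most 7 NaN values. Row 2 survives.
kept = df.dropna(thresh=n_cols - 7)

# Keep rows with at least (n_cols - 6) non-NaN values,
# i.e. rows with at most 6 NaN values. Row 2 is dropped too.
kept_strict = df.dropna(thresh=n_cols - 6)

print(kept.index.tolist())
print(kept_strict.index.tolist())
```

dropna returns a new DataFrame by default, so the result has to be assigned back (for example df = df.dropna(...)) for the rows to actually disappear from df.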

Answer 2

Answered by Roger Fan

The optional thresh argument of df.dropna lets you specify the minimum number of non-NA values a row must have in order to be kept.

df.dropna(thresh=df.shape[1]-7)
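If counting the NaN values directly is easier to reason about than a non-NA threshold, a boolean mask is one alternative. The sketch below is not taken from either answer; the sample df is made up, and the cutoff of 7 follows the "7 or more" wording in the question. No row-by-row loop is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 3 rows, 10 columns, all ones to start with.
df = pd.DataFrame(np.ones((3, 10)), columns=list("abcdefghij"))
df.iloc[1, :8] = np.nan  # row 1 now has 8 NaN values
df.iloc[2, :7] = np.nan  # row 2 now has 7 NaN values

# Count NaN values in each row directly.
nan_counts = df.isna().sum(axis=1)

# The same count via the "total minus non-null" idea from the question.
nan_counts_via_count = df.shape[1] - df.count(axis=1)

# Keep only rows with fewer than 7 NaN values (drops rows with 7 or more).
result = df[nan_counts < 7]

print(nan_counts.tolist())
print(result.index.tolist())
```

Under these assumptions the mask keeps the same rows as df.dropna(thresh=df.shape[1] - 6).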
