Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/25146277/

Date: 2020-09-13 22:20:06  Source: igfitidea

Pandas - Delete Rows with only NaN values

python pandas rows dataframe

Asked by Slavatron

I have a DataFrame containing many NaN values. I want to delete rows that contain too many NaN values; specifically: 7 or more.

I tried using the dropna function in several ways, but it seems clear that it greedily deletes columns or rows that contain any NaN values.

This question (Slice Pandas DataFrame by Row) shows me that if I can just compile a list of the rows that have too many NaN values, I can delete them all with a simple:

df.drop(rows)

I know I can count non-null values using the count function, which I could then subtract from the total to get the NaN count (is there a direct way to count NaN values in a row?). But even so, I am not sure how to write a loop that goes through a DataFrame row by row.

Here's some pseudo-code that I think is on the right track:

### LOOP FOR ADDRESSING EACH row:
    m = total - row.count()
    if (m > 7):
        df.drop(row)

I am still new to Pandas so I'm very open to other ways of solving this problem; whether they're simpler or more complex.

Accepted answer by EdChum

Basically, the way to do this is to determine the number of columns, set the minimum number of non-NaN values, and drop the rows that don't meet this criterion:

df.dropna(thresh=len(df.columns) - 7)

See the docs
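
To make the threshold arithmetic concrete, here is a minimal sketch on a made-up 3-row, 10-column DataFrame (the data and the names n_cols, more_than_7 and seven_or_more are illustrative assumptions, not part of the original answer). Since thresh is the minimum number of non-NaN values a row needs in order to be kept, subtracting 7 from the column count drops rows with 8 or more NaN values, while subtracting 6 drops rows with 7 or more, which appears to be what the question asks for.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: 3 rows, 10 columns.
nan = np.nan
df = pd.DataFrame(
    [
        [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],                 # 0 NaN values
        [1, 2, 3, nan, nan, nan, nan, nan, nan, nan],    # 7 NaN values
        [1, 2, nan, nan, nan, nan, nan, nan, nan, nan],  # 8 NaN values
    ],
    columns=list("abcdefghij"),
)

n_cols = df.shape[1]  # number of columns, same as len(df.columns)

# thresh = minimum count of non-NaN values required to keep a row.
more_than_7 = df.dropna(thresh=n_cols - 7)    # drops rows with 8 or more NaN
seven_or_more = df.dropna(thresh=n_cols - 6)  # drops rows with 7 or more NaN

print(more_than_7.index.tolist())    # [0, 1]
print(seven_or_more.index.tolist())  # [0]
```

Note that dropna returns a new DataFrame here, so the result has to be assigned to keep it.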

Answer by Roger Fan

The optional thresh argument of df.dropna lets you specify the minimum number of non-NA values a row must have in order to be kept.

df.dropna(thresh=df.shape[1]-7)
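
For the question's side note about counting NaN values per row directly, one possible approach, sketched here rather than taken from either answer, is to sum the boolean mask returned by isna() across columns. This avoids an explicit row-by-row loop and also produces the list of rows that df.drop(rows) expects. The sample data and the names nan_per_row, kept and dropped are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration: 4 rows, 9 columns of floats.
df = pd.DataFrame(np.arange(36, dtype=float).reshape(4, 9))
df.iloc[1, :7] = np.nan  # row 1 gets 7 NaN values
df.iloc[2, :3] = np.nan  # row 2 gets 3 NaN values
df.iloc[3, :] = np.nan   # row 3 is entirely NaN (9 values)

# Count NaN values in each row: isna() gives a boolean frame,
# and summing along axis=1 counts the True entries per row.
nan_per_row = df.isna().sum(axis=1)

# Option 1: keep only the rows with fewer than 7 NaN values.
kept = df[nan_per_row < 7]

# Option 2: collect the offending row labels and drop them,
# mirroring the df.drop(rows) idea from the question.
rows = df.index[nan_per_row >= 7]
dropped = df.drop(rows)

print(nan_per_row.tolist())  # [0, 7, 3, 9]
print(kept.index.tolist())   # [0, 2]
```

On this sample, both options give the same result as df.dropna(thresh=df.shape[1] - 6); the explicit count is mainly useful when the NaN counts themselves are needed for inspection.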