thresh in dropna for DataFrame in pandas in python

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/51584906/

Date: 2020-09-14 05:51:25  Source: igfitidea

Asked by AAA

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan
df1.dropna(thresh=1, axis=1)

It seems that no NaN values have been dropped.

    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0

If I run

df1.dropna(thresh=2, axis=1)

why does it give the following?

    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0

I just don't understand what thresh is doing here. If a column has more than one NaN value, shouldn't the column be dropped?

Answered by DYZ

With axis=1, thresh=N requires that a column have at least N non-NaN values to survive. Column 0 contains no NaN at all, so it always survives. In the first example, columns 1 and 2 each have at least one non-NaN value (one and three, respectively), so both survive. In the second example, of those two, only the last column has at least two non-NaN values, so it survives, while column 1 is dropped.
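
To make the rule concrete, here is a minimal sketch that counts the non-NaN values per column and compares that count against thresh. It rebuilds the DataFrame from the question, except that it uses a float dtype (my assumption, to keep the NaN assignment from changing column dtypes); the helper name surviving_columns is hypothetical and only for illustration.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but created as floats (an assumption of this
# sketch) so that assigning NaN does not change any column's dtype.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Number of non-NaN values in each column: this is what thresh is compared to.
non_nan_counts = df1.notna().sum()


def surviving_columns(thresh):
    """Hypothetical helper: labels of the columns kept by dropna(thresh=..., axis=1)."""
    return list(df1.dropna(thresh=thresh, axis=1).columns)


kept_1 = surviving_columns(1)  # every column has at least 1 non-NaN value
kept_2 = surviving_columns(2)  # column 1 has only 1 non-NaN value

# The same selection written out by hand from the counts.
manual_2 = non_nan_counts[non_nan_counts >= 2].index.tolist()

print(non_nan_counts.tolist(), kept_1, kept_2, manual_2)
```

The counts come out as 5, 1 and 3 for columns 0, 1 and 2, which is why thresh=1 keeps everything and thresh=2 removes only column 1.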

Try setting thresh to 4 to get a better sense of what's happening.
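
Following that suggestion, the sketch below sweeps thresh from 1 to 5 and records which columns remain each time. It assumes the same data as in the question, again built with a float dtype for convenience; the variable names are mine.

```python
import numpy as np
import pandas as pd

# Data from the question, created as floats (an assumption of this sketch).
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# For each threshold, list the column labels that dropna keeps.
survivors = {
    thresh: list(df1.dropna(thresh=thresh, axis=1).columns)
    for thresh in range(1, 6)
}

for thresh, cols in survivors.items():
    print(f"thresh={thresh}: columns kept = {cols}")
```

Column 2 holds three non-NaN values, so it is kept up to thresh=3 and dropped from thresh=4 onward, leaving only column 0.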