Warning: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/51584906/
thresh in dropna for DataFrame in pandas in python
Asked by AAA
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(15).reshape(5, 3))
df1.iloc[:4, 1] = np.nan  # rows 0-3 of column 1
df1.iloc[:2, 2] = np.nan  # rows 0-1 of column 2
df1.dropna(thresh=1, axis=1)
It seems that no NaN values have been dropped:
    0     1     2
0   0   NaN   NaN
1   3   NaN   NaN
2   6   NaN   8.0
3   9   NaN  11.0
4  12  13.0  14.0
If I run
df1.dropna(thresh=2, axis=1)
why does it give the following?
    0     2
0   0   NaN
1   3   NaN
2   6   8.0
3   9  11.0
4  12  14.0
I just don't understand what thresh is doing here. If a column has more than one NaN value, should the column be dropped?
Answered by DYZ
thresh=N requires that a column has at least N non-NaN values to survive. In the first example, every column has at least one non-NaN value (column 1 has exactly one, column 2 has three), so all of them survive. In the second example, column 2 has at least two non-NaN values, so it survives, but column 1 has only one and is dropped. Column 0 contains no NaN at all, so it survives in both cases.
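The rule can be checked by counting the non-NaN values in each column and comparing the counts with thresh. This is only a minimal sketch: it rebuilds the frame from the question with a float dtype (an assumption made here so that assigning NaN does not need to upcast integer columns), and the names non_nan and expected_2 are illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question. dtype=float is an assumption of this
# sketch; the question used the default integer dtype.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# DataFrame.count() returns the number of non-NaN values per column.
non_nan = df1.count()
print(non_nan.tolist())  # [5, 1, 3]

# With thresh=2, the surviving columns should be those with count >= 2.
expected_2 = non_nan[non_nan >= 2].index.tolist()
kept_2 = df1.dropna(thresh=2, axis=1).columns.tolist()
print(expected_2, kept_2)  # [0, 2] [0, 2]
```

Column 1 has a count of 1, which is why it passes thresh=1 but fails thresh=2.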
Try setting thresh to 4 to get a better sense of what's happening.
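One way to follow that suggestion is to sweep thresh over several values and see which columns remain. Again a sketch under the same assumptions as the question's data (the float dtype and the name survivors are choices made here, not part of the original answer):

```python
import numpy as np
import pandas as pd

# Same data as in the question, built as floats so NaN can be assigned directly.
df1 = pd.DataFrame(np.arange(15, dtype=float).reshape(5, 3))
df1.iloc[:4, 1] = np.nan
df1.iloc[:2, 2] = np.nan

# Map each thresh value to the column labels that survive dropna.
survivors = {
    n: df1.dropna(thresh=n, axis=1).columns.tolist()
    for n in range(1, 6)
}

for n, cols in survivors.items():
    print(f"thresh={n}: columns kept = {cols}")
# thresh=1: columns kept = [0, 1, 2]
# thresh=2: columns kept = [0, 2]
# thresh=3: columns kept = [0, 2]
# thresh=4: columns kept = [0]
# thresh=5: columns kept = [0]
```

At thresh=4, column 2 (three non-NaN values) is dropped as well, leaving only column 0, which has five.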