Replace outliers with column quantile in Pandas dataframe
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not this page): StackOverflow
Original: http://stackoverflow.com/questions/41759993/
Asked by shda
I have a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list('AB'))
    A   B
0  92  65
1  61  97
2  17  39
3  70  47
4  56   6
Here are 5% quantiles:
down_quantiles = df.quantile(0.05)
A    24.8
B    12.6
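The values 24.8 and 12.6 can be checked by hand. The sketch below uses a fixed copy of the sample data (the original frame was random) and assumes the default linear interpolation of `DataFrame.quantile`; the name `manual_a` is only illustrative.

```python
import numpy as np
import pandas as pd

# Fixed copy of the sample data shown in the question (random in the original).
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})

down_quantiles = df.quantile(0.05)

# Manual check for column A with linear interpolation:
# position = q * (n - 1) = 0.05 * 4 = 0.2, i.e. 20% of the way from the
# smallest value (17) to the second smallest (56).
a_sorted = np.sort(df["A"].to_numpy())
manual_a = a_sorted[0] + 0.2 * (a_sorted[1] - a_sorted[0])

print(down_quantiles)
print(manual_a)
```

With only five rows, the 5% quantile falls between the two smallest values of each column, so at most one value per column ends up below it.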
And here is the mask for values that are lower than quantiles:
outliers_low = (df < down_quantiles)
       A      B
0  False  False
1  False  False
2   True  False
3  False  False
4  False   True
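The comparison works column by column because pandas matches the Series index (`A`, `B`) against the DataFrame's column labels. As a small sketch on a fixed copy of the sample data, the explicit form `df.lt(..., axis="columns")` should give the same mask; `outliers_low_explicit` is an illustrative name.

```python
import pandas as pd

# Fixed copy of the sample data shown in the question.
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)

# Operator form: the Series index ("A", "B") is aligned with the column labels.
outliers_low = df < down_quantiles

# Explicit equivalent: compare along the columns axis.
outliers_low_explicit = df.lt(down_quantiles, axis="columns")

print(outliers_low.equals(outliers_low_explicit))
```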
I want to set the values in df that are lower than the quantile to their column's quantile. I can do it like this:
df[outliers_low] = np.nan
df.fillna(down_quantiles, inplace=True)
      A     B
0  92.0  65.0
1  61.0  97.0
2  24.8  39.0
3  70.0  47.0
4  56.0  12.6
But surely there is a more elegant way. How can I do this without fillna? Thanks.
Answered by Nickil Maveli
You can use the DataFrame.mask() method. Wherever the mask is True, the value is replaced by the corresponding value from the other Series, which is aligned on matching column names by passing axis=1.
df.mask(outliers_low, down_quantiles, axis=1)
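As a self-contained sketch of the line above, using a fixed copy of the question's sample data (the name `result` is illustrative):

```python
import pandas as pd

# Fixed copy of the sample data shown in the question.
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)
outliers_low = df < down_quantiles

# Where the mask is True, take the value from down_quantiles for that column.
# mask() returns a new DataFrame; df itself is left unchanged.
result = df.mask(outliers_low, down_quantiles, axis=1)

print(result)
```

Since `mask()` does not modify `df` in place, assign the result back (`df = df.mask(...)`) if the replacement should persist.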
Another variant is to use the DataFrame.where() method after inverting your boolean mask with the tilde (~) operator.
df.where(~outliers_low, down_quantiles, axis=1)
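A quick way to convince yourself that the two variants agree is to run both on a fixed copy of the question's sample data; `via_where` and `via_mask` are illustrative names.

```python
import pandas as pd

# Fixed copy of the sample data shown in the question.
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)
outliers_low = df < down_quantiles

# where() keeps values where the condition is True and replaces the rest,
# so the outlier mask has to be inverted first.
via_where = df.where(~outliers_low, down_quantiles, axis=1)

# mask() replaces values where the condition is True.
via_mask = df.mask(outliers_low, down_quantiles, axis=1)

print(via_where.equals(via_mask))
```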