Replace outliers with column quantile in Pandas dataframe

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41759993/

Date: 2020-09-14 02:49:36, source: igfitidea


Tags: python, pandas, dataframe, quantile

Asked by shda

I have a dataframe:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 2)), columns=list('AB'))  # random: the table below is one draw
    A   B
0  92  65
1  61  97
2  17  39
3  70  47
4  56   6

Here are the 5% quantiles:

down_quantiles = df.quantile(0.05)
A    24.8
B    12.6
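
These values come from pandas' default linear interpolation: for n sorted values the q-quantile sits at position h = q * (n - 1), and the result is interpolated between the two neighbouring values. For column A, h = 0.05 * 4 = 0.2, so the quantile is 17 + 0.2 * (56 - 17) = 24.8; for B it is 6 + 0.2 * (39 - 6) = 12.6. A minimal sketch that reproduces this by hand, assuming the sample values printed above (the question's own df is random); `linear_quantile` is a hypothetical helper written for illustration:

```python
import math

import pandas as pd

# Sample values from the table above (the question's df is random, so this is one draw)
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})


def linear_quantile(values, q):
    """Hypothetical helper: q-quantile with linear interpolation, like pandas' default."""
    s = sorted(values)
    pos = q * (len(s) - 1)        # fractional position in the sorted data
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


manual = {col: linear_quantile(df[col].tolist(), 0.05) for col in df.columns}
print(manual)                 # about 24.8 for A and 12.6 for B
print(df.quantile(0.05))      # should agree up to floating-point rounding
```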

And here is the mask for the values that are lower than the quantiles:

outliers_low = (df < down_quantiles)
       A      B
0  False  False
1  False  False
2   True  False
3  False  False
4  False   True
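
The comparison df < down_quantiles works because pandas matches the Series index (the labels A and B) against the DataFrame's columns, so each column is compared with its own quantile. A small sketch, assuming the same sample values as in the table above, showing that the method form DataFrame.lt with axis="columns" yields the same mask:

```python
import pandas as pd

# Sample values from the question's table (assumed; the original df is random)
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)

# Operator form: the Series index (A, B) is aligned with df's columns
outliers_low = df < down_quantiles

# Method form with the alignment axis spelled out explicitly
outliers_low_explicit = df.lt(down_quantiles, axis="columns")

print(outliers_low.equals(outliers_low_explicit))
print(outliers_low)
```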

I want to set the values in df that are lower than the quantile to their column's quantile. I can do it like this:

df[outliers_low] = np.nan
df.fillna(down_quantiles, inplace=True)

    A   B
0  92.0  65.0
1  61.0  97.0
2  24.8  39.0
3  70.0  47.0
4  56.0  12.6

But surely there is a more elegant way. How can I do this without fillna? Thanks.

Answer by Nickil Maveli

You can use the DataFrame.mask() method. Wherever the mask is True, the cell is replaced with the corresponding value from the other Series; passing axis=1 aligns that Series with the DataFrame's columns by name, so each column gets its own quantile.

df.mask(outliers_low, down_quantiles, axis=1)  
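
A self-contained version of the mask approach, assuming the sample values printed in the question so that the result can be checked against the expected table (24.8 in row 2 of A, 12.6 in row 4 of B):

```python
import pandas as pd

# Sample values from the question's table (assumed; the original df is random)
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)
outliers_low = df < down_quantiles

# Replace every True cell with its column's quantile;
# axis=1 aligns the Series index with df's columns
result = df.mask(outliers_low, down_quantiles, axis=1)
print(result)
```

Unlike the fillna version, this returns a new frame and leaves df unchanged; the result is float64 because the quantiles are floats.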

Another variant is to use the DataFrame.where() method after inverting the boolean mask with the tilde (~) operator.


df.where(~outliers_low, down_quantiles, axis=1)
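
A quick check that the where variant gives the same frame as mask, again assuming the question's sample values. It also shows DataFrame.clip, which is not part of the original answer but is pandas' built-in method for bounding values: with lower set to the quantile Series and axis=1, each column is clipped at its own lower bound, which should give the same numbers.

```python
import pandas as pd

# Sample values from the question's table (assumed; the original df is random)
df = pd.DataFrame({"A": [92, 61, 17, 70, 56], "B": [65, 97, 39, 47, 6]})
down_quantiles = df.quantile(0.05)
outliers_low = df < down_quantiles

# where() keeps cells where the condition is True, so the mask is inverted with ~
via_where = df.where(~outliers_low, down_quantiles, axis=1)
via_mask = df.mask(outliers_low, down_quantiles, axis=1)

# Alternative not taken from the original answer: clip at a per-column lower bound
via_clip = df.clip(lower=down_quantiles, axis=1)

print(via_where.equals(via_mask))
print(via_clip)
```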
