pandas Python: Replace outliers with the median value

Question

Asked by user4943236

I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median of the data, computed as if those outliers were not there.

id         Age
10236    766105
11993       288
9337        205
38189        88
35555        82
39443        75
10762        74
33847        72
21194        70
39450        70

So I want to replace all values > 75 with the median of the remaining data, i.e., the median of 70, 70, 72, 74, 75.

I'm trying to do the following:

Replace all values greater than 75 with 0.
Replace the 0s with the median value.

But somehow, the code below is not working:

df['age'].replace(df.age>75,0,inplace=True)

Answer 1

Answered by Bharath

I think this is what you are looking for. You can use loc to set the outliers to NaN, and then fill the NaN values with the median:

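import numpy as np
# Median of the non-outlier values; <= 75 keeps 75 itself, as in the question
median = df.loc[df['Age'] <= 75, 'Age'].median()
df['Age'] = df['Age'].astype(float)  # NaN needs a float column
df.loc[df['Age'] > 75, 'Age'] = np.nan
df['Age'] = df['Age'].fillna(median)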
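
As a minimal end-to-end sketch of the steps above, the following builds the sample data from the question and applies them. The variable name threshold is an assumption introduced for illustration, and the comparison uses <= 75 so that 75 itself counts as a regular value, which matches the median of 70, 70, 72, 74, 75 the question asks for.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

threshold = 75  # cutoff taken from the question (hypothetical variable name)

# Work on a float column so it can hold NaN.
df["Age"] = df["Age"].astype(float)

# Median of the values that are not outliers: 70, 70, 72, 74, 75.
median = df.loc[df["Age"] <= threshold, "Age"].median()

# Mark the outliers as missing, then fill them with the median.
df.loc[df["Age"] > threshold, "Age"] = np.nan
df["Age"] = df["Age"].fillna(median)

print(median)  # 72.0
print(df["Age"].tolist())
```

Filling only the Age column, rather than calling fillna on the whole DataFrame, avoids overwriting unrelated missing values in other columns.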

You can also use np.where to do the replacement in one line:

df["Age"] = np.where(df["Age"] > 75, median, df["Age"])

You can also use .mask, i.e.:

df["Age"] = df["Age"].mask(df["Age"] >75, median)
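
To check that the np.where and .mask variants agree, here is a small sketch on the question's Age values. The helper name replace_outliers_with_median is hypothetical and not part of pandas; it just wraps the .mask approach and computes the median from the values at or below the threshold.

```python
import numpy as np
import pandas as pd


def replace_outliers_with_median(ages: pd.Series, threshold: float) -> pd.Series:
    """Replace values above `threshold` with the median of the other values."""
    is_outlier = ages > threshold
    median = ages[~is_outlier].median()
    return ages.mask(is_outlier, median)


ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

via_mask = replace_outliers_with_median(ages, threshold=75)

# Same replacement with np.where, which returns a plain NumPy array,
# so it is wrapped back into a Series with the original index.
median = ages[ages <= 75].median()
via_where = pd.Series(np.where(ages > 75, median, ages), index=ages.index, name="Age")

# Compare values element-wise; the two results may differ in dtype.
print((via_mask == via_where).all())  # True
```

One practical difference: .mask keeps the Series index and name, while np.where drops them unless the result is assigned back into the DataFrame column.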

Answer 2

Answered by behnamoh

A more general solution I've tried lately: instead of the hard-coded 75, use the median of the whole column as the threshold, and then follow a solution similar to what Bharath suggested:

median = float(df['Age'].median())
df["Age"] = np.where(df["Age"] > median, median, df['Age'])
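
To see what this variant does on the question's data, here is a short sketch. It uses a standalone Series named ages (an assumption for illustration) instead of the DataFrame column. Because the median is taken over the whole column, outliers included, it comes out at 78.5 rather than 72, and every value above it is capped at 78.5.

```python
import numpy as np
import pandas as pd

ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

# Median of the whole column, outliers included.
median = float(ages.median())

# Every value above that median is replaced by it.
capped = pd.Series(np.where(ages > median, median, ages), index=ages.index, name="Age")

print(median)           # 78.5
print(capped.tolist())  # [78.5, 78.5, 78.5, 78.5, 78.5, 75.0, 74.0, 72.0, 70.0, 70.0]
```

This needs no hand-picked cutoff, but it modifies every value above the median rather than only the extreme ones, so the result differs from the 72 the question asks for.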
