Python-pandas: Replace NA with the median or mean of a group in a dataframe
Original question: http://stackoverflow.com/questions/33573408/
License: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and comply with the same license.
Asked by Robin1988
Suppose we have a df:
A B
apple 1.0
apple 2.0
apple NA
orange NA
orange 7.0
melon 14.0
melon NA
melon 15.0
melon 16.0
To replace the NA, we can use df["B"].fillna(df["B"].median()), but that fills every NA with the median of all the data in "B".
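To make the problem concrete, here is a minimal sketch that rebuilds the example frame (assuming NA is stored as NaN) and applies the whole-column fill. Every missing value receives the same number, the median of all non-missing values in "B", regardless of the fruit in "A". The variable names overall_median and filled_globally are only illustrative.

```python
import numpy as np
import pandas as pd

# Example data from the question; NA is assumed to be represented as NaN.
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# Series.median skips NaN by default, so this is the median of the
# six observed values across all fruits.
overall_median = df["B"].median()

# Every NaN gets that single value, whichever group it belongs to.
filled_globally = df["B"].fillna(overall_median)

print(overall_median)
print(filled_globally.tolist())
```

Here the three missing entries all become 10.5, which is not representative of any of the three groups.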
Is there a way to replace each NA with the median of its own group in "A" instead, as shown below?
A B
apple 1.0
apple 2.0
apple **1.5**
orange **7.0**
orange 7.0
melon 14.0
melon **15.0**
melon 15.0
melon 16.0
Thanks!
Answered by behzad.nouri
Answered by akrun
In R, we can use na.aggregate (from the zoo package) together with data.table to replace the NA with the mean value of the group. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'A', and apply na.aggregate on 'B'.
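library(zoo)         # provides na.aggregate
library(data.table)  # provides setDT and :=
# Convert df to a data.table by reference, then fill NA in B with the mean of each group in A
setDT(df)[, B := na.aggregate(B), by = A]
df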
# A B
#1: apple 1.0
#2: apple 2.0
#3: apple 1.5
#4: orange 7.0
#5: orange 7.0
#6: melon 14.0
#7: melon 15.0
#8: melon 15.0
#9: melon 16.0
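For readers who want to stay in pandas, the same grouped fill can be sketched with groupby and transform. This is an illustrative sketch and is not taken from either answer: the helper name fill_na_by_group and its how parameter are hypothetical, and NA is assumed to be stored as NaN.

```python
import numpy as np
import pandas as pd

def fill_na_by_group(frame, key, col, how="median"):
    """Return `col` with NaN replaced by the per-group statistic of `key`.

    Hypothetical helper for illustration; `how` is passed to
    GroupBy.transform, e.g. "median" or "mean".
    """
    # transform computes the statistic per group (skipping NaN) and
    # broadcasts it back so the result lines up with the original rows.
    group_stat = frame.groupby(key)[col].transform(how)
    # fillna with a Series aligns on the index, so each NaN takes the
    # statistic of its own group.
    return frame[col].fillna(group_stat)

# Example data from the question.
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

df["B"] = fill_na_by_group(df, "A", "B", how="median")
print(df)
```

Passing how="mean" mirrors the default of na.aggregate used in the R code; for this data both statistics give the same filled values. A group with no observed values at all is left as NaN.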