Python pandas: replace NA with the median or mean of a group in a DataFrame

Question

Asked by Robin1988

Suppose we have a df:

    A       B
    apple   1.0
    apple   2.0
    apple   NA
    orange  NA
    orange  7.0
    melon   14.0
    melon   NA
    melon   15.0
    melon   16.0

To replace the NA values we can use df["B"].fillna(df["B"].median()), but that fills every NA with the median of all the data in "B".

Is there a way to fill each NA with the median of its own group in "A" instead, as shown below?

    A       B
   apple   1.0
   apple   2.0
   apple   **1.5**
   orange  **7.0**
   orange  7.0
   melon   14.0
   melon   **15.0**
   melon   15.0
   melon   16.0

Thanks!

Answer 1

Answered by behzad.nouri

In pandas you can use transform to obtain the per-group fill values, aligned row by row with the original column:

>>> med = df.groupby('A')['B'].transform('median')
>>> df['B'].fillna(med)
0     1.0
1     2.0
2     1.5
3     7.0
4     7.0
5    14.0
6    15.0
7    15.0
8    16.0
Name: B, dtype: float64
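
For readers who want to run this end to end, here is a minimal self-contained sketch. It rebuilds the question's frame by hand (the construction of df and the name group_median are assumptions for illustration, not part of the original answer) and assigns the filled column back, since fillna returns a new Series rather than modifying df in place.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; NA is represented as NaN.
df = pd.DataFrame({
    "A": ["apple", "apple", "apple", "orange", "orange",
          "melon", "melon", "melon", "melon"],
    "B": [1.0, 2.0, np.nan, np.nan, 7.0, 14.0, np.nan, 15.0, 16.0],
})

# For every row, the median of that row's group in "A" (NaN is skipped).
group_median = df.groupby("A")["B"].transform("median")

# fillna returns a new Series, so assign it back to keep the result.
df["B"] = df["B"].fillna(group_median)
print(df)
```

Note that a group whose values are all missing has no median, so its rows would stay NaN after this step.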

Answer 2

Answered by akrun

In R, you can use na.aggregate (from zoo) together with data.table to replace each NA with the mean of its group. We convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'A', and apply na.aggregate to 'B'.

library(zoo)
library(data.table)
setDT(df)[,  B:= na.aggregate(B), A]
df
#      A    B
#1:  apple  1.0
#2:  apple  2.0
#3:  apple  1.5
#4: orange  7.0
#5: orange  7.0
#6:  melon 14.0
#7:  melon 15.0
#8:  melon 15.0
#9:  melon 16.0
