Pandas: How to fill null values with the mean of a groupby?
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/40299055/
Asked by sfactor
I have a dataset with some missing data that looks like this:
id category value
1 A NaN
2 B NaN
3 A 10.5
4 C NaN
5 A 2.0
6 B 1.0
I need to fill in the nulls to use the data in a model. The first time a category occurs, its value is NULL. For categories like A and B, which have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I just want to fill in the average of the rest of the data.
I know that for cases like C I can simply do the following to get the average of all the rows, but I'm stuck trying to compute the category-wise means for A and B and use them to replace the nulls.
df['value'] = df['value'].fillna(df['value'].mean())
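For reference, here is a minimal sketch that rebuilds the sample table and applies that one-liner. The DataFrame construction and the name `filled` are assumptions added for illustration, not part of the original post. It suggests why the global fill alone is not enough: every null receives the same overall mean of the known values (4.5), regardless of category.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (dtypes assumed).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

# Global fill: each NaN gets the mean of the three known values,
# (10.5 + 2.0 + 1.0) / 3 = 4.5, ignoring the category column entirely.
filled = df["value"].fillna(df["value"].mean())
print(filled.tolist())  # [4.5, 4.5, 10.5, 4.5, 2.0, 1.0]
```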
I need the final df to look like this:
id category value
1 A 6.25
2 B 1.0
3 A 10.5
4 C 4.15
5 A 2.0
6 B 1.0
Accepted answer by jezrael
I think you can use groupby and apply fillna with mean. You then still get NaN if some category has only NaN values, so use the mean of all values of the column to fill the remaining NaN:
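# group_keys=False keeps the original row index so the result aligns with df on newer pandas versions
df['value'] = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
# Categories that contain only NaN (here C) are still NaN; fill them with the overall column mean
df['value'] = df['value'].fillna(df['value'].mean())
print(df)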
id category value
0 1 A 6.25
1 2 B 1.00
2 3 A 10.50
3 4 C 4.15
4 5 A 2.00
5 6 B 1.00
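The 4.15 for category C in the output above is the mean of the column after the per-category fill, (6.25 + 1.0 + 10.5 + 2.0 + 1.0) / 5, not the 4.5 mean of the three originally known values. The sketch below walks through both steps on the sample data. The DataFrame construction and the names `step1` and `step2` are illustrative assumptions, and `group_keys=False` is used here so that the result keeps the original row index.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (dtypes assumed).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

# Step 1: fill each NaN with the mean of its own category.
# Category C has no known values, so its mean is NaN and the row stays NaN.
step1 = df.groupby("category", group_keys=False)["value"].apply(
    lambda x: x.fillna(x.mean())
)

# Step 2: fill what is left with the mean of the partially filled column.
step2 = step1.fillna(step1.mean())

# Assignment aligns on the row index, so row order is preserved.
df["value"] = step2
print(df)
```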
Answer by jpp
You can also use GroupBy + transform to fill NaN values with groupwise means. This method avoids the inefficient apply + lambda. For example:
df['value'] = df['value'].fillna(df.groupby('category')['value'].transform('mean'))
df['value'] = df['value'].fillna(df['value'].mean())
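As a rough sketch, the two lines above could be wrapped in a small reusable function. The helper name `fill_with_group_mean`, its parameters, and the sample DataFrame are hypothetical and not from the original answers. It returns a new Series instead of modifying the input in place.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, group_col, value_col):
    """Hypothetical helper: fill NaN by group mean, then by overall mean."""
    # transform("mean") returns one value per row, aligned with df's index.
    group_means = df.groupby(group_col)[value_col].transform("mean")
    filled = df[value_col].fillna(group_means)
    # Groups that were entirely NaN are still NaN here; fall back to the
    # mean of the partially filled column, as in the answers above.
    return filled.fillna(filled.mean())


# Sample data reconstructed from the question (dtypes assumed).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

df["value"] = fill_with_group_mean(df, "category", "value")
print(df)
```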