Pandas: How to fill null values with the groupby mean?

Question

Asked by sfactor

I have a dataset with some missing data that looks like this:

id    category     value
1     A            NaN
2     B            NaN
3     A            10.5
4     C            NaN
5     A            2.0
6     B            1.0

I need to fill in the nulls to use the data in a model. Every time a category occurs for the first time, its value is NULL. For categories like A and B that have more than one value, I want to replace the nulls with the average of that category. For category C, which occurs only once, I just want to fill in the average of the rest of the data.

I know that for cases like C I can simply take the average of all the rows, as shown below, but I'm stuck trying to compute the category-wise means for A and B and use them to replace the nulls.

df['value'] = df['value'].fillna(df['value'].mean())

I need the final df to look like this:

id    category     value
1     A            6.25
2     B            1.0
3     A            10.5
4     C            4.15
5     A            2.0
6     B            1.0

Answer 1

Accepted answer by jezrael

I think you can use groupby and apply fillna with mean. You will then still get NaN if some category has only NaN values, so use the mean of all values of the column to fill the remaining NaN:

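# Fill NaN with each category's mean. group_keys=False keeps the original row index,
# so the result aligns with df on recent pandas versions.
df.value = df.groupby('category', group_keys=False)['value'].apply(lambda x: x.fillna(x.mean()))
# Category C has only NaN values, so fill what remains with the mean of the whole column.
df.value = df.value.fillna(df.value.mean())
print(df)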
   id category  value
0   1        A   6.25
1   2        B   1.00
2   3        A  10.50
3   4        C   4.15
4   5        A   2.00
5   6        B   1.00
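
For a reproducible version of the two steps above, here is a minimal sketch that rebuilds the sample data and applies the fills in order. The DataFrame construction is an assumption based on the table in the question, and `group_keys=False` is added so that the result of `apply` keeps the original row index.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

# Step 1: fill NaN with the mean of each category.
# Category C has no non-null values, so its mean is NaN and that row stays NaN.
df["value"] = (
    df.groupby("category", group_keys=False)["value"]
      .apply(lambda s: s.fillna(s.mean()))
)

# Step 2: fill whatever is still NaN with the mean of the column.
# This mean is computed after step 1, so it includes the values filled there.
df["value"] = df["value"].fillna(df["value"].mean())

print(df)
```

Note that the order matters: C ends up as 4.15 because the column mean is taken after the category fill. Taking the mean of the original non-null values (10.5, 2.0, 1.0) would give 4.5 instead.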

Answer 2

Answer by jpp

You can also use GroupBy + transform to fill NaN values with groupwise means. This method avoids the inefficient apply + lambda. For example:

df['value'] = df['value'].fillna(df.groupby('category')['value'].transform('mean'))
df['value'] = df['value'].fillna(df['value'].mean())
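
If you need this in more than one place, the transform approach could be wrapped in a small function. The sketch below is only one way to do it: `fill_with_group_mean` and its parameter names are hypothetical, not part of pandas, and the sample data is reconstructed from the question.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, group_col, value_col):
    """Hypothetical helper: fill NaN by group mean, then by column mean.

    Returns a new DataFrame and leaves the input unchanged.
    """
    out = df.copy()
    # transform("mean") returns a Series aligned to the original rows that
    # holds each row's group mean (NaN for groups with no non-null values).
    group_means = out.groupby(group_col)[value_col].transform("mean")
    out[value_col] = out[value_col].fillna(group_means)
    # Fallback for groups that were entirely NaN: the column mean after the first fill.
    out[value_col] = out[value_col].fillna(out[value_col].mean())
    return out


# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "category": ["A", "B", "A", "C", "A", "B"],
    "value": [np.nan, np.nan, 10.5, np.nan, 2.0, 1.0],
})

result = fill_with_group_mean(df, "category", "value")
print(result)
```

Because `transform` already returns a result indexed like the original frame, there is no index alignment to worry about here, which is part of why it is a convenient fit for `fillna`.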
