Pandas: filling missing values using groupby

Question

Asked by Niche.P

I am trying to impute missing values using rows that have the same values in other columns.

For example, I have this DataFrame:

one | two | three
1      1     10
1      1     nan
1      1     nan
1      2     nan
1      2     20
1      2     nan
1      3     nan
1      3     nan
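
To make the example reproducible, the frame above can be rebuilt as follows. This is a sketch: the variable name `df` is an assumption, and `nan` in the table is taken to mean a real missing value (`np.nan`), not the string "nan".

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; np.nan marks the values to impute.
df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})
print(df)
```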

I want to use the keys in columns ['one'] and ['two']: among rows with the same keys, if column ['three'] is not entirely NaN, impute the missing values with the existing value from a row with the same keys.

Here is my desired result:

one | two | three
1      1     10
1      1     10
1      1     10
1      2     20
1      2     20
1      2     20
1      3     nan
1      3     nan

You can see that the rows with keys 1 and 3 (one=1, two=3) stay NaN, because that key pair has no existing value to copy.
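
Whether a key pair is "entirely NaN" can be checked per group. A minimal sketch, assuming the example frame: `count()` skips NaN, so a group with a count of 0 has nothing to impute from. The names `non_null` and `empty_groups` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# count() ignores NaN, so this is the number of existing values per key pair.
non_null = df.groupby(["one", "two"])["three"].count()
print(non_null)

# Key pairs whose 'three' column is entirely NaN have nothing to impute from.
empty_groups = non_null[non_null == 0].index.tolist()
print(empty_groups)
```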

I have tried using groupby with fillna():

df['three'] = df.groupby(['one','two'])['three'].fillna()

which gave me an error.

I have also tried a forward fill, which gave me a rather strange result: it seemed to forward-fill column 2 instead. This is the code I used for the forward fill:

df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
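
A likely explanation for the odd result (an assumption, since the actual output was not posted): `ffill` only propagates values downward within each group, so a NaN that comes before a group's first existing value stays NaN. A small sketch on the example frame; the name `filled` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Forward-fill within each (one, two) group only.
filled = df.groupby(["one", "two"], sort=False)["three"].ffill()

# Row 3 is the first row of group (1, 2); nothing precedes the 20 in that
# group, so it stays NaN, while rows 4 and 5 become 20.
print(filled.tolist())
```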

Thank you for your time.

Answer 1

Answered by jezrael

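If each group contains only one non-NaN value, use ffill (forward filling) and then bfill (backward filling) within each group. Both must run per group, so pass them in a lambda to transform, which returns a Series aligned to the original index that can be assigned straight back to the column:

# forward-fill, then back-fill, inside each (one, two) group
df['three'] = (df.groupby(['one','two'], sort=False)['three']
                 .transform(lambda x: x.ffill().bfill()))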
print (df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
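
When each group has at most one existing value, the lambda can also be avoided: `transform('first')` broadcasts each group's first non-NaN value to all of its rows, and `fillna` then only touches the missing cells. This is an alternative sketch, not part of the original answer; with several values per group it fills every gap with the first one, which differs from the ffill/bfill result. The name `first_value` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# First non-NaN value of 'three' in each (one, two) group, aligned to df's rows;
# an all-NaN group yields NaN.
first_value = df.groupby(["one", "two"], sort=False)["three"].transform("first")

# Only the missing cells are replaced; existing values are kept as they are.
df["three"] = df["three"].fillna(first_value)
print(df)
```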

But if a group can contain multiple values and NaN must be replaced by some constant, e.g. the mean of each group:

print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

# replace NaN with the mean of the existing values in each (one, two) group
df['three'] = (df.groupby(['one','two'], sort=False)['three']
                 .transform(lambda x: x.fillna(x.mean())))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
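
The same per-group mean can be computed with the built-in `transform('mean')` instead of a lambda, which avoids calling a Python function once per group. A sketch on the second example frame (an alternative, not from the original answer); the name `group_mean` is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Mean of the existing values in each (one, two) group, aligned to df's rows;
# NaN is skipped, and an all-NaN group gives NaN.
group_mean = df.groupby(["one", "two"])["three"].transform("mean")

# Fill only the missing cells with their group's mean.
df["three"] = df["three"].fillna(group_mean)
print(df)
```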
