Pandas fillna using groupby

Warning: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/46391128/

Time: 2020-09-14 04:31:35  Source: igfitidea

python pandas

Question by Niche.P

I am trying to impute missing values using rows that have the same values in other columns.

For example, I have this DataFrame:

one | two | three
  1 |   1 |    10
  1 |   1 |   nan
  1 |   1 |   nan
  1 |   2 |   nan
  1 |   2 |    20
  1 |   2 |   nan
  1 |   3 |   nan
  1 |   3 |   nan

I want to use the keys in columns ['one'] and ['two'] to find matching rows: if column ['three'] is not entirely nan within a group of matching keys, fill the missing entries with the existing value of column ['three'] from a row in that group.

Here is my desired result:

one | two | three
  1 |   1 |    10
  1 |   1 |    10
  1 |   1 |    10
  1 |   2 |    20
  1 |   2 |    20
  1 |   2 |    20
  1 |   3 |   nan
  1 |   3 |   nan

You can see that the rows with keys 1 and 3 still contain no value, because that group has no existing value to copy.
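A minimal sketch of the rule described above, assuming the question's data with np.nan for the missing entries: it builds the DataFrame and flags, for each row, whether its (one, two) group has at least one non-NaN value in three. The name has_value is only for illustration.

```python
import numpy as np
import pandas as pd

# The question's example data, with NaN for the missing values
df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# True for every row whose (one, two) group has at least one non-NaN 'three'
has_value = df["three"].notna().groupby([df["one"], df["two"]]).transform("any")
print(has_value.tolist())
```

Only the rows flagged True can be filled; the (1, 3) rows are flagged False and stay NaN, matching the desired result.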

I have tried using groupby with fillna():

我试过使用 groupby fillna()

df['three'] = df.groupby(['one','two'])['three'].fillna()

which gave me an error.
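fillna has to be told what to fill with (a fill value or, in pandas versions that still accept it, a method argument), so calling it with no arguments is the most likely source of that error. The sketch below reproduces this on a plain Series; the exact message text differs between pandas versions.

```python
import numpy as np
import pandas as pd

s = pd.Series([10, np.nan, np.nan])

# fillna() with nothing to fill with is rejected with a ValueError
try:
    s.fillna()
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)
```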

I have also tried forward fill, which gave me a rather strange result: it forward-filled column 2 instead. This is the code I used for forward fill:

df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
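One limitation of a grouped forward fill on its own: within each group it only copies values downward, so a NaN that comes before the group's first value (row 3 in group (1, 2)) stays NaN. This sketch, using the question's example data, shows that; it does not attempt to explain the column 2 behaviour described above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Forward fill within each (one, two) group; the result keeps df's row index
ffilled = df.groupby(["one", "two"], sort=False)["three"].ffill()
print(ffilled.tolist())  # row 3 is still NaN: nothing precedes it in its group
```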

Thank you for your time.

Answer by jezrael

If there is only one non-NaN value per group, use ffill (forward fill) and then bfill (backward fill) within each group; this needs apply with a lambda:

# group_keys=False keeps the original row index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   10.0
2    1    1   10.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
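An equivalent formulation, offered as an alternative sketch rather than the answer's own code: transform always returns a Series aligned to the original index, so the filled column can be assigned back directly without depending on how apply handles group keys. It assumes the question's example data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "one":   [1, 1, 1, 1, 1, 1, 1, 1],
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# ffill then bfill inside each group; all-NaN groups stay NaN
df["three"] = (
    df.groupby(["one", "two"], sort=False)["three"]
      .transform(lambda s: s.ffill().bfill())
)
print(df)
```

As with the answer's version, this is only appropriate when each group has a single distinct value; with several values, each NaN takes the nearest preceding value in its group.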

But if there are multiple values per group and you need to replace NaN with some constant, e.g. the mean of each group:

print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1    NaN
3    1    2    NaN
4    1    2   20.0
5    1    2    NaN
6    1    3    NaN
7    1    3    NaN

# group_keys=False keeps the original row index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print (df)
   one  two  three
0    1    1   10.0
1    1    1   40.0
2    1    1   25.0
3    1    2   20.0
4    1    2   20.0
5    1    2   20.0
6    1    3    NaN
7    1    3    NaN
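For the mean case, the per-row group mean can also be computed once with transform("mean") and passed to fillna, which aligns on the index and so avoids a Python-level lambda per group. This is a sketch assuming the answer's second example data; it is not the answer's code.

```python
import numpy as np
import pandas as pd

# The answer's second example: group (1, 1) has two values, 10 and 40
df = pd.DataFrame({
    "one":   [1] * 8,
    "two":   [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Group mean broadcast back to every row (NaN for groups with no values)
group_mean = df.groupby(["one", "two"])["three"].transform("mean")

# Only NaN entries are replaced; existing values are kept
df["three"] = df["three"].fillna(group_mean)
print(df)
```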