Original question: http://stackoverflow.com/questions/46391128/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Pandas fillna using groupby
Asked by Niche.P
I am trying to impute missing values using rows that share the same values in other columns.
For example, I have this dataframe:
one | two | three
1 1 10
1 1 nan
1 1 nan
1 2 nan
1 2 20
1 2 nan
1 3 nan
1 3 nan
I want to group rows by the keys in columns ['one'] and ['two']. If column ['three'] is not entirely nan within a group, I want to fill the missing entries with the existing value from a row in that group.
Here is my desired result:
one | two | three
1 1 10
1 1 10
1 1 10
1 2 20
1 2 20
1 2 20
1 3 nan
1 3 nan
You can see that the group with keys 1 and 3 stays nan, because it has no existing value to fill from.
I have tried using groupby with fillna():
df['three'] = df.groupby(['one','two'])['three'].fillna()
which gave me an error.
I have also tried forward fill, which gave me a rather strange result: it forward-filled column 2 instead. This is the code I used for the forward fill:
df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
Thank you for your time.
Answered by jezrael
If there is only one non-NaN value per group, use ffill (forward filling) and bfill (backward filling) within each group, which requires apply with a lambda:
# group_keys=False keeps the original row index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print (df)
one two three
0 1 1 10.0
1 1 1 10.0
2 1 1 10.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
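The per-group forward and backward fill above can also be sketched as a self-contained script. This version is an assumption on my part rather than the answer's exact code: it rebuilds the question's sample data by hand and swaps `apply` for `transform`, which returns a result aligned to the original row index.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    'one': [1, 1, 1, 1, 1, 1, 1, 1],
    'two': [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Within each (one, two) group, fill forward and then backward.
# transform (used here in place of apply) keeps the original index,
# so the result can be assigned straight back to the column.
df['three'] = df.groupby(['one', 'two'], sort=False)['three'].transform(
    lambda s: s.ffill().bfill()
)
print(df)
```

A group that is entirely NaN, such as keys 1 and 3, has nothing to propagate and is left unchanged.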
But if there are multiple values per group and you need to replace NaN with some constant, e.g. the mean of the group:
print (df)
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 NaN
3 1 2 NaN
4 1 2 20.0
5 1 2 NaN
6 1 3 NaN
7 1 3 NaN
# group_keys=False keeps the original row index, so the result aligns with df
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print (df)
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 25.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
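The group-mean fill above can be sketched without a lambda as well. This is an alternative formulation under my own assumptions, not the answer's code: the data is retyped from the second example, and `transform('mean')` is used to broadcast each group's mean back onto its rows before calling `fillna`.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the second example in the answer.
df = pd.DataFrame({
    'one': [1, 1, 1, 1, 1, 1, 1, 1],
    'two': [1, 1, 1, 2, 2, 2, 3, 3],
    'three': [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# One mean per row, taken from that row's (one, two) group.
# The mean skips NaN, and an all-NaN group yields NaN.
group_mean = df.groupby(['one', 'two'], sort=False)['three'].transform('mean')

# fillna with a Series fills each missing entry from the matching index label.
df['three'] = df['three'].fillna(group_mean)
print(df)
```

Because the mean of an all-NaN group is itself NaN, the rows with keys 1 and 3 stay missing, which matches the output shown above.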

