Pandas fillna using groupby
Original source: http://stackoverflow.com/questions/46391128/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Niche.P
I am trying to impute missing values using rows that share the same values in other columns.
For example, I have this dataframe:
one | two | three
1 1 10
1 1 nan
1 1 nan
1 2 nan
1 2 20
1 2 nan
1 3 nan
1 3 nan
I want to use columns ['one'] and ['two'] as keys: for rows that share the same keys, if column ['three'] is not entirely NaN, the missing entries should be filled with the existing value from a row with the same keys.
Here is my desired result:
one | two | three
1 1 10
1 1 10
1 1 10
1 2 20
1 2 20
1 2 20
1 3 nan
1 3 nan
You can see that the rows with keys 1 and 3 still contain no value, because that group has no existing value to copy.
I have tried using groupby with fillna():
df['three'] = df.groupby(['one','two'])['three'].fillna()
which gave me an error.
I have also tried forward fill, which gave me a rather strange result: it forward filled column 2 instead. This is the code I used for the forward fill:
df['three'] = df.groupby(['one','two'], sort=False)['three'].ffill()
Thank you for your time.
Answered by jezrael
If there is only one non-NaN value per group, use ffill (forward filling) and bfill (backward filling) per group, which requires apply with a lambda:
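# group_keys=False keeps the original row index so the result aligns with df (needed on pandas >= 2.0)
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.ffill().bfill()))
print(df)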
one two three
0 1 1 10.0
1 1 1 10.0
2 1 1 10.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
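For reference, here is a minimal self-contained sketch of the same per-group forward/backward fill. It rebuilds the question's dataframe by hand and swaps apply for transform, which always returns a result aligned to the original index. This is an alternative under those assumptions, not the answer's exact code.

```python
import numpy as np
import pandas as pd

# Rebuild the example dataframe from the question.
df = pd.DataFrame({
    "one": [1] * 8,
    "two": [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, np.nan, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Within each (one, two) group, fill forward and then backward so the
# single known value spreads to every row of the group.
# A group that is entirely NaN stays NaN.
df["three"] = (
    df.groupby(["one", "two"], sort=False)["three"]
      .transform(lambda s: s.ffill().bfill())
)
print(df)
```

Because transform returns one value per input row, the assignment back to df['three'] does not depend on how group keys are handled in the result index.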
But if there are multiple values per group and you need to replace NaN with some constant, e.g. the mean of each group:
print (df)
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 NaN
3 1 2 NaN
4 1 2 20.0
5 1 2 NaN
6 1 3 NaN
7 1 3 NaN
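# group_keys=False keeps the original row index so the result aligns with df (needed on pandas >= 2.0)
df['three'] = (df.groupby(['one','two'], sort=False, group_keys=False)['three']
                 .apply(lambda x: x.fillna(x.mean())))
print(df)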
one two three
0 1 1 10.0
1 1 1 40.0
2 1 1 25.0
3 1 2 20.0
4 1 2 20.0
5 1 2 20.0
6 1 3 NaN
7 1 3 NaN
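The group-mean fill can also be sketched without a lambda. The version below is an assumption-labeled alternative rather than the answer's own code: it rebuilds the second example dataframe, computes the per-row group mean with transform("mean"), and passes that Series to fillna.

```python
import numpy as np
import pandas as pd

# Rebuild the second example dataframe (two known values in the first group).
df = pd.DataFrame({
    "one": [1] * 8,
    "two": [1, 1, 1, 2, 2, 2, 3, 3],
    "three": [10, 40, np.nan, np.nan, 20, np.nan, np.nan, np.nan],
})

# Per-row mean of the row's (one, two) group; NaN values are skipped,
# and an all-NaN group yields NaN.
group_mean = df.groupby(["one", "two"], sort=False)["three"].transform("mean")

# Replace only the missing entries with the matching group mean.
df["three"] = df["three"].fillna(group_mean)
print(df)
```

Existing values are left untouched, and the all-NaN group stays NaN because its mean is NaN as well.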