pandas: How to fillna by groupby outputs in pandas?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/41680089/
Asked by Abhisek Dash
I have a DataFrame with four columns (A, B, C, D). D has some NaN entries. I want to fill each NaN with the average of D over the rows that have the same values of A, B and C.
For example, if the values of A, B, C, D in a row are x, y, z and NaN respectively, I want that NaN to be replaced by the average of D over the rows where A, B, C are x, y, z respectively.
Answered by Zero
df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
would be faster than apply
In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0
In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64
In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0
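For anyone who wants to paste and run this outside IPython, here is a minimal self-contained sketch of the same idea. The frame is the four-row example from this thread; the variable names `group_mean` and `filled` are just illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same four-row example frame used in this thread.
df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1, np.nan, 3, 5],
})

# transform('mean') returns a Series aligned with df's index, holding the
# (A, B, C) group mean of D on every row. NaN is skipped when averaging.
group_mean = df.groupby(["A", "B", "C"])["D"].transform("mean")

# fillna with an aligned Series only replaces the missing entries.
filled = df["D"].fillna(group_mean)

print(group_mean.tolist())
print(filled.tolist())
```

Because `transform` keeps the original index, the result can be assigned straight back with `df["D"] = filled`.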
Details
In [2396]: df.shape
Out[2396]: (10000, 4)
In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop
In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop
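If you want to reproduce the comparison yourself, something like the sketch below should work. The 10000-row frame used for the timings above is not shown, so the random data here (key cardinality, share of NaN, seed) is purely an assumption, and your numbers will differ. The `group_keys=False` argument is also my addition so that the apply version keeps the original index on recent pandas versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10000 rows, three small integer keys,
# roughly 10% of D set to NaN. The original benchmark frame is not shown.
rng = np.random.default_rng(0)
n = 10_000
bench = pd.DataFrame({
    "A": rng.integers(0, 5, n),
    "B": rng.integers(0, 5, n),
    "C": rng.integers(0, 5, n),
    "D": rng.random(n),
})
bench.loc[rng.random(n) < 0.1, "D"] = np.nan

keys = ["A", "B", "C"]


def with_transform():
    # Compute the per-group means once, then fill in a single pass.
    return bench["D"].fillna(bench.groupby(keys)["D"].transform("mean"))


def with_apply():
    # Call a Python lambda once per group.
    grouped = bench.groupby(keys, group_keys=False)["D"]
    return grouped.apply(lambda s: s.fillna(s.mean()))


t_transform = timeit.timeit(with_transform, number=20)
t_apply = timeit.timeit(with_apply, number=20)
print(f"transform: {t_transform / 20 * 1e3:.2f} ms per call")
print(f"apply:     {t_apply / 20 * 1e3:.2f} ms per call")
```

Both functions should return the same filled values, so the only thing being compared is the cost of getting there.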
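Answered by jezrael
I think you need:
df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))
Sample:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,1,1,3],
                   'B':[1,1,1,3],
                   'C':[1,1,1,3],
                   'D':[1,np.nan,3,5]})
print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0
# fill NaN in D with the mean of D within each (A, B, C) group;
# group_keys=False keeps the original row index on pandas 2.0 and later, so the assignment aligns
df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))
print (df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0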
Answered by Fred Cascarini
Link to a duplicate of this question for further information: Pandas Dataframe: Replacing NaN with row average
Another way of doing it, suggested in the link, is a simple fillna on the transpose:
df.T.fillna(df.mean(axis=1)).T
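One thing worth checking before using the transpose trick: as far as I can tell it fills each NaN with the mean of its own row, which is not the same as the per-group mean asked about in the question. A small sketch on the example frame from this thread shows the difference (the names `row_filled` and `group_filled` are mine):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1, np.nan, 3, 5],
})

# Row-wise fill: df.mean(axis=1) is indexed by row label, and after the
# transpose those labels are the column names, so fillna matches them up.
row_filled = df.T.fillna(df.mean(axis=1)).T

# Group-wise fill, as in the other answers.
group_filled = df["D"].fillna(
    df.groupby(["A", "B", "C"])["D"].transform("mean")
)

print(row_filled.loc[1, "D"])   # mean of row 1 over A, B, C: (1 + 1 + 1) / 3
print(group_filled.loc[1])      # mean of D in group (1, 1, 1): (1 + 3) / 2
```

So the transpose version is a fit when the columns are comparable measurements within a row, while the groupby version is the one that matches the question.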