
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/41680089/

Time: 2020-09-14 02:47:37  Source: igfitidea

How to fillna by groupby outputs in pandas?

python pandas

Asked by Abhisek Dash

I have a DataFrame with 4 columns (A, B, C, D). D has some NaN entries. I want to fill each NaN with the average of D over the rows that share the same values of A, B, and C.

For example, if the values of A, B, C, D in a row are x, y, z, and NaN respectively, I want the NaN to be replaced by the average of D over the rows where A, B, C are x, y, z respectively.

Answered by Zero

df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean')) would be faster than apply. transform('mean') returns each row's group mean aligned with the original index, so fillna can use it directly.

In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64

In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0
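
The session above can be reproduced as a standalone script. This is a minimal sketch: the sample values are copied from the session, while the variable names `group_mean` and `filled` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data as in the session above
df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1.0, np.nan, 3.0, 5.0],
})

# transform("mean") yields one value per original row (the mean of that
# row's group), so the result shares df's index
group_mean = df.groupby(["A", "B", "C"])["D"].transform("mean")

# fillna aligns on the index and only touches the missing entries
filled = df["D"].fillna(group_mean)
print(filled.tolist())  # [1.0, 2.0, 3.0, 5.0]
```

Note that if every D in a group is NaN, that group's mean is itself NaN, so those rows stay missing after the fill.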


Details


In [2396]: df.shape
Out[2396]: (10000, 4)

In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop


In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop
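
To repeat the comparison outside IPython, something like the following could be used. It is only a sketch: the random data, the 10% NaN rate, the key cardinalities, and the helper names `with_transform` and `with_apply` are assumptions, not the original benchmark setup, and absolute timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical benchmark data: 10,000 rows, as in df.shape above
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "A": rng.integers(0, 10, n),
    "B": rng.integers(0, 10, n),
    "C": rng.integers(0, 10, n),
    "D": rng.normal(size=n),
})
df.loc[rng.random(n) < 0.1, "D"] = np.nan  # knock out roughly 10% of D

keys = ["A", "B", "C"]

def with_transform():
    # Group means computed in one pass, then a single fillna
    return df["D"].fillna(df.groupby(keys)["D"].transform("mean"))

def with_apply():
    # A Python-level lambda is called once per group;
    # group_keys=False keeps the original row index on the result
    return df.groupby(keys, group_keys=False)["D"].apply(
        lambda x: x.fillna(x.mean())
    )

t_transform = timeit.timeit(with_transform, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print(f"transform: {t_transform / 5 * 1e3:.2f} ms per call")
print(f"apply:     {t_apply / 5 * 1e3:.2f} ms per call")
```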

Answered by jezrael

I think you need:


df.D = df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))

Sample:


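import numpy as np
import pandas as pd

# Sample frame: row 1 has a missing D within the (1, 1, 1) group
df = pd.DataFrame({'A':[1,1,1,3],
                   'B':[1,1,1,3],
                   'C':[1,1,1,3],
                   'D':[1,np.nan,3,5]})

print(df)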
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

df.D = df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
print(df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0
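
Depending on the pandas version, `apply` may prepend the group keys to the index of its result, in which case assigning it straight back to the column no longer lines up with the original rows. A hedged variant of the same idea passes `group_keys=False` so the result keeps the original row index (the name `filled` is introduced here for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1.0, np.nan, 3.0, 5.0],
})

# group_keys=False asks groupby not to add A/B/C as extra index levels,
# so the returned Series is indexed by the original row labels
filled = (
    df.groupby(["A", "B", "C"], group_keys=False)["D"]
      .apply(lambda x: x.fillna(x.mean()))
)

# Column assignment aligns on the index
df["D"] = filled
print(df)
```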

Answered by Fred Cascarini

Link to a duplicate of this question for further information: Pandas Dataframe: Replacing NaN with row average

Another way mentioned in the link is a simple fillna on the transpose: df.T.fillna(df.mean(axis=1)).T. Note that this fills each NaN with the mean of its own row, which is not the same as the mean of D within the (A, B, C) group.
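
To see how the transpose approach behaves on this question's data, here is a small sketch comparing it with the group-wise fill. The sample frame is the one used in the answers above; the names `row_filled` and `group_filled` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 1, 1, 3],
    "B": [1, 1, 1, 3],
    "C": [1, 1, 1, 3],
    "D": [1.0, np.nan, 3.0, 5.0],
})

# Row-wise fill: after transposing, each original row is a column, and
# fillna matches the row means (indexed by row label) to those columns
row_filled = df.T.fillna(df.mean(axis=1)).T

# Group-wise fill: mean of D over rows sharing the same A, B, C
group_filled = df["D"].fillna(
    df.groupby(["A", "B", "C"])["D"].transform("mean")
)

print(row_filled["D"].tolist())   # [1.0, 1.0, 3.0, 5.0]
print(group_filled.tolist())      # [1.0, 2.0, 3.0, 5.0]
```

Row 1 gets 1.0 from the row-wise fill (the mean of its A, B, C values) but 2.0 from the group-wise fill (the mean of the other D values in its group), so the two approaches answer different questions.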