pandas: How to fill NaN values using groupby output in pandas?

Question

Asked by Abhisek Dash

I have a dataframe with 4 columns (A, B, C, D). D has some NaN entries. I want to fill each NaN with the average value of D over the rows that have the same values of A, B, and C.

For example, if the values of A, B, C, D in a row are x, y, z, and NaN respectively, I want the NaN to be replaced by the average of D over the rows where A, B, C are x, y, z respectively.

Answer 1

Answered by Zero

df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean')) would be faster than apply.

In [2400]: df
Out[2400]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

In [2401]: df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
Out[2401]:
0    1.0
1    2.0
2    3.0
3    5.0
Name: D, dtype: float64

In [2402]: df['D'] = df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))

In [2403]: df
Out[2403]:
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0
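The session above can be reproduced as a standalone script. The following is a minimal sketch, not part of the original answer: the helper name `fill_with_group_mean` and the extra fifth row are hypothetical, added to show what happens when a group has no known value of D at all. In that case the group mean is itself NaN, so the entry stays NaN.

```python
import numpy as np
import pandas as pd


def fill_with_group_mean(df, keys, target):
    """Return `target` with NaNs replaced by the mean of its `keys` group."""
    # transform('mean') returns a Series aligned to df.index that holds each
    # row's group mean; NaNs are skipped when the mean is computed.
    group_mean = df.groupby(keys)[target].transform("mean")
    return df[target].fillna(group_mean)


df = pd.DataFrame({
    "A": [1, 1, 1, 3, 4],
    "B": [1, 1, 1, 3, 4],
    "C": [1, 1, 1, 3, 4],
    "D": [1.0, np.nan, 3.0, 5.0, np.nan],  # the last group has no known value
})

filled = fill_with_group_mean(df, ["A", "B", "C"], "D")
print(filled.tolist())
```

If all-NaN groups can occur in your data, a second fillna with a global fallback (for example the overall mean of D) may be needed; whether that is appropriate depends on the data.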

Details


In [2396]: df.shape
Out[2396]: (10000, 4)

In [2398]: %timeit df['D'].fillna(df.groupby(['A','B','C'])['D'].transform('mean'))
100 loops, best of 3: 3.44 ms per loop


In [2397]: %timeit df.groupby(['A','B','C'])['D'].apply(lambda x: x.fillna(x.mean()))
100 loops, best of 3: 5.34 ms per loop
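The answer only reports the frame's shape, so the comparison cannot be rerun exactly. The sketch below builds a stand-in frame of the same shape (the random data, the seed, and the roughly 10% NaN rate are assumptions), checks that both approaches give the same values, and times them with the standard `timeit` module. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test data: the original answer only states df.shape == (10000, 4).
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "A": rng.integers(0, 5, n),
    "B": rng.integers(0, 5, n),
    "C": rng.integers(0, 5, n),
    "D": rng.normal(size=n),
})
df.loc[rng.random(n) < 0.1, "D"] = np.nan  # blank out roughly 10% of D

keys = ["A", "B", "C"]


def via_transform():
    return df["D"].fillna(df.groupby(keys)["D"].transform("mean"))


def via_apply():
    # group_keys=False keeps the original row index on recent pandas versions.
    grouped = df.groupby(keys, group_keys=False)["D"]
    return grouped.apply(lambda s: s.fillna(s.mean()))


# Both approaches should produce the same values once aligned on the index.
pd.testing.assert_series_equal(
    via_transform().sort_index(), via_apply().sort_index(), check_names=False
)

for fn in (via_transform, via_apply):
    seconds = timeit.timeit(fn, number=20) / 20
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```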

Answer 2

Answered by jezrael

I think you need:


df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))

Sample:


import numpy as np
import pandas as pd

df = pd.DataFrame({'A':[1,1,1,3],
                   'B':[1,1,1,3],
                   'C':[1,1,1,3],
                   'D':[1,np.nan,3,5]})

print(df)
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  NaN
2  1  1  1  3.0
3  3  3  3  5.0

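# group_keys=False keeps the original row index, so the result aligns with df
# on recent pandas versions as well as older ones
df.D = df.groupby(['A','B','C'], group_keys=False)['D'].apply(lambda x: x.fillna(x.mean()))
print(df)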
   A  B  C    D
0  1  1  1  1.0
1  1  1  1  2.0
2  1  1  1  3.0
3  3  3  3  5.0

Answer 3

Answered by Fred Cascarini

For further information, see the duplicate of this question: Pandas Dataframe: Replacing NaN with row average

Another approach suggested in the link is a simple fillna on the transpose: df.T.fillna(df.mean(axis=1)).T
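To see what the transpose approach does, here is a small sketch on a hypothetical all-numeric frame (the column names x, y, z and the values are made up for illustration). Note that it fills each NaN with the mean across that row's columns, which is a different quantity from the per-group mean of D asked about in the question. Applied to the question's frame, it would average A, B, C, and D together.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame; the transpose trick averages across columns.
df = pd.DataFrame({
    "x": [1.0, 4.0],
    "y": [np.nan, 6.0],
    "z": [3.0, np.nan],
})

# df.mean(axis=1) is indexed by row label. After transposing, those labels are
# the column names, so fillna matches each column to its own row mean.
filled = df.T.fillna(df.mean(axis=1)).T
print(filled)
```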
