pandas: replace values using groupby

Question

Asked by Def_Os

I have a DataFrame with a column that has some bad data with various negative values. I would like to replace values < 0 with the mean of the group that they are in.

If the bad values were missing values (NAs), I would do:

data = df.groupby(['GroupID']).column
data.transform(lambda x: x.fillna(x.mean()))

But how can I do this for a condition such as x < 0?

Thanks!

Answer 1

Accepted answer by unutbu

Using @AndyHayden's example, you could use groupby/transform with a custom replace function:

import pandas as pd

df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
print(df)
#    a  b
# 0  1  1
# 1  1 -1
# 2  2  1
# 3  2  2

data = df.groupby(['a'])
def replace(group):
    # Work on a copy so the data passed in is not modified in place.
    group = group.copy()
    mask = group<0
    # Select those values where it is < 0, and replace
    # them with the mean of the values which are not < 0.
    group[mask] = group[~mask].mean()
    return group
print(data.transform(replace))
#    b
# 0  1
# 1  1
# 2  1
# 3  2
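A minimal sketch of the same idea applied to a single column, assuming a frame with the question's `GroupID` grouping column and a value column named `column`. The sample data and the helper name `replace_negatives` are made up for illustration. `Series.where` is used here instead of in-place assignment, so the group passed in is left untouched.

```python
import pandas as pd


def replace_negatives(group: pd.Series) -> pd.Series:
    """Replace negative entries with the mean of the non-negative ones."""
    mask = group < 0
    # where() keeps values where the condition is True and
    # substitutes the second argument everywhere else.
    return group.where(~mask, group[~mask].mean())


# Hypothetical data: two groups, each with one bad (negative) value.
df = pd.DataFrame({
    "GroupID": ["x", "x", "x", "y", "y"],
    "column": [2.0, -5.0, 4.0, -1.0, 10.0],
})

df["column"] = df.groupby("GroupID")["column"].transform(replace_negatives)
print(df)
```

Selecting the column before calling `transform` means the function only ever receives one Series per group, which keeps the masking logic easy to reason about.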

Answer 2

Answer by Andy Hayden

Here's one way to do it (for the 'b' column, in this boring example):

In [1]: df = pd.DataFrame([[1,1],[1,-1],[2,1],[2,2]], columns=list('ab'))
In [2]: df
Out[2]: 
   a  b
0  1  1
1  1 -1
2  2  1
3  2  2

Replace those negative values with NaN, and then calculate the mean of b in each group:

In [3]: df['b'] = df.b.apply(lambda x: x if x>=0 else float('nan'))  # pd.np is not available in recent pandas
In [4]: m = df.groupby('a').mean().b

Then use apply across each row to replace each NaN with its group's mean:

In [5]: df['b'] = df.apply(lambda row: m[row['a']]
                                       if pd.isnull(row['b'])
                                       else row['b'],
                           axis=1) 
In [6]: df
Out[6]: 
   a  b
0  1  1
1  1  1
2  2  1
3  2  2
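The same two steps (mask the negatives as NaN, then fill from the group mean) can probably be written without a row-wise apply. The sketch below assumes the example frame from this answer; the names `valid`, `group_mean` and `b_fixed` are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 1], [1, -1], [2, 1], [2, 2]], columns=list("ab"))

# Step 1: keep non-negative values, turn negatives into NaN.
valid = df["b"].where(df["b"] >= 0)

# Step 2: per-group mean of the remaining values, broadcast back
# to the original row order (mean skips NaN by default).
group_mean = valid.groupby(df["a"]).transform("mean")

# Step 3: fill each NaN with the mean of its own group.
df["b_fixed"] = valid.fillna(group_mean)
print(df)
```

Note that `b_fixed` comes out as a float column, because NaN cannot be stored in a plain integer column.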

Answer 3

Answer by YOBEN_S

There is a great example in the pandas cookbook that covers this question:

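import pandas as pd

df = pd.DataFrame({'A' : [1, 1, 2, 2], 'B' : [1, -1, 1, 2]})
gb = df.groupby('A')
def replace(g):
    # Flag the negative values, then overwrite them with
    # the mean of the remaining values in the group.
    mask = g < 0
    g.loc[mask] = g[~mask].mean()
    return g
gb.transform(replace)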

Link: http://pandas.pydata.org/pandas-docs/stable/cookbook.html

Answer 4

Answer by solub

I had the same issue and came up with a rather simple solution:

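import numpy as np

func = lambda x : np.where(x < 0, x.mean(), x)

# Apply per group so that x.mean() is the group mean (GroupID as in the question)
df.groupby('GroupID')['Bad_Column'].transform(func)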

Note that if you want the mean of the valid values only (that is, with the negative values excluded from the mean), you'd have to specify:

func = lambda x : np.where(x < 0, x.mask(x < 0).mean(), x)
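To see how the two lambdas differ, here is a small sketch on made-up data, grouped by a hypothetical `GroupID` column as in the question. Each result is wrapped in a `pd.Series` to keep the group's index; that wrapper and the column names `mean_all` and `mean_valid` are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: each group contains one negative (bad) value.
df = pd.DataFrame({
    "GroupID": [1, 1, 1, 2, 2],
    "Bad_Column": [4, -2, 4, -6, 2],
})


def fill_with_mean_of_all(x: pd.Series) -> pd.Series:
    # The mean here still includes the negative values.
    return pd.Series(np.where(x < 0, x.mean(), x), index=x.index)


def fill_with_mean_of_valid(x: pd.Series) -> pd.Series:
    # mask() turns negatives into NaN, so mean() ignores them.
    return pd.Series(np.where(x < 0, x.mask(x < 0).mean(), x), index=x.index)


grouped = df.groupby("GroupID")["Bad_Column"]
df["mean_all"] = grouped.transform(fill_with_mean_of_all)
df["mean_valid"] = grouped.transform(fill_with_mean_of_valid)
print(df)
```

In group 2 the first variant writes a negative number back, because the mean of -6 and 2 is itself negative; the second variant avoids that by averaging only the non-negative values.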
