Python pandas equivalent of R groupby mutate

Question

Asked by asosnovsky

So in R, when I have a data frame consisting of, say, 4 columns (call it df) and I want to compute, within each group, the ratio of c to the group's sum product sum(c*d), I can do it like this:

# generate data
library(dplyr)
df <- data.frame(a = c(1, 1, 0, 1, 0), b = c(1, 0, 0, 1, 0),
                 c = c(10, 5, 1, 5, 10), d = c(3, 1, 2, 1, 2))
# | a   b   c    d |
# | 1   1   10   3 |
# | 1   0   5    1 |
# | 0   0   1    2 |
# | 1   1   5    1 |
# | 0   0   10   2 |
# compute sum product ratio (mutate() keeps the original row order)
df <- df %>%
  group_by(a, b) %>%
  mutate(ratio = c / sum(c * d))
# | a   b   c    d  ratio |
# | 1   1   10   3  0.286 |
# | 1   0   5    1  1     |
# | 0   0   1    2  0.045 |
# | 1   1   5    1  0.143 |
# | 0   0   10   2  0.455 |

But in Python I have to resort to loops. There must be a more elegant way than raw loops; does anyone have any ideas?
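The question doesn't show its loop, so as a hypothetical baseline: a plain-Python version needs two passes, one to accumulate sum(c*d) per (a, b) group and one to divide each row's c by its group's total. The list-based data layout and the name group_totals are assumptions for illustration, not the asker's code.

```python
# Hypothetical loop-based version: the question does not show its loop,
# so this is only one plausible way to write it.
a = [1, 1, 0, 1, 0]
b = [1, 0, 0, 1, 0]
c = [10, 5, 1, 5, 10]
d = [3, 1, 2, 1, 2]

# First pass: accumulate sum(c * d) for each (a, b) group.
group_totals = {}
for ai, bi, ci, di in zip(a, b, c, d):
    group_totals[(ai, bi)] = group_totals.get((ai, bi), 0) + ci * di

# Second pass: divide each row's c by its group's total, keeping row order.
ratio = [ci / group_totals[(ai, bi)] for ai, bi, ci in zip(a, b, c)]
print([round(r, 3) for r in ratio])
```

This "compute a per-group summary, then broadcast it back to every row" pattern is exactly what group_by() + mutate() does in dplyr, and what the pandas answers replace.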

Answer 1

Answered by Psidom

It can be done with similar syntax using groupby() and apply():

df['ratio'] = df.groupby(['a','b'], group_keys=False).apply(lambda g: g.c/(g.c * g.d).sum())

Answer 2

Answered by datistics

According to this thread on the pandas GitHub, we can use the transform() method to replicate the combination of dplyr::group_by() and dplyr::mutate(). For this example, it would look as follows:

import pandas as pd

df = pd.DataFrame(
    dict(
        a=(1, 1, 0, 1, 0),
        b=(1, 0, 0, 1, 0),
        c=(10, 5, 1, 5, 10),
        d=(3, 1, 2, 1, 2),
    )
).assign(
    prod_c_d=lambda x: x['c'] * x['d'],
    # transform('sum') puts each group's total on every row of that group
    ratio=lambda x: x['c'] / x.groupby(['a', 'b'])['prod_c_d'].transform('sum'),
)
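A variant of the same transform() idea, sketched without the helper column prod_c_d: the product is computed as a standalone Series and grouped by the a and b columns passed as keys. The name group_total is illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    dict(
        a=(1, 1, 0, 1, 0),
        b=(1, 0, 0, 1, 0),
        c=(10, 5, 1, 5, 10),
        d=(3, 1, 2, 1, 2),
    )
)

# Group the per-row product c*d by the a and b columns (passed as Series keys);
# transform('sum') returns each group's total aligned to the original rows.
group_total = (df["c"] * df["d"]).groupby([df["a"], df["b"]]).transform("sum")
df = df.assign(ratio=df["c"] / group_total)
print(df)
```

Because transform() keeps the original index, the division lines up row by row, mirroring how mutate() evaluates sum(c * d) per group.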

This example uses pandas method chaining. For more information on how to use method chaining to replicate dplyr workflows, see this blog post.

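The method using apply() and groupby() does not work for me because it does not seem to be adaptable. For example, it does not work if we delete g.c/ from the lambda expression: each group then returns a single scalar, so the result is indexed by the group keys (a, b) rather than by the original rows and cannot be assigned back as a column.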

df['ratio'] = df.groupby(['a', 'b'], group_keys=False) \
    .apply(lambda g: (g.c * g.d).sum())
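The failing case above comes down to the shape of the result: an aggregation yields one value per group, indexed by the group keys, while transform() yields one value per original row. A minimal sketch contrasting the two on the same data (the names prod, keys, per_group, and the column group_sum are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 0, 1, 0],
    "b": [1, 0, 0, 1, 0],
    "c": [10, 5, 1, 5, 10],
    "d": [3, 1, 2, 1, 2],
})
prod = df["c"] * df["d"]
keys = [df["a"], df["b"]]

# An aggregation returns one value per group, indexed by (a, b) ...
per_group = prod.groupby(keys).sum()

# ... whereas transform() returns one value per row, aligned with df's index,
# so it can be assigned directly as a new column.
df["group_sum"] = prod.groupby(keys).transform("sum")
print(per_group)
print(df)
```

This is why transform() adapts to both cases: whether you want the bare group total or a ratio built from it, the per-row alignment is the same.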
