pandas: GroupBy .pipe() vs .apply()

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/47226407/

Time: 2020-09-14 04:45:19  Source: igfitidea

pandas: GroupBy .pipe() vs .apply()

Tags: python, python-3.x, pandas, pandas-groupby

Asked by foglerit

In the example from the pandas documentation about the new .pipe() method for GroupBy objects, an .apply() method accepting the same lambda would return the same results.

In [195]: import numpy as np

In [196]: n = 1000

In [197]: df = pd.DataFrame({'Store': np.random.choice(['Store_1', 'Store_2'], n),
   .....:                    'Product': np.random.choice(['Product_1', 'Product_2', 'Product_3'], n),
   .....:                    'Revenue': (np.random.random(n)*50+10).round(2),
   .....:                    'Quantity': np.random.randint(1, 10, size=n)})

In [199]: (df.groupby(['Store', 'Product'])
   .....:    .pipe(lambda grp: grp.Revenue.sum()/grp.Quantity.sum())
   .....:    .unstack().round(2))

Out[199]: 
Product  Product_1  Product_2  Product_3
Store                                   
Store_1       6.93       6.82       7.15
Store_2       6.69       6.64       6.77
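
The equivalence described in the question can be checked with a small sketch. The fixed seed, the helper name `unit_price`, and grouping on index levels (so the key columns stay out of each slice regardless of pandas version) are assumptions added here for illustration; they are not part of the documentation example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed: an assumption, for reproducibility
n = 1000
df = pd.DataFrame({
    'Store': rng.choice(['Store_1', 'Store_2'], n),
    'Product': rng.choice(['Product_1', 'Product_2', 'Product_3'], n),
    'Revenue': (rng.random(n) * 50 + 10).round(2),
    'Quantity': rng.integers(1, 10, size=n),
})

# Group on index levels so each slice holds only Revenue and Quantity.
gb = df.set_index(['Store', 'Product']).groupby(level=['Store', 'Product'])

def unit_price(x):
    # Under pipe, x is the whole GroupBy object.
    # Under apply, x is one group's DataFrame.
    return x['Revenue'].sum() / x['Quantity'].sum()

via_pipe = gb.pipe(unit_price)    # callable runs once, on the GroupBy
via_apply = gb.apply(unit_price)  # callable runs on each group's slice

# Compare with a tolerance: the two paths may sum floats in a different order.
same = np.allclose(via_pipe.to_numpy(), via_apply.to_numpy())
print(same)
```

Both calls yield one value per (Store, Product) pair here because the lambda only uses operations that exist on a GroupBy and on a DataFrame alike.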

I can see how the pipe functionality differs from apply for DataFrame objects, but not for GroupBy objects. Does anyone have an explanation or examples of what can be done with pipe but not with apply for a GroupBy?

Answered by piRSquared

What pipe does is let you pass a callable, with the expectation that the object that called pipe is the object that gets passed to the callable.
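
As a minimal sketch of that behaviour, piping a callable should give the same result as calling it directly on the GroupBy object. The helper `group_means_scaled` and its `factor` argument are hypothetical names made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': list('XXXXYYYYYY'), 'B': range(10)})
g = df.groupby('A').B

def group_means_scaled(gb, factor):
    # gb is the SeriesGroupBy itself, not an individual group
    return gb.mean() * factor

piped = g.pipe(group_means_scaled, factor=2)   # extra arguments are forwarded
direct = group_means_scaled(g, factor=2)       # plain function call

print(piped.equals(direct))
```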

With apply, we assume that the object that calls apply has subcomponents, each of which gets passed to the callable that was passed to apply. In the context of a groupby, the subcomponents are slices of the dataframe that called groupby, where each slice is a dataframe itself. The same holds for a series groupby, where each slice is a series.
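
A short sketch of the per-slice behaviour, using a series groupby: the callable reports what it received, and the result has one entry per group. The helper name `describe_slice` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': list('XXXXYYYYYY'), 'B': range(10)})

def describe_slice(s):
    # s is one group's Series; nothing here reveals the other groups
    return f"{type(s).__name__} of length {len(s)}"

seen = df.groupby('A').B.apply(describe_slice)
print(seen)
```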

The main difference is that with pipe in a groupby context, the callable has the entire scope of the groupby object available to it. With apply, it only knows about the local slice.

Setup
Consider df

import pandas as pd

df = pd.DataFrame(dict(
    A=list('XXXXYYYYYY'),
    B=range(10)
))

   A  B
0  X  0
1  X  1
2  X  2
3  X  3
4  Y  4
5  Y  5
6  Y  6
7  Y  7
8  Y  8
9  Y  9

Example 1
Make the entire 'B' column sum to 1 while each sub-group sums to the same amount. This requires that the calculation be aware of how many groups exist. This is something we can't do with apply because apply wouldn't know how many groups exist.

s = df.groupby('A').B.pipe(lambda g: df.B / g.transform('sum') / g.ngroups)
s

0    0.000000
1    0.083333
2    0.166667
3    0.250000
4    0.051282
5    0.064103
6    0.076923
7    0.089744
8    0.102564
9    0.115385
Name: B, dtype: float64

Note:

s.sum()

0.99999999999999989

And:

s.groupby(df.A).sum()

A
X    0.5
Y    0.5
Name: B, dtype: float64
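
For comparison, here is a sketch of the same calculation with the GroupBy object bound to a name first. It suggests that pipe mainly saves the intermediate variable and keeps the expression in one method chain; the variable names `unpiped` and `piped` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': list('XXXXYYYYYY'), 'B': range(10)})

# Two-step form: name the GroupBy, then use its whole-object attributes.
g = df.groupby('A').B
unpiped = df.B / g.transform('sum') / g.ngroups

# Chained form from Example 1.
piped = df.groupby('A').B.pipe(lambda g: df.B / g.transform('sum') / g.ngroups)

print(piped.equals(unpiped))
```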


Example 2
Subtract the mean of one group from the values of another. Again, this can't be done with apply because apply doesn't know about other groups.

# Series.append was removed in pandas 2.0; pd.concat gives the same result
df.groupby('A').B.pipe(
    lambda g: pd.concat([
        g.get_group('X') - g.get_group('Y').mean(),
        g.get_group('Y') - g.get_group('X').mean(),
    ])
)

0   -6.5
1   -5.5
2   -4.5
3   -3.5
4    2.5
5    3.5
6    4.5
7    5.5
8    6.5
9    7.5
Name: B, dtype: float64