pandas: nested groupby in a DataFrame and aggregating multiple columns

Note: this page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not the translator). Original question: http://stackoverflow.com/questions/40493024/


Nested groupby in DataFrame and aggregate multiple columns

Tags: pandas, dataframe, group-by, nested, aggregate

Asked by Rahul

I am trying to do nested groupby as follows:


>>> import numpy as np
>>> import pandas as pd
>>> df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11', 5: '2016-10-12'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'ABC', 4: 'ABC', 5: 'XYZ'}, 'Quantity': {0: 60, 1: 50, 2: 40, 3: 30, 4: 20, 5: 10}, 'UiD':{0:1,1:1,2:1,3:2,4:2,5:3}, 'StartTime': {0: '08:00:00.241', 1: '08:00:00.243', 2: '12:34:23.563', 3: '08:14:05.908', 4: '18:54:50.100', 5: '10:08:36.657'}, 'Sign':{0:1,1:1,2:0,3:-1,4:0,5:-1}, 'leg1':{0:2,1:2,2:4,3:5,4:7,5:8}})
>>> df1
         Date  Quantity  Sign     StartTime Stock  UiD  leg1
0  2016-10-11        60     1  08:00:00.241   ABC    1     2
1  2016-10-11        50     1  08:00:00.243   ABC    1     2
2  2016-10-11        40     0  12:34:23.563   ABC    1     4
3  2016-10-11        30    -1  08:14:05.908   ABC    2     5
4  2016-10-11        20     0  18:54:50.100   ABC    2     7
5  2016-10-12        10    -1  10:08:36.657   XYZ    3     8
>>> dfg1=df1.groupby(['Date','Stock'])
>>> dfg1.apply(lambda x:x.groupby('UiD').first()).groupby(['Date','Stock']).apply(lambda x:np.sum(x['Quantity']))
Date        Stock
2016-10-11  ABC      90
2016-10-12  XYZ      10
dtype: int64
>>>
>>> dfg1['leg1'].sum()
Date        Stock
2016-10-11  ABC      20
2016-10-12  XYZ       8
Name: leg1, dtype: int64

So far so good. Now I am trying to concatenate the two results into a new DataFrame df2 as follows:


>>> df2 = pd.concat([dfg1['leg1'].sum(), dfg1.apply(lambda x:x.groupby('UiD').first()).groupby(['Date','Stock']).apply(lambda x:np.sum(x['Quantity']))],axis=1)
>>> df2
                   0   1
Date       Stock        
2016-10-11 ABC    20  90
2016-10-12 XYZ     8  10
>>>

I am wondering if there is a better way to rewrite the following line in order to avoid repeating groupby(['Date','Stock']):


dfg1.apply(lambda x:x.groupby('UiD').first()).groupby(['Date','Stock']).apply(lambda x:np.sum(x['Quantity']))

Also this fails if ['Date','Stock'] contains 'UiD' as one of the keys, or if ['Date','Stock'] is replaced by just ['UiD'].

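One candidate rewrite of that line, sketched here for reference rather than taken from the thread (it assumes only the per-UiD-first Quantity sum is wanted from that expression): since the outer lambda already receives a single (Date, Stock) group, the reduction can be finished inside it, and the second groupby(['Date','Stock']) disappears.

>>> dfg1.apply(lambda x: x.groupby('UiD')['Quantity'].first().sum())

With the sample df1 above this should give the same 90/10 Series as the original expression.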

Answered by smci

Please restate your question to be clearer. You want to groupby(['Date','Stock']), then:


  1. take only the first record for each UiD and sum (aggregate) its Quantity, but also
  2. sum all leg1 values for that Date,Stock combination (not just the first-for-each-UiD). Is that right?

Anyway you want to perform an aggregation (sum) on multiple columns, and yeah the way to avoid repetition of groupby(['Date','Stock']) is to keep one dataframe, not try to stitch together two dataframes from two individual aggregate operations. Something like the following (I'll fix it once you confirm this is what you want):


def filter_first_UiD(g):
    # Within one (Date, Stock) group: keep the first record of each UiD,
    # then sum Quantity and leg1 over those first records.
    #return g.groupby('UiD').first().agg(np.sum)
    return g.groupby('UiD').first().agg({'Quantity':'sum', 'leg1':'sum'})

df1.groupby(['Date','Stock']).apply(filter_first_UiD)
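If reading 2 above is the intent (leg1 summed over every row of the Date,Stock group, not only the first row per UiD), a variant along the same lines could be sketched as below. This is not from the original answer; the name filter_first_UiD_v2 is hypothetical, and it relies on df1 as defined in the question.

def filter_first_UiD_v2(g):
    # Quantity: sum of the first record of each UiD in this (Date, Stock) group.
    # leg1: sum over every row of the group.
    return pd.Series({'Quantity': g.groupby('UiD')['Quantity'].first().sum(),
                      'leg1': g['leg1'].sum()})

df1.groupby(['Date','Stock']).apply(filter_first_UiD_v2)

With the sample df1 this should give the same numbers as the df2 built with pd.concat above (Quantity 90/10, leg1 20/8), while writing groupby(['Date','Stock']) only once.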

Answered by Rahul

The way I dealt with the last scenario, where the groupby fails if ['Date','Stock'] contains 'UiD' as one of the keys or if ['Date','Stock'] is replaced by just ['UiD'], is as follows:


>>> df2 = pd.concat([dfg1['leg1'].sum(), dfg1['Quantity'].first() if 'UiD' in ['Date','Stock'] else dfg1.apply(lambda x:x.groupby('UiD').first()).groupby(['Date','Stock']).apply(lambda x:np.sum(x['Quantity']))],axis=1)

But a more elegant solution is still an open question.

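One possible direction for that open question, sketched here as an assumption rather than a confirmed answer: drop_duplicates(subset='UiD') also keeps the first row of each UiD, and it degenerates gracefully when 'UiD' is itself one of the grouping keys, so the same function can be reused whether the keys are ['Date','Stock'], ['Date','Stock','UiD'], or just ['UiD']. The keys variable and summarize name below are illustrative, and df1 is the frame defined in the question.

keys = ['Date', 'Stock']   # works the same if 'UiD' is one of the keys, or the only key

def summarize(g):
    # leg1: summed over all rows of the group.
    # Quantity: summed over the first row of each UiD in the group.
    return pd.Series({'leg1': g['leg1'].sum(),
                      'Quantity': g.drop_duplicates(subset='UiD')['Quantity'].sum()})

df2 = df1.groupby(keys).apply(summarize)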