pandas 熊猫聚合的条件总和

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/15259547/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-13 20:41:15  来源:igfitidea点击:

conditional sums for pandas aggregate

pythonrpandasdata.table

提问by stites

I just recently made the switch from R to python and have been having some trouble getting used to data frames again as opposed to using R's data.table. The problem I've been having is that I'd like to take a list of strings, check for a value, then sum the count of that string- broken down by user. So I would like to take this data:

我最近才从 R 切换到 python,并且在重新习惯数据帧方面遇到了一些麻烦,而不是使用 R 的 data.table。我一直遇到的问题是我想获取一个字符串列表,检查一个值,然后对该字符串的计数求和 - 按用户细分。所以我想拿这个数据:

   A_id       B    C
1:   a1    "up"  100
2:   a2  "down"  102
3:   a3    "up"  100
3:   a3    "up"  250
4:   a4  "left"  100
5:   a5 "right"  102

And return:

并返回:

   A_id_grouped   sum_up   sum_down  ...  over_200_up
1:           a1        1          0  ...            0
2:           a2        0          1                 0
3:           a3        2          0  ...            1
4:           a4        0          0                 0
5:           a5        0          0  ...            0

Before I did it with the R code (using data.table)

在我使用 R 代码(使用 data.table)之前

>DT[ ,list(A_id_grouped, sum_up = sum(B == "up"),
+  sum_down = sum(B == "down"), 
+  ...,
+  over_200_up = sum(up == "up" & < 200), by=list(A)];

However all of my recent attempts with Python have failed me:

然而,我最近对 ​​Python 的所有尝试都失败了:

DT.agg({"D": [np.sum(DT[DT["B"]=="up"]),np.sum(DT[DT["B"]=="up"])], ...
    "C": np.sum(DT[(DT["B"]=="up") & (DT["C"]>200)])
    })

Thank you in advance! it seems like a simple question however I couldn't find it anywhere.

先感谢您!这似乎是一个简单的问题,但我在任何地方都找不到。

采纳答案by Garrett

To complement unutbu's answer, here's an approach using applyon the groupby object.

为了补充 unutbu 的答案,这里有一种apply在 groupby 对象上使用的方法。

>>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
    sum_up=(x.B == 'up').sum(),
    sum_down=(x.B == 'down').sum(),
    over_200_up=((x.B == 'up') & (x.C > 200)).sum()
)))
      over_200_up  sum_down  sum_up
A_id                               
a1              0         0       1
a2              0         1       0
a3              1         0       2
a4              0         0       0
a5              0         0       0

回答by unutbu

There might be a better way; I'm pretty new to pandas, but this works:

可能有更好的方法;我对Pandas很陌生,但这有效:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A_id':'a1 a2 a3 a3 a4 a5'.split(),
                   'B': 'up down up up left right'.split(),
                   'C': [100, 102, 100, 250, 100, 102]})

df['D'] = (df['B']=='up') & (df['C'] > 200)
grouped = df.groupby(['A_id'])

def sum_up(grp):
    return np.sum(grp=='up')
def sum_down(grp):
    return np.sum(grp=='down')
def over_200_up(grp):
    return np.sum(grp)

result = grouped.agg({'B': [sum_up, sum_down],
                      'D': [over_200_up]})
result.columns = [col[1] for col in result.columns]
print(result)

yields

产量

      sum_up  sum_down  over_200_up
A_id                               
a1         1         0            0
a2         0         1            0
a3         2         0            1
a4         0         0            0
a5         0         0            0