pandas 熊猫聚合的条件总和

Question

提问by stites

I just recently made the switch from R to python and have been having some trouble getting used to data frames again as opposed to using R's data.table. The problem I've been having is that I'd like to take a list of strings, check for a value, then sum the count of that string- broken down by user. So I would like to take this data:

我最近才从 R 切换到 python，并且在重新习惯数据帧方面遇到了一些麻烦，而不是使用 R 的 data.table。我一直遇到的问题是我想获取一个字符串列表，检查一个值，然后对该字符串的计数求和 - 按用户细分。所以我想拿这个数据：

   A_id       B    C
1:   a1    "up"  100
2:   a2  "down"  102
3:   a3    "up"  100
3:   a3    "up"  250
4:   a4  "left"  100
5:   a5 "right"  102

And return:

并返回：

   A_id_grouped   sum_up   sum_down  ...  over_200_up
1:           a1        1          0  ...            0
2:           a2        0          1                 0
3:           a3        2          0  ...            1
4:           a4        0          0                 0
5:           a5        0          0  ...            0

Before I did it with the R code (using data.table)

在我使用 R 代码（使用 data.table）之前

>DT[ ,list(A_id_grouped, sum_up = sum(B == "up"),
+  sum_down = sum(B == "down"), 
+  ...,
+  over_200_up = sum(up == "up" & < 200), by=list(A)];

However all of my recent attempts with Python have failed me:

然而，我最近对 Python 的所有尝试都失败了：

DT.agg({"D": [np.sum(DT[DT["B"]=="up"]),np.sum(DT[DT["B"]=="up"])], ...
    "C": np.sum(DT[(DT["B"]=="up") & (DT["C"]>200)])
    })

Thank you in advance! it seems like a simple question however I couldn't find it anywhere.

先感谢您！这似乎是一个简单的问题，但我在任何地方都找不到。

Answer 1

采纳答案by Garrett

To complement unutbu's answer, here's an approach using applyon the groupby object.

为了补充 unutbu 的答案，这里有一种apply在 groupby 对象上使用的方法。

>>> df.groupby('A_id').apply(lambda x: pd.Series(dict(
    sum_up=(x.B == 'up').sum(),
    sum_down=(x.B == 'down').sum(),
    over_200_up=((x.B == 'up') & (x.C > 200)).sum()
)))
      over_200_up  sum_down  sum_up
A_id                               
a1              0         0       1
a2              0         1       0
a3              1         0       2
a4              0         0       0
a5              0         0       0

Answer 2

回答by unutbu

There might be a better way; I'm pretty new to pandas, but this works:

可能有更好的方法；我对Pandas很陌生，但这有效：

import pandas as pd
import numpy as np

df = pd.DataFrame({'A_id':'a1 a2 a3 a3 a4 a5'.split(),
                   'B': 'up down up up left right'.split(),
                   'C': [100, 102, 100, 250, 100, 102]})

df['D'] = (df['B']=='up') & (df['C'] > 200)
grouped = df.groupby(['A_id'])

def sum_up(grp):
    return np.sum(grp=='up')
def sum_down(grp):
    return np.sum(grp=='down')
def over_200_up(grp):
    return np.sum(grp)

result = grouped.agg({'B': [sum_up, sum_down],
                      'D': [over_200_up]})
result.columns = [col[1] for col in result.columns]
print(result)

yields

产量

      sum_up  sum_down  over_200_up
A_id                               
a1         1         0            0
a2         0         1            0
a3         2         0            1
a4         0         0            0
a5         0         0            0

pandas 熊猫聚合的条件总和

提问by stites

采纳答案by Garrett

回答by unutbu

相关推荐

最近更新

标签

pandas 熊猫聚合的条件总和

提问by stites

采纳答案by Garrett

回答by unutbu

相关推荐

滑动窗口上的 Pandas 滚动计算（不均匀间隔）

pandas 熊猫面板中的布尔掩码

Pandas 未记录的 DataFrame.keys() 方法

Pandas 在布尔索引中使用行标签

相关推荐

最近更新

标签