Python Pandas Conditional Sum with Groupby

Question

Asked by AllenQ

Using sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': np.random.randn(5),
                   'data2': np.random.randn(5)})

df

    data1        data2     key1  key2
0    0.361601    0.375297    a   one
1    0.069889    0.809772    a   two
2    1.468194    0.272929    b   one
3   -1.138458    0.865060    b   two
4   -0.268210    1.250340    a   one

I'm trying to figure out how to group the data by key1 and sum only the data1 values where key2 equals 'one'.

Here's what I've tried:

def f(d,a,b):
    d.ix[d[a] == b, 'data1'].sum()

df.groupby(['key1']).apply(f, a = 'key2', b = 'one').reset_index()

But this gives me a dataframe with None values:

index   key1    0
0       a       None
1       b       None

Any ideas here? I'm looking for the Pandas equivalent of the following SQL:

SELECT Key1, SUM(CASE WHEN Key2 = 'one' then data1 else 0 end)
FROM df
GROUP BY key1

FYI - I've seen "conditional sums for pandas aggregate" but couldn't adapt the answer provided there to work with sums rather than counts.

Thanks in advance

Answer 1

Accepted answer by Andy Hayden

First, group by the key1 column:

In [11]: g = df.groupby('key1')

Then, for each group, take the sub-DataFrame where key2 equals 'one' and sum its data1 column:

In [12]: g.apply(lambda x: x[x['key2'] == 'one']['data1'].sum())
Out[12]:
key1
a       0.093391
b       1.468194
dtype: float64

To explain what's going on let's look at the 'a' group:

In [21]: a = g.get_group('a')

In [22]: a
Out[22]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
1  0.069889  0.809772    a  two
4 -0.268210  1.250340    a  one

In [23]: a[a['key2'] == 'one']
Out[23]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
4 -0.268210  1.250340    a  one

In [24]: a[a['key2'] == 'one']['data1']
Out[24]:
0    0.361601
4   -0.268210
Name: data1, dtype: float64

In [25]: a[a['key2'] == 'one']['data1'].sum()
Out[25]: 0.093391000000000002

It may be slightly easier and clearer to first restrict the dataframe to the rows where key2 equals 'one', and then group:

In [31]: df1 = df[df['key2'] == 'one']

In [32]: df1
Out[32]:
      data1     data2 key1 key2
0  0.361601  0.375297    a  one
2  1.468194  0.272929    b  one
4 -0.268210  1.250340    a  one

In [33]: df1.groupby('key1')['data1'].sum()
Out[33]:
key1
a       0.093391
b       1.468194
Name: data1, dtype: float64
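
As for why the attempt in the question produced None: the function f computes the sum but never returns it, so every group evaluates to None. In addition, .ix was deprecated in pandas 0.20 and removed in 1.0, and .loc is its label-based replacement. A minimal sketch of the corrected function follows. It assumes the fixed values printed in the question instead of random data, and it selects the key2 and data1 columns before apply, which is a choice made here to keep the grouping column out of the function's input:

```python
import pandas as pd

# Values copied from the frame printed in the question, so the result is reproducible.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [0.361601, 0.069889, 1.468194, -1.138458, -0.268210],
    'data2': [0.375297, 0.809772, 0.272929, 0.865060, 1.250340],
})

def f(d, a, b):
    # The `return` is the actual fix: without it the function yields None per group.
    # .loc replaces the removed .ix indexer.
    return d.loc[d[a] == b, 'data1'].sum()

# Only the columns f needs are passed to apply.
result = df.groupby('key1')[['key2', 'data1']].apply(f, a='key2', b='one')
print(result)
```

The result is a Series indexed by key1, matching the output of the lambda version above.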

Answer 2

Answered by Diego

I think that today with pandas 0.23 you can do this:

import numpy as np

df.assign(result=np.where(df['key2'] == 'one', df.data1, 0))\
  .groupby('key1').agg({'result': 'sum'})

The advantage of this is that you can apply it to more than one condition on the same dataframe, producing several result columns at once:

df.assign(
    result1=np.where(df['key2'] == 'one', df.data1, 0),
    result2=np.where(df['key2'] == 'two', df.data1, 0)
).groupby('key1').agg({'result1': 'sum', 'result2': 'sum'})
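
The same idea can also be written without a helper column. The sketch below is not from the answer above: it swaps np.where for Series.where, which keeps data1 where the condition holds and substitutes 0 elsewhere, and then groups the masked Series by the key1 column. This mirrors the CASE WHEN ... ELSE 0 END and GROUP BY parts of the SQL in the question. The data values are the ones printed in the question.

```python
import pandas as pd

# Values copied from the frame printed in the question.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': [0.361601, 0.069889, 1.468194, -1.138458, -0.268210],
})

# CASE WHEN key2 = 'one' THEN data1 ELSE 0 END
masked = df['data1'].where(df['key2'].eq('one'), 0)

# GROUP BY key1, then SUM
result = masked.groupby(df['key1']).sum()
print(result)
```

Because non-matching rows are replaced by 0 rather than dropped, every key1 value stays in the result, even a group with no 'one' rows.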

Answer 3

Answered by jpp

You can filter your dataframe before you perform your groupby operation. If filtering removes every row of some group, that group disappears from the resulting series index; you can restore it with reindex followed by fillna:

res = df.loc[df['key2'].eq('one')]\
        .groupby('key1')['data1'].sum()\
        .reindex(df['key1'].unique()).fillna(0)

print(res)

key1
a    3.631610
b    0.978738
c    0.000000
Name: data1, dtype: float64

Setup

I have added an additional row for demonstration purposes.

import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame({'key1': ['a','a','b','b','a','c'],
                   'key2': ['one', 'two', 'one', 'two', 'one', 'two'],
                   'data1': np.random.randn(6),
                   'data2': np.random.randn(6)})
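
To make the missing-group behaviour concrete, here is a small sketch built on the same setup. It first shows that group 'c' drops out after filtering, then restores it. It uses the fill_value argument of reindex in place of the separate fillna call, which is a variation on the answer's code rather than the answer's own method.

```python
import numpy as np
import pandas as pd

np.random.seed(0)

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a', 'c'],
                   'key2': ['one', 'two', 'one', 'two', 'one', 'two'],
                   'data1': np.random.randn(6),
                   'data2': np.random.randn(6)})

# Filter first, then group: 'c' has no rows with key2 == 'one', so it is absent here.
filtered = df.loc[df['key2'].eq('one')].groupby('key1')['data1'].sum()
print(filtered.index.tolist())

# Reindex against every key1 value; labels missing from `filtered` get 0.
res = filtered.reindex(df['key1'].unique(), fill_value=0)
print(res)
```

One difference to keep in mind: fill_value only fills labels that reindex introduces, whereas a trailing fillna(0) would also replace any NaN that was already present in the sums.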
