Python: How to do a conditional count after groupby on a Pandas DataFrame?

Question

Asked by Sethias

I have the following dataframe:

   key1  key2
0    a   one
1    a   two
2    b   one
3    b   two
4    a   one
5    c   two
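
For anyone who wants to run the snippets in this thread, the frame above can be rebuilt roughly as follows. This is a minimal sketch; the variable name df simply matches the one used in the question and answers.

```python
import pandas as pd

# Rebuild the example frame shown in the question.
df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Plain group sizes, with no condition on key2 yet.
sizes = df.groupby("key1").size()
print(sizes)
```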

Now, I want to group the dataframe by key1 and count the rows where the column key2 has the value "one" to get this result:

I just get the usual count with:

df.groupby(['key1']).size()

But I don't know how to insert the condition.

I tried things like this:

df.groupby(['key1']).apply(df[df['key2'] == 'one'])

But I can't get any further. How can I do this?

Answer 1

Answered by jezrael

I think you need to apply the condition first:

# use this if you also need group c, which has no 'one' values
df11 = df.groupby('key1')['key2'].apply(lambda x: (x == 'one').sum()).reset_index(name='count')
print(df11)
  key1  count
0    a      2
1    b      1
2    c      0

Or convert key1 to a categorical; size then reports the missing category with a count of 0:

df['key1'] = df['key1'].astype('category')
# observed=False keeps categories with no remaining rows (such as c); newer pandas versions may not do this by default
df1 = df[df['key2'] == 'one'].groupby(['key1'], observed=False).size().reset_index(name='count')
print(df1)
  key1  count
0    a      2
1    b      1
2    c      0
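
To make the difference concrete, here is a small sketch of why the categorical conversion matters. It assumes the example frame from the question and a pandas version where groupby accepts the observed parameter; the names filtered, plain, cat_df, and kept are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Filtering first removes every row of group 'c',
# so a plain groupby no longer knows that 'c' exists.
filtered = df[df["key2"] == "one"]
plain = filtered.groupby("key1").size()

# A categorical key remembers all of its categories after filtering;
# observed=False asks groupby to report the empty ones too.
cat_df = df.assign(key1=df["key1"].astype("category"))
kept = cat_df[cat_df["key2"] == "one"].groupby("key1", observed=False).size()

print(plain)
print(kept)
```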

If you need all combinations:

df2 = df.groupby(['key1', 'key2']).size().reset_index(name='count') 
print(df2)
  key1 key2  count
0    a  one      2
1    a  two      1
2    b  one      1
3    b  two      1
4    c  two      1

df3 = df.groupby(['key1', 'key2']).size().unstack(fill_value=0)
print(df3)
key2  one  two
key1          
a       2    1
b       1    1
c       0    1

Answer 2

Answered by Florian Mutel

You can count the occurrences of 'one' in the column 'key2' for each group like this:

df.groupby('key1')['key2'].apply(lambda x: x[x == 'one'].count())

This yields:

key1
a    2
b    1
c    0
Name: key2, dtype: int64

Answer 3

Answered by piRSquared

Option 1

df.set_index('key1').key2.eq('one').groupby(level=0).sum().astype(int).reset_index()

  key1  key2
0    a     2
1    b     1
2    c     0

Option 2

df.key2.eq('one').groupby(df.key1).sum().astype(int).reset_index()

  key1  key2
0    a     2
1    b     1
2    c     0

Option 3

import numpy as np
import pandas as pd

f, u = df.key1.factorize()
pd.DataFrame(dict(key1=u, key2=np.bincount(f, df.key2.eq('one')).astype(int)))

  key1  key2
0    a     2
1    b     1
2    c     0

Option 4

pd.crosstab(df.key1, df.key2.eq('one'))[True].rename('key2').reset_index()

  key1  key2
0    a     2
1    b     1
2    c     0

Option 5

pd.get_dummies(df.key1).mul(
   df.key2.eq('one'), 0
).sum().rename_axis('key1').reset_index(name='key2')

  key1  key2
0    a     2
1    b     1
2    c     0

Answer 4

Answered by Mehdi Golari

You can do this by applying groupby() on both keys and then unstack().

df = df.groupby(['key1', 'key2']).size().unstack()
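
As a hedged illustration of this approach: without a fill value, unstack() leaves NaN where a combination never occurs (here c with 'one'), so passing fill_value=0 and then selecting the 'one' column gives the per-group count directly. The sketch below assumes the example frame from the question and uses new names (raw, counts, one_counts) instead of overwriting df.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Without fill_value, the missing (c, one) combination becomes NaN.
raw = df.groupby(["key1", "key2"]).size().unstack()

# With fill_value=0, missing combinations are reported as 0.
counts = df.groupby(["key1", "key2"]).size().unstack(fill_value=0)

# The conditional count asked for is the 'one' column.
one_counts = counts["one"]
print(one_counts)
```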

Answer 5

Answered by Igor Kołakowski

This may not be the fastest solution, but you can create a new data frame with a column that is 1 where key2 equals 'one' and 0 otherwise.

df2 = df.assign(oneCount =
 lambda x: [1 if row.key2 == 'one' else 0 for index, row in x.iterrows()])

  key1 key2  oneCount
0    a  one         1
1    a  two         0
2    b  one         1
3    b  two         0
4    a  one         1
5    c  two         0

And then aggregate it.

df3 = df2.groupby('key1').agg({"oneCount": "sum"}).reset_index()

  key1  oneCount
0    a         2
1    b         1
2    c         0
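
If the row-by-row loop turns out to be slow, the same indicator column can probably be built without iterrows(). The following is a sketch under the assumption that the frame looks like the one in the question; it swaps the loop for a vectorized comparison and is not the answerer's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
})

# Vectorized indicator: 1 where key2 is 'one', 0 otherwise.
df2 = df.assign(oneCount=df["key2"].eq("one").astype(int))

# Sum the indicator per key1 group.
df3 = df2.groupby("key1", as_index=False)["oneCount"].sum()
print(df3)
```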

Answer 6

Answered by Andre Vieira de Lima

I needed to count over two columns (a lambda with two arguments). As a starting example, here is the Pandas dataframe groupby function applied to the column key2:

df.groupby('key1')['key2'].apply(lambda x: x[x == 'one'].count())
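
One possible reading of this answer is a count that depends on two columns at once. The sketch below assumes a hypothetical numeric column key3 (not part of the original question) and combines two boolean conditions before grouping; it illustrates that reading only and is not the answerer's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "b", "b", "a", "c"],
    "key2": ["one", "two", "one", "two", "one", "two"],
    "key3": [1, 2, 3, 4, 5, 6],  # hypothetical second column, for illustration only
})

# A row is counted only if it satisfies both conditions.
both = (df["key2"] == "one") & (df["key3"] > 1)

# Sum the combined boolean mask per key1 group.
counts = both.groupby(df["key1"]).sum().astype(int)
print(counts)
```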
