pandas groupby count string occurrence over column

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/31649669/

Date: 2020-09-13 23:40:59  Source: igfitidea

Tags: python, pandas, count, group-by, dataframe

Asked by beta

I want to count the occurrence of a string in a grouped pandas dataframe column.

Assume I have the following Dataframe:

catA    catB    scores
A       X       6-4 RET
A       X       6-4 6-4
A       Y       6-3 RET
B       Z       6-0 RET
B       Z       6-1 RET
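
For anyone who wants to follow along, the table above can be rebuilt as a DataFrame. This is only a minimal sketch: the name df matches the one used later in the thread, and the values are copied from the table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

print(df)
```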

First, I want to group by catA and catB. Then, for each of these groups, I want to count the occurrences of RET in the scores column.

The result should look something like this:

catA    catB    RET
A       X       1
A       Y       1
B       Z       2

The grouping by two columns is easy: grouped = df.groupby(['catA', 'catB'])

But what's next?

Answered by EdChum

Call apply on the 'scores' column of the groupby object and use the vectorised str method contains to filter each group down to the rows that contain 'RET', then call count on what remains:

In [34]:    
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())

Out[34]:
catA  catB
A     X       1
      Y       1
B     Z       2
Name: scores, dtype: int64
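
As a variation on the call above (not part of the original answer), the boolean mask returned by str.contains can be summed directly, since each True counts as 1. A minimal sketch, assuming the sample data from the question; the names has_ret and ret_counts are made up for illustration, and na=False is an added assumption so that missing scores would count as non-matches:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Boolean mask: True where the score contains 'RET'.
# na=False treats missing scores as non-matches (an assumption, not in the original).
has_ret = df["scores"].str.contains("RET", na=False)

# Group the mask by the two key columns and sum the True values per group.
ret_counts = has_ret.groupby([df["catA"], df["catB"]]).sum()

print(ret_counts)
```

This avoids a Python-level lambda per group, which may matter on larger frames.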

To assign the result as a column, use transform so that the aggregation returns a Series whose index is aligned to the original df:

In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df

Out[35]:
  catA catB   scores count
0    A    X  6-4 RET     1
1    A    X  6-4 6-4     1
2    A    Y  6-3 RET     1
3    B    Z  6-0 RET     2
4    B    Z  6-1 RET     2
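
If the goal is instead the three-column table shown in the question (catA, catB, RET), one option is to turn the grouped result back into a flat DataFrame with reset_index. This is a sketch under the same sample data, not something stated in the original answer; the name result is made up for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Count the 'RET' matches per (catA, catB) group, then move the group keys
# from the index back into ordinary columns and name the count column 'RET'.
result = (
    df.groupby(["catA", "catB"])["scores"]
      .apply(lambda x: x.str.contains("RET").sum())
      .reset_index(name="RET")
)

print(result)
```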