pandas groupby count string occurrence over column
Original question: http://stackoverflow.com/questions/31649669/
Notice: this page reproduces a popular Stack Overflow question and its answer, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow (not to this page).
Asked by beta
I want to count the occurrence of a string in a grouped pandas dataframe column.
Assume I have the following Dataframe:
catA catB scores
A X 6-4 RET
A X 6-4 6-4
A Y 6-3 RET
B Z 6-0 RET
B Z 6-1 RET
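For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: the values are copied from the table, and the variable name df matches the one used later in the thread.

```python
import pandas as pd

# Rebuild the example frame; values are copied from the table in the question.
df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})
```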
First, I want to group by catA and catB. Then, for each of these groups, I want to count the occurrences of RET in the scores column.
The result should look something like this:
catA catB RET
A X 1
A Y 1
B Z 2
The grouping by two columns is easy: grouped = df.groupby(['catA', 'catB'])
But what's next?
Answered by EdChum
Call apply on the 'scores' column of the groupby object and use the vectorised str method contains to filter each group, then call count:
In [34]:
df.groupby(['catA', 'catB'])['scores'].apply(lambda x: x[x.str.contains('RET')].count())
Out[34]:
catA catB
A X 1
Y 1
B Z 2
Name: scores, dtype: int64
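As a possible alternative to the lambda, the boolean mask can be built once and summed per group, since True counts as 1. This is only a sketch, not part of the original answer; the names is_ret and ret_counts are made up for illustration, and the frame is rebuilt here so the snippet runs on its own. Note that it counts rows containing RET, which matches the question's data; counting repeated matches within a single string would need str.count instead.

```python
import pandas as pd

df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# Boolean mask: True where the score string contains the literal text 'RET'.
is_ret = df["scores"].str.contains("RET", regex=False)

# Summing booleans counts the True values in each (catA, catB) group;
# reset_index turns the group keys back into ordinary columns.
ret_counts = (
    is_ret.groupby([df["catA"], df["catB"]])
    .sum()
    .reset_index(name="RET")
)
```

The reset_index(name="RET") call gives a flat frame with columns catA, catB and RET, which is the layout the question asked for.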
To assign the result as a column, use transform so that the aggregation returns a Series with its index aligned to the original df:
In [35]:
df['count'] = df.groupby(['catA', 'catB'])['scores'].transform(lambda x: x[x.str.contains('RET')].count())
df
Out[35]:
catA catB scores count
0 A X 6-4 RET 1
1 A X 6-4 6-4 1
2 A Y 6-3 RET 1
3 B Z 6-0 RET 2
4 B Z 6-1 RET 2
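The same per-row column can probably be produced without a lambda by summing an integer mask with transform("sum"). Again, this is a sketch under assumptions rather than the answerer's method; is_ret is a hypothetical helper name and the frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

df = pd.DataFrame({
    "catA": ["A", "A", "A", "B", "B"],
    "catB": ["X", "X", "Y", "Z", "Z"],
    "scores": ["6-4 RET", "6-4 6-4", "6-3 RET", "6-0 RET", "6-1 RET"],
})

# 1 where the score contains 'RET', 0 otherwise.
is_ret = df["scores"].str.contains("RET", regex=False).astype(int)

# transform broadcasts each group's sum back to every row of that group,
# so the result lines up with the original index and can be assigned directly.
df["count"] = is_ret.groupby([df["catA"], df["catB"]]).transform("sum")
```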

