Python Pandas groupby: get the size of a group knowing its id (from .grouper.group_info[0])

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/17945247/

Time: 2020-08-19 09:33:03  Source: igfitidea

Pandas groupby: get size of a group knowing its id (from .grouper.group_info[0])

python group-by pandas

Asked by piokuc

In the following snippet, data is a pandas.DataFrame and indices is a set of columns of the data. After grouping the data with groupby, I am interested in the ids of the groups, but only those with a size greater than a threshold (say, 3).

group_ids=data.groupby(list(data.columns[list(indices)])).grouper.group_info[0]

Now, how can I find which groups have a size greater than or equal to 3, knowing the id of each group? I only want the ids of groups with a certain size.

#TODO: filter out ids from group_ids which correspond to groups with sizes < 3 

Accepted answer by Andy Hayden

One way is to use the size method of the groupby:

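g = data.groupby(...)
size = g.size()  # Series of group sizes, indexed by group key
size[size > 3]   # groups larger than 3; use >= 3 to include groups of exactly 3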

For example, here there is only one group of size > 1:

In [11]: df = pd.DataFrame([[1, 2], [3, 4], [1,6]], columns=['A', 'B'])

In [12]: df
Out[12]:
   A  B
0  1  2
1  3  4
2  1  6 

In [13]: g = df.groupby('A')

In [14]: size = g.size()

In [15]: size[size > 1]
Out[15]:
A
1    2
dtype: int64
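
The size Series above is indexed by group key, while the question starts from integer group ids. A minimal sketch of one way to connect the two, assuming a pandas version that provides GroupBy.ngroup: ngroup gives each row an integer group id (which should match .grouper.group_info[0] in typical cases, though that is an assumption to verify on your data), and transform("size") gives each row the size of its group. The name min_size is hypothetical and only used for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [1, 6]], columns=["A", "B"])
g = df.groupby("A")

# Integer id of the group each row belongs to (one value per row).
group_ids = g.ngroup()

# Size of the group each row belongs to (one value per row).
row_sizes = g["B"].transform("size")

# Hypothetical threshold: keep ids of groups with at least this many rows.
min_size = 2
large_ids = group_ids[row_sizes >= min_size].unique()

print(large_ids)  # [0]
```

Because both Series are aligned with the rows of df, the same boolean condition can be used to select ids or rows.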

If you are only interested in restricting the DataFrame to the rows that belong to large groups, you can use the filter method:

In [21]: g.filter(lambda x: len(x) > 1)
Out[21]:
   A  B
0  1  2
2  1  6
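
As an alternative to filter with a lambda, the same rows can be selected with a boolean mask built from per-row group sizes. This is a sketch rather than part of the original answer; it may be faster on large frames because it avoids calling a Python function once per group, but that depends on the data.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [1, 6]], columns=["A", "B"])

# For every row, the number of rows sharing its value of A.
row_sizes = df.groupby("A")["A"].transform("size")

# Keep only the rows whose group has more than one member.
large = df[row_sizes > 1]

print(large)
```

For this frame, the result should contain the same rows (index 0 and 2) as the filter call above.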