Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/31303417/

Time: 2020-08-19 09:47:01  Source: igfitidea

Python pandas dataframe group by based on a condition

Tags: python, pandas, group-by, conditional-statements, dataframe

Asked by ahajib

My question is simple: I have a dataframe, and I group it by a column with groupby and get the size of each group like this:

df.groupby('column').size()

Now the problem is that I only want the ones where size is greater than X. I am wondering if I can do it using a lambda function or anything similar? I have already tried this:

df.groupby('column').size() > X

but that only prints out True and False values instead of the filtered sizes.

Accepted answer by Ami Tavory

The result of groupby(...).size() is a regular Series, so just filter it with a boolean mask as usual:

 >>> import pandas as pd
 >>> df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
 >>> after = df.groupby('a').size()
 >>> after
 a
 a    3
 b    2
 c    1
 d    1
 dtype: int64

 >>> after[after > 2]
 a
 a    3
 dtype: int64
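
Since the question asks about a lambda, the same filter can probably be written as a single chain without the intermediate variable. This is only a sketch: it relies on .loc accepting a callable, and the threshold X = 2 and the name big_groups are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': ['a', 'b', 'a', 'a', 'b', 'c', 'd']})
X = 2  # assumed threshold, for illustration only

# .loc accepts a callable: it receives the Series of group sizes
# and returns a boolean mask that selects the entries to keep.
big_groups = df.groupby('a').size().loc[lambda s: s > X]
print(big_groups)
```

The result is still a Series of group sizes indexed by group key, just restricted to the groups larger than X.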

Answer by Jianxun Li

Try this code:

# len(group) counts rows; group.size would count rows * columns
df.groupby('column').filter(lambda group: len(group) > X)
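
Note that filter returns the rows of the original dataframe that belong to the surviving groups, not the group sizes. A minimal sketch of that behaviour follows, assuming a made-up two-column frame (the 'value' column, the threshold X = 2, and the names kept and kept_fast are illustrative, not from the question). It also shows a transform-based mask, which should select the same rows without calling a Python lambda per group.

```python
import pandas as pd

# Hypothetical data: 'value' is added only to make the rows distinguishable.
df = pd.DataFrame({
    'column': ['a', 'b', 'a', 'a', 'b', 'c', 'd'],
    'value': [1, 2, 3, 4, 5, 6, 7],
})
X = 2  # assumed threshold

# Keep every row whose group has more than X rows.
# len(group) is the row count of the group's sub-frame.
kept = df.groupby('column').filter(lambda group: len(group) > X)
print(kept)

# Alternative: broadcast each group's size back onto its rows,
# then use the comparison as an ordinary boolean mask.
mask = df.groupby('column')['column'].transform('size') > X
kept_fast = df[mask]
print(kept_fast)
```

Use the size-based filter from the accepted answer when you want the counts themselves, and filter or the transform mask when you want the underlying rows.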