Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/42966813/

Time: 2020-09-14 03:15:56  Source: igfitidea

Pandas DataFrame to drop rows in the groupby

python, pandas, dataframe

Asked by Zed Fang

I have a DataFrame with three columns: Date, Advertiser and ID. I first grouped the data to see whether the volumes of some advertisers are too small (for example, when count() is less than 500). I then want to drop those rows from the grouped table.

df.groupby(['Date','Advertiser']).ID.count()

The result looks like this:

 Date         Advertiser
 2016-01        A             50000
                B               50
                C              4000
                D             24000
 2016-02        A              6800
                B              7800
                C               123
 2016-03        B              1111
                E              8600
                F               500

I want the result to look like this:

 Date         Advertiser
 2016-01        A             50000
                C              4000
                D             24000
 2016-02        A              6800
                B              7800
 2016-03        B              1111
                E              8600

Follow-up question:

What if I want to filter the rows in the groupby by the total count() within each date? For example, I want to keep only the dates whose total count() is larger than 15000. The table I want looks like this:

 Date         Advertiser
 2016-01        A             50000
                B               50
                C              4000
                D             24000
 2016-02        A              6800
                B              7800
                C               123

Accepted answer by Psidom

After the groupby you have a Series object, which can be filtered by value with a chained lambda filter:

df.groupby(['Date','Advertiser']).ID.count()[lambda x: x >= 500]

#Date     Advertiser
#2016-01  A             50000
#         C              4000
#         D             24000
#2016-02  A              6800
#         B              7800
#2016-03  B              1111
#         E              8600
#         F               500
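The filter above can be tried end to end on a small made-up frame. The sketch below is only an illustration, not part of the original answer: the toy data and the scaled-down thresholds (5 and 15 instead of 500 and 15000) are assumptions. It also shows one possible way to handle the follow-up question, by broadcasting each date's total back onto the grouped Series with transform("sum"), and one way to drop the small groups' rows from the original DataFrame itself.

```python
import pandas as pd

# Hypothetical toy data; thresholds are scaled down to 5 and 15
# (standing in for 500 and 15000 in the question).
rows = (
    [("2016-01", "A")] * 12
    + [("2016-01", "B")] * 2
    + [("2016-01", "C")] * 6
    + [("2016-02", "A")] * 7
    + [("2016-02", "C")] * 3
)
df = pd.DataFrame(rows, columns=["Date", "Advertiser"])
df["ID"] = range(len(df))

# Count IDs per (Date, Advertiser); the result is a Series with a MultiIndex.
counts = df.groupby(["Date", "Advertiser"]).ID.count()

# 1. Chained lambda filter: keep pairs with at least 5 rows.
kept = counts[lambda s: s >= 5]

# 2. Follow-up (assumed approach): keep every pair that belongs to a date
#    whose total count is larger than 15.
date_totals = counts.groupby(level="Date").transform("sum")
busy_dates = counts[date_totals > 15]

# 3. Drop the small groups' rows from the original DataFrame instead of
#    from the grouped counts.
group_sizes = df.groupby(["Date", "Advertiser"])["ID"].transform("count")
df_kept = df[group_sizes >= 5]

print(kept)
print(busy_dates)
print(len(df_kept))
```

Note that `>=` keeps a group whose count equals the threshold exactly (F with 500 in the output above); use `>` if such a group should be dropped, as in the asker's desired table.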