Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license and attribute it to the original authors (not me), citing the original address. Original StackOverflow question: http://stackoverflow.com/questions/40710811/

Time: 2020-09-14 02:28:18  Source: igfitidea

Count items greater than a value in pandas groupby

Tags: python, python-3.x, pandas

Asked by rookie

I have the Yelp dataset and I want to count all reviews which have greater than 3 stars. I get the count of reviews by doing this:

reviews.groupby('business_id')['stars'].count()

Now I want to get the count of reviews which had more than 3 stars, so I tried this by taking inspiration from here:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).count()})

But this just gives me the count of all reviews, like before. Is this the right way to do it? What am I doing wrong here? Does the lambda expression not go through each value of the stars column?

EDIT: Okay, I feel stupid. I should have used the sum function instead of count to get the number of elements greater than 3, like this:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).sum()})
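
For anyone wondering why count gave the same numbers as before: (val > 3) is a boolean Series with one entry per review, and count() counts every non-null entry whether it is True or False, while sum() adds up only the True values. Here is a small sketch on made-up data (the sample reviews frame and its values are my own assumption, not the Yelp dataset). It passes the lambda to agg directly rather than through the renaming dict, since as far as I know newer pandas versions reject the dict form on a single column.

```python
import pandas as pd

# Hypothetical sample data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 5],
})

grouped = reviews.groupby("business_id")["stars"]

# count() counts every non-null boolean, so this is just the group size.
counted = grouped.agg(lambda val: (val > 3).count())

# sum() treats True as 1 and False as 0, so this counts reviews above 3 stars.
summed = grouped.agg(lambda val: (val > 3).sum())

print(counted.tolist())  # [3, 2, 1]
print(summed.tolist())   # [2, 0, 1]
```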

Accepted answer by Mohamed AL ANI

You can filter the rows first and then count per group:

reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
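
One thing to watch with filtering first: a business with no reviews above 3 stars drops out of the result entirely instead of showing 0. If you need the zeros, one possible fix is to reindex against all business ids afterwards. This is only a sketch on made-up data (the sample frame and the names all_ids and with_zeros are my own assumptions):

```python
import pandas as pd

# Hypothetical sample data; business "b" has no review above 3 stars.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 5],
})

# Filter first, then count: "b" is missing from the result.
filtered = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

# Restore the missing businesses with a count of 0.
all_ids = reviews["business_id"].unique()
with_zeros = filtered.reindex(all_ids, fill_value=0)

print(filtered.tolist())    # [2, 1]
print(with_zeros.tolist())  # [2, 0, 1]
```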

Answer by Esben Eickhardt

As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:

import pandas

# Counting both over and under
reviews.groupby('business_id')\
       .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
            under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
       .reset_index()

pandas.NamedAgg lets you create multiple new, named columns from the same source column, now that the older dict-based renaming in agg has been removed in newer versions of pandas.
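
As a side note, named aggregation also accepts a plain (column, aggfunc) tuple in place of pandas.NamedAgg, which is a bit shorter to type. A minimal sketch on made-up data (the sample frame is my own assumption, not the Yelp dataset):

```python
import pandas as pd

# Hypothetical sample data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 5],
})

# Each keyword becomes an output column; the tuple is (source column, function).
result = (
    reviews.groupby("business_id")
    .agg(
        over=("stars", lambda x: (x > 3).sum()),
        under=("stars", lambda x: (x < 3).sum()),
    )
    .reset_index()
)

print(result["over"].tolist())   # [2, 0, 1]
print(result["under"].tolist())  # [1, 1, 0]
```

Note that a review with exactly 3 stars lands in neither column, because both comparisons are strict.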

Answer by Ivan P.

A bit late, but my solution is:

reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3]) )

I came across this thread while searching for "what is the fraction of values above X in a given GroupBy". Here is the solution if anyone is interested:

reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3])/len(x) )
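
Since the mean of a boolean Series is the share of True values, the same fraction can probably be had without a lambda by grouping the comparison result itself. A sketch on made-up data (the sample frame is my own assumption):

```python
import pandas as pd

# Hypothetical sample data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 5],
})

# Fraction via the lambda from the answer above.
frac_apply = reviews.groupby("business_id").stars.apply(lambda x: len(x[x > 3]) / len(x))

# Same fraction as the per-group mean of the boolean mask.
frac_mean = (reviews["stars"] > 3).groupby(reviews["business_id"]).mean()

print(frac_mean.round(3).tolist())  # [0.667, 0.0, 1.0]
```

One difference to keep in mind: if stars contains missing values, both versions still count those rows in the denominator, since NaN > 3 evaluates to False.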

Answer by Jonny Brooks

I quite like using method chaining with Pandas as I find it easier to read. I haven't tried it, but I think this should also work:

reviews.query("stars > 3").groupby("business_id").size()
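
A quick check on made-up data suggests the query version gives the same counts as the boolean-mask version from the accepted answer (the sample frame is my own assumption, not the Yelp dataset):

```python
import pandas as pd

# Hypothetical sample data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 5],
})

# Method-chaining version: filter with query, then take the group sizes.
by_query = reviews.query("stars > 3").groupby("business_id").size()

# Boolean-mask version from the accepted answer.
by_mask = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

print(by_query.tolist())  # [2, 1]
print(by_mask.tolist())   # [2, 1]
```

size() counts rows while count() counts non-null values, but rows with missing stars never pass the stars > 3 filter, so the two agree here. Like the accepted answer, this drops businesses that have no review above 3 stars.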