Count items greater than a value in pandas groupby
Original question: http://stackoverflow.com/questions/40710811/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors on Stack Overflow.
Asked by rookie
I have the Yelp dataset and I want to count all reviews that have more than 3 stars. I get the count of all reviews per business by doing this:
reviews.groupby('business_id')['stars'].count()
Now I want to get the count of reviews which had more than 3 stars, so I tried this by taking inspiration from here:
reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).count()})
But this just gives me the count of all reviews, like before. I am not sure this is the right way to do it. What am I doing incorrectly here? Does the lambda expression not go through each value of the stars column?
EDIT: Okay, I feel stupid. I should have used the sum function instead of count to get the number of elements greater than 3, like this:
reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).sum()})
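The reason `count` fails is that `val > 3` is a boolean Series as long as the group itself, and `count()` counts every non-null entry, `False` included. `sum()` adds the mask up with `True` as 1 and `False` as 0. Below is a minimal sketch on made-up data (the `reviews` frame is a hypothetical stand-in, not the Yelp dataset). It assumes pandas 0.25 or later and uses keyword-style named aggregation, because the dict-renaming form shown above is rejected by pandas 1.0 and later.

```python
import pandas as pd

# Hypothetical stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 4, 2, 3, 1],
})

grouped = reviews.groupby("business_id")["stars"]

# count() counts every non-null entry of the boolean mask, True or False alike.
counted = grouped.agg(greater=lambda val: (val > 3).count())

# sum() adds the mask up with True as 1 and False as 0.
summed = grouped.agg(greater=lambda val: (val > 3).sum())

print(counted["greater"].tolist())  # [3, 2]
print(summed["greater"].tolist())   # [2, 0]
```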
Accepted answer by Mohamed AL ANI
You can filter the rows first and then group:
reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
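One thing to be aware of when filtering first: a business with no reviews above 3 stars loses all of its rows, so it does not appear in the result at all, whereas the `sum()` approach reports it with a count of 0. Here is a small sketch on hypothetical data. Using `reindex` with `fill_value=0` is one possible way to restore the missing groups, not the only one.

```python
import pandas as pd

# Hypothetical stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 4, 2, 3, 1],
})

# Filter first, then count: business "b" has no rows left and drops out.
filtered = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()
print(filtered.index.tolist(), filtered.tolist())  # ['a'] [2]

# Reindex against every business id so zero counts are reported as well.
all_ids = reviews["business_id"].unique()
with_zeros = filtered.reindex(all_ids, fill_value=0)
print(with_zeros.index.tolist(), with_zeros.tolist())  # ['a', 'b'] [2, 0]
```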
Answer by Esben Eickhardt
As I also wanted to rename the column and run multiple functions on the same column, I came up with the following solution:
import pandas

# Count reviews above and below 3 stars for each business
(reviews.groupby('business_id')
    .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
         under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))
    .reset_index())
pandas.NamedAgg lets you create several named output columns, which is needed now that renaming through a dict passed to agg has been removed in newer versions of pandas.
Answer by Ivan P.
A bit late, but my solution is:
reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3]) )
I came across this thread while trying to find out "what is the fraction of values above X in a given GroupBy". Here is the solution if anyone is interested:
reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3])/len(x) )
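The same fraction can also be written as the mean of the boolean mask, since `True` averages as 1 and `False` as 0. The sketch below, on hypothetical data, compares it against the `len`-based version above.

```python
import pandas as pd

# Hypothetical stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "a", "b", "b"],
    "stars": [5, 4, 2, 1, 3, 1],
})

stars = reviews.groupby("business_id").stars

# Fraction of reviews above 3 stars, computed two ways.
by_len = stars.apply(lambda x: len(x[x > 3]) / len(x))
by_mean = stars.apply(lambda x: (x > 3).mean())

print(by_len.tolist())   # [0.5, 0.0]
print(by_mean.tolist())  # [0.5, 0.0]
```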
Answer by Jonny Brooks
I quite like using method chaining with pandas, as I find it easier to read. I haven't tried it, but I think this should also work:
reviews.query("stars > 3").groupby("business_id").size()
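A quick check on hypothetical data suggests the `query` version gives the same counts as the boolean-mask filter from the accepted answer. `size()` counts rows per group while `count()` counts non-null `stars` values, so the two agree as long as the surviving rows have no missing `stars`. That holds after a `stars > 3` filter, because NaN fails the comparison.

```python
import pandas as pd

# Hypothetical stand-in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b", "c"],
    "stars": [5, 4, 2, 3, 1, 4],
})

via_query = reviews.query("stars > 3").groupby("business_id").size()
via_mask = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

print(via_query.index.tolist(), via_query.tolist())  # ['a', 'c'] [2, 1]
print(via_mask.index.tolist(), via_mask.tolist())    # ['a', 'c'] [2, 1]
```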