pandas: counting items greater than a value within a pandas groupby

Question

Asked by rookie

I have the Yelp dataset and I want to count all reviews that have more than 3 stars. I get the count of reviews per business by doing this:

reviews.groupby('business_id')['stars'].count()

Now I want to get the count of reviews which had more than 3 stars, so I tried this by taking inspiration from here:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).count()})

But this just gives me the count of all reviews, like before. Is this the right way to do it? What am I doing incorrectly here? Does the lambda expression not go through each value of the stars column?

EDIT: Okay, I feel stupid. I should have used the sum function instead of count to get the number of elements greater than 3, like this:

reviews.groupby('business_id')['stars'].agg({'greater':lambda val: (val > 3).sum()})
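The difference between the two attempts comes down to how a boolean Series is reduced: count() tallies every non-null entry (False included), while sum() treats True as 1 and False as 0. A minimal sketch on made-up data (the toy reviews frame below is an assumption, not the real Yelp dataset), using a plain lambda rather than the dict form:

```python
import pandas as pd

# Hypothetical toy data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 2, 4, 1, 3],
})

grouped = reviews.groupby("business_id")["stars"]

# count() tallies non-null entries, and False is still a non-null entry,
# so this just returns the size of each group.
all_rows = grouped.agg(lambda val: (val > 3).count())

# sum() treats True as 1 and False as 0, so only the matches are counted.
greater = grouped.agg(lambda val: (val > 3).sum())

print(all_rows)
print(greater)
```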

Answer 1

Accepted answer by Mohamed AL ANI

You can try:

reviews[reviews['stars'] > 3].groupby('business_id')['stars'].count()
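One thing to be aware of with filtering first: a business with no reviews above 3 stars disappears from the result instead of showing up with a count of 0. If the zeros matter, one possible way to restore them is to reindex against the full list of business ids. A small sketch on made-up data (the toy frame is an assumption):

```python
import pandas as pd

# Hypothetical toy data; business "b" has no review above 3 stars.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 2, 4, 1, 3],
})

# Filter first, then count: "b" is missing from the result entirely.
counts = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

# Reindex against every business id so that missing groups become 0.
all_ids = reviews["business_id"].unique()
counts_with_zeros = counts.reindex(all_ids, fill_value=0)

print(counts)
print(counts_with_zeros)
```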

Answer 2

Answer by Esben Eickhardt

As I also wanted to rename the column and to run multiple functions on the same column, I came up with the following solution:

import pandas

# Count reviews above and below 3 stars per business
# (reviews with exactly 3 stars fall into neither column)
reviews.groupby('business_id')\
       .agg(over=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x > 3).sum()),
            under=pandas.NamedAgg(column='stars', aggfunc=lambda x: (x < 3).sum()))\
       .reset_index()

pandas.NamedAgg lets you create multiple new, named columns, now that the dict-based renaming in agg has been removed in newer versions of pandas.

Answer 3

Answer by Ivan P.

A bit late, but my solution is:

reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3]) )

I came across this thread while searching for "what is the fraction of values above X in a given GroupBy". Here is the solution if anyone is interested:

reviews.groupby('business_id').stars.apply(lambda x: len(x[x>3])/len(x) )
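As a possible alternative for the fraction, the mean of a boolean mask is the share of True values, so the same number can be obtained without a lambda. A minimal sketch on made-up data (the toy frame is an assumption); note that it treats missing star values as "not above 3" rather than excluding them:

```python
import pandas as pd

# Hypothetical toy data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 2, 4, 1, 3],
})

# Build the boolean mask once, group it by business, and average it:
# the mean of True/False values is the fraction of True.
fraction = (reviews["stars"] > 3).groupby(reviews["business_id"]).mean()

print(fraction)
```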

Answer 4

Answer by Jonny Brooks

I quite like using method chaining with pandas, as I find it easier to read. I haven't tried it, but I think this should also work:

reviews.query("stars > 3").groupby("business_id").size()
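A quick way to check the query-based version is to compare it with the boolean-indexing version on a small frame. The sketch below uses made-up data (the toy frame is an assumption). size() counts rows while count() counts non-null values, but after filtering on stars > 3 the stars column has no missing values left, so the two should agree:

```python
import pandas as pd

# Hypothetical toy data standing in for the Yelp reviews table.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "a", "b", "b"],
    "stars": [5, 2, 4, 1, 3],
})

# Method-chained version: filter with query(), then count rows per group.
via_query = reviews.query("stars > 3").groupby("business_id").size()

# Boolean-indexing version from the accepted answer, for comparison.
via_filter = reviews[reviews["stars"] > 3].groupby("business_id")["stars"].count()

print(via_query)
print(via_filter)
```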
