Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/50662469/

Pandas: Group by a column that meets a condition

Tags: python, pandas, dataframe, group-by, pandas-groupby

Asked by seisgradox

I have a data set with three columns: rating, breed, and dog.

import pandas as pd
dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}

df = pd.DataFrame(data=dogs)

I would like to calculate the mean rating per breed where dog is True. This would be the expected output:

  breed     rating
0 Chihuahua 8.5   
1 Dalmatian 10.0  

This has been my attempt:

df.groupby('breed')['rating'].mean().where(dog == True)

And this is the error that I get:

NameError: name 'dog' is not defined

But when I try to add the where condition, I only get errors. Can anyone advise a solution? TIA

Accepted answer by user3483203

Once you groupby and select a column, your dog column doesn't exist anymore in the context you have selected (and even if it did, you are not accessing it correctly: a bare dog is an undefined Python name, not a reference to the column).
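To make that concrete, here is a small sketch (not part of the original answer) that inspects what the grouped result from the question actually contains. The variable name means is only illustrative.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Grouping by breed and selecting rating yields a Series of ratings
# indexed by breed; the dog column is not carried along.
means = df.groupby('breed')['rating'].mean()

print(type(means).__name__)   # the result is a Series, not a DataFrame
print(list(means.index))      # the index holds the breed labels
print('dog' in means.index)   # no 'dog' entry is available to filter on
```

Because the Sphynx row has already been averaged into the result at this point, any filtering on dog has to happen before or alongside the grouping.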

Filter your dataframe first, then use groupby with mean:

df[df.dog].groupby('breed')['rating'].mean().reset_index()

       breed  rating
0  Chihuahua     8.5
1  Dalmatian    10.0
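As a possible variation on the same idea (an assumption on my part, not something the answer itself proposes), the filter can be written with an explicit boolean mask and the reset_index step replaced by as_index=False. The names mask and dog_means are illustrative.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Boolean mask: True for the rows to keep (here, the dog column itself).
mask = df['dog']

# Filter first, then group; as_index=False keeps breed as a regular column,
# so no reset_index() is needed afterwards.
dog_means = df.loc[mask].groupby('breed', as_index=False)['rating'].mean()

print(dog_means)
```

Using df['dog'] rather than df.dog avoids surprises when a column name collides with a DataFrame attribute, though for this data both forms select the same rows.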

Answer by jpp

An alternative solution is to make dog one of your grouper keys, then filter by dog in a separate step. This is more efficient if you do not want to lose aggregated data for non-dogs.

res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

print(res)

     dog      breed  rating
0  False     Sphynx     7.0
1   True  Chihuahua     8.5
2   True  Dalmatian    10.0

print(res[res['dog']])

    dog      breed  rating
1  True  Chihuahua     8.5
2  True  Dalmatian    10.0
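The benefit of this approach can be sketched as follows: one aggregation is computed, and both the dog and non-dog summaries are sliced from it. This is an illustrative extension rather than part of the answer; the names dogs_only and non_dogs are hypothetical.

```python
import pandas as pd

dogs = {'breed': ['Chihuahua', 'Chihuahua', 'Dalmatian', 'Sphynx'],
        'dog': [True, True, True, False],
        'rating': [8.0, 9.0, 10.0, 7.0]}
df = pd.DataFrame(data=dogs)

# Aggregate once, keeping dog as a grouping key.
res = df.groupby(['dog', 'breed'])['rating'].mean().reset_index()

# Slice the single aggregated result into the two subsets,
# dropping the dog column and renumbering the rows.
dogs_only = res.loc[res['dog'], ['breed', 'rating']].reset_index(drop=True)
non_dogs = res.loc[~res['dog'], ['breed', 'rating']].reset_index(drop=True)

print(dogs_only)
print(non_dogs)
```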