Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/29314424/

Time: 2020-09-13 23:07:24  Source: igfitidea

Pandas: using groupby to get mean for each data category

python pandas aggregate mean

Asked by cahoy

I have a dataframe that looks like this:

>>> df[['data','category']]
Out[47]: 
          data     category
  0       4610            2
 15       4610            2
 22       5307            7
 23       5307            7
 25       5307            7
...        ...          ...

Both data and category are numeric so I'm able to do this:

>>> df[['data','category']].mean()
Out[48]: 
data        5894.677985
category      13.805886
dtype: float64

I'm trying to get the mean for each category. It looks straightforward, but when I do this:

>>> df[['data','category']].groupby('category').mean()

or

>>> df.groupby('category')['data'].mean()

It returns an error like this:

DataError: No numeric types to aggregate

There's no error if I replace both functions above with .count().

What am I doing wrong? What's the correct way to get the mean of each category?

Answered by Amrita Sawant

Can you check df.dtypes? In the example below the columns are of integer type, so it works fine.

    import pandas as pd

    # Group by one column
    df = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7]})
    print(df.groupby('Category').mean())


    # Group by multiple columns
    df1 = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7], 'Category2': ['A', 'B', 'A', 'B']})
    key = ['Category', 'Category2']
    print(df1.groupby(key).mean())

 Category Category2       
 2        A           4610
          B           4611
 7        A           4612
          B           4613
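
If df.dtypes shows a non-numeric dtype for the data column, one possible fix is to convert it before aggregating. The sketch below is only an illustration under an assumption the question does not confirm: that the values are numbers stored as strings. The sample frame is made up, and pd.to_numeric is just one way to do the conversion.

```python
import pandas as pd

# Hypothetical data: the numbers in 'data' are stored as strings,
# so the column does not have a numeric dtype
df = pd.DataFrame({'data': ['4610', '4610', '5307', '5307', '5307'],
                   'category': [2, 2, 7, 7, 7]})
print(df.dtypes)

# Convert to a numeric dtype; errors='coerce' turns unparseable values into NaN
df['data'] = pd.to_numeric(df['data'], errors='coerce')

# With a numeric column, the per-category mean can be computed
means = df.groupby('category')['data'].mean()
print(means)
```

With errors='coerce', any value that cannot be parsed becomes NaN and is skipped by mean(), so it may be worth checking how many NaN values the conversion produced.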

Answered by Alexander

As mentioned, you don't give an example of the testTime and passing_site data, but I'm guessing that they're floating point numbers. As I'm sure you can imagine, grouping on floating point numbers rarely works well, because exact values seldom repeat. Rather, you would group on integers or categories of some type.

Try something like:

df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()

You're grouping on 'data' and 'category', and then calculating the mean for the numerical columns 'passing_site' and 'testTime'.
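
A minimal, self-contained sketch of this grouping follows. The values in passing_site and testTime are invented for illustration, since the question does not show them; only the column names come from the answer.

```python
import pandas as pd

# Hypothetical values: the 'passing_site' and 'testTime' numbers are made up
df = pd.DataFrame({
    'data': [4610, 4610, 5307, 5307],
    'category': [2, 2, 7, 7],
    'passing_site': [1.0, 3.0, 2.0, 4.0],
    'testTime': [10.0, 20.0, 30.0, 50.0],
})

# Double brackets select a list of columns from the grouped object
result = df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()
print(result)
```

The result is indexed by the (data, category) pairs, so a single group can be looked up with, for example, result.loc[(4610, 2)].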
