Pandas: using groupby to get the mean for each data category

Question

Asked by cahoy

I have a dataframe that looks like this:

>>> df[['data','category']]
Out[47]: 
          data     category
  0       4610            2
 15       4610            2
 22       5307            7
 23       5307            7
 25       5307            7
...        ...          ...

Both data and category are numeric so I'm able to do this:

>>> df[['data','category']].mean()
Out[48]: 
data        5894.677985
category      13.805886
dtype: float64

I'm trying to get the mean for each category. It looks straightforward, but when I do this:

>>> df[['data','category']].groupby('category').mean()

or

>>> df.groupby('category')['data'].mean()

It returns an error like this:

DataError: No numeric types to aggregate

There's no error if I replace both functions above with .count().

What am I doing wrong? What's the correct way to get the mean of each category?

Answer 1

Answered by Amrita Sawant

Can you check df.dtypes? In the example below the columns are integer type, so it works fine.

    import pandas as pd

    # Group by one column
    df = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7]})
    print(df.groupby('Category').mean())


    # Group by multiple columns
    df1 = pd.DataFrame({'data': [4610, 4611, 4612, 4613], 'Category': [2, 2, 7, 7], 'Category2': ['A', 'B', 'A', 'B']})
    key = ['Category', 'Category2']
    print(df1.groupby(key).mean())

 Category Category2       
 2        A           4610
          B           4611
 7        A           4612
          B           4613
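
If df.dtypes shows that the data column is not numeric (for example because the values were read in as strings), one possible fix is to convert it before grouping. The following is only a sketch under that assumption: the sample frame is made up, and the question does not confirm what the actual dtype was.

```python
import pandas as pd

# Hypothetical frame: 'data' holds strings rather than numbers,
# which is one way to end up with a non-numeric column.
df = pd.DataFrame({'data': ['4610', '4610', '5307', '5307', '5307'],
                   'category': [2, 2, 7, 7, 7]})
print(df.dtypes)

# Convert to a numeric dtype; errors='coerce' turns unparseable values into NaN.
df['data'] = pd.to_numeric(df['data'], errors='coerce')

# With a numeric column, the per-category mean can be computed.
means = df.groupby('category')['data'].mean()
print(means)
```

Using errors='coerce' silently replaces bad values with NaN, so it is worth checking how many NaN values appear after the conversion.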

Answer 2

Answered by Alexander

As mentioned, you don't give an example of the testTime and passing_site data, but I'm guessing that they're floating-point numbers. As I'm sure you can imagine, it rarely makes sense to group on floating-point numbers. Rather, you would need to group on integers or categories of some type.

Try something like:

df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()

You're grouping on 'data' and 'category', and then calculating the mean for the numerical columns 'passing_site' and 'testTime'.
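
A small sketch of that call on made-up data may help. The values for passing_site and testTime below are invented for illustration, since the question does not show them.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
df = pd.DataFrame({
    'data': [4610, 4610, 5307, 5307],
    'category': [2, 2, 7, 7],
    'passing_site': [1.0, 3.0, 2.0, 4.0],
    'testTime': [10.0, 20.0, 30.0, 50.0],
})

# Group on the two key columns, then average the two numeric columns.
# A list of column names (double brackets) selects several columns.
result = df.groupby(['data', 'category'])[['passing_site', 'testTime']].mean()
print(result)
```

The result is indexed by the (data, category) pairs, with one row per distinct combination.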
