Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/48171492/

"No numeric types to aggregate" after groupby and mean

Tags: python, pandas

Asked by Harry

I'm working with time series and trying to write a function that calculates the monthly average of the data. Here are some helper functions:

import datetime
import numpy
import pandas

def date_range_0(start, end):
    # Return an array of consecutive days from start to end, inclusive.
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)

def date_range_1(start, days):
    # days should be an integer: the number of days in the range
    return date_range_0(start, start + datetime.timedelta(days - 1))

x = date_range_1(datetime.datetime(2015, 5, 17), 4)

The output, x, is a simple array of datetimes:

array([datetime.datetime(2015, 5, 17, 0, 0),
   datetime.datetime(2015, 5, 18, 0, 0),
   datetime.datetime(2015, 5, 19, 0, 0),
   datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
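
For comparison, pandas can build a similar daily range without the helper functions. This is only a sketch of an alternative, not part of the original question: it uses pandas.date_range, and the names start and dates are illustrative. Note that it returns a DatetimeIndex rather than an object array of datetime.datetime values.

```python
import datetime
import pandas

# Same start date and length as date_range_1(datetime.datetime(2015, 5, 17), 4)
start = datetime.datetime(2015, 5, 17)
dates = pandas.date_range(start=start, periods=4, freq="D")

print(dates)
```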

Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried one example from that page and it works fine:

df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
              'key2': [2015001,2015001,2015001,2015001,2015001],
              'data1': 1+0.1*numpy.arange(1,6)
        })
df

gives

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

and

grouped=df['data1'].groupby(df['key2'])
grouped.mean()

gives

key2
2015001    1.3
Name: data1, dtype: float64

Then I tried my own example:

datedat = numpy.array([date_range_1(datetime.datetime(2015, 1, 17), 5),
                       1 + 0.1 * numpy.arange(1, 6)]).T
months = [day.month for day in datedat[:, 0]]
years = [day.year for day in datedat[:, 0]]
datedatF = pandas.DataFrame({'key1': datedat[:, 0],
                             'key2': list(numpy.array(years) * 1000 + numpy.array(months)),
                             'data1': datedat[:, 1]})
datedatF

which generated

   data1    key1    key2
0   1.1 2015-01-17  2015001
1   1.2 2015-01-18  2015001
2   1.3 2015-01-19  2015001
3   1.4 2015-01-20  2015001
4   1.5 2015-01-21  2015001

Note that this looks exactly the same as the table above. So far so good. Then I ran:

grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()

It throws this error:

---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
      1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
   1017         nv.validate_groupby_func('mean', args, kwargs)
   1018         try:
-> 1019             return self._cython_agg_general('mean')
   1020         except GroupByError:
   1021             raise

/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
    806 
    807         if len(output) == 0:
--> 808             raise DataError('No numeric types to aggregate')
    809 
    810         return self._wrap_aggregated_output(output, names)

DataError: No numeric types to aggregate

What did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It looks exactly the same as the successful example!

Answered by YOBEN_S

The dtype of data1 in your DataFrame is object, so you need to convert it with pd.to_numeric first:

datedatF.dtypes
Out[39]: 
data1            object
key1     datetime64[ns]
key2              int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]: 
key2
2015001    1.3
Name: data1, dtype: float64
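
The answer above does not say why data1 ends up as object. The sketch below shows the likely cause: putting datetime objects and floats into one NumPy array forces a common dtype of object, and a column sliced from that array keeps it. The names dates, values, mixed, good and bad are illustrative and not from the original post, and the exact dtype handling may vary between pandas versions.

```python
import datetime
import numpy
import pandas

# Five consecutive days and five float values, as in the question
dates = numpy.array([datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i)
                     for i in range(5)])
values = 1 + 0.1 * numpy.arange(1, 6)

# Stacking datetimes and floats in one array forces a common dtype: object
mixed = numpy.array([dates, values]).T
print(mixed.dtype)

# A column built from the float array stays numeric...
good = pandas.DataFrame({'data1': values})
# ...while a column sliced from the mixed array keeps the object dtype
bad = pandas.DataFrame({'data1': mixed[:, 1]})

print(good['data1'].dtype)
print(bad['data1'].dtype)
```

Both frames print identically, which is why the two tables in the question look the same even though only one of them can be averaged.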

Answered by MaxU

Your data1 column has object dtype (the generic dtype pandas also uses for strings), not a numeric one:

In [396]: datedatF.dtypes
Out[396]:
data1            object   # <--- NOTE!
key1     datetime64[ns]
key2              int64
dtype: object

So try this:

In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
                  .groupby('key2')['data1'].mean()
Out[397]:
key2
2015001    1.3
Name: data1, dtype: float64
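
A small sketch of what errors='coerce' does in pd.to_numeric: values that cannot be parsed become NaN instead of raising, and mean() then skips them by default. The series raw and its contents are made up for illustration.

```python
import pandas as pd

# An object column where one entry is not a valid number (illustrative data)
raw = pd.Series(['1.1', '1.2', 'n/a'], dtype=object)

# Unparseable entries are turned into NaN rather than raising an error
clean = pd.to_numeric(raw, errors='coerce')

print(clean)
print(clean.mean())  # NaN is skipped, so this averages 1.1 and 1.2
```

Coercion is convenient but can hide bad data, so it may be worth checking clean.isna().sum() before aggregating.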

Answered by sameer_nubia

Another groupby example, where the Sales column is stored as strings:

dictgrp={'Company':'Goog Goog msft msft fb fb'.split(),
         'Person':'Sam Charlie amy vanessa carl sarah'.split(),
         'Sales':'200 130 340 124 243 350'.split()}

df4=pd.DataFrame(data=dictgrp)
print(df4)
  Company   Person Sales
0    Goog      Sam   200
1    Goog  Charlie   130
2    msft      amy   340
3    msft  vanessa   124
4      fb     carl   243
5      fb    sarah   350

grpdf=pd.to_numeric(df4['Sales']).groupby(df4['Company'])
print(grpdf.mean())
Company
Goog    165.0
fb      296.5
msft    232.0
Name: Sales, dtype: float64
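
Returning to the question's original goal of a monthly average, here is one possible sketch that avoids the mixed NumPy array and the hand-built key2 column altogether. It assumes the dates are stored as a datetime64 column and groups by calendar month with dt.to_period('M'). The date range is chosen here to span two months and is not from the original post.

```python
import numpy
import pandas

# Build each column from its own array so the values stay float64
df = pandas.DataFrame({
    'key1': pandas.date_range('2015-01-30', periods=5, freq='D'),
    'data1': 1 + 0.1 * numpy.arange(1, 6),
})

# Group by calendar month derived from the datetime column
monthly = df.groupby(df['key1'].dt.to_period('M'))['data1'].mean()

print(monthly)
```

With this data the first two rows fall in January and the last three in February, so the result has one mean per month.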