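pandas: "No numeric types to aggregate" after groupby and mean
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/48171492/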
"No numeric types to aggregate" after groupby and mean
Asked by Harry
I'm working with time series and trying to write a function that calculates the monthly average of the data. Here are some helper functions:
import datetime
import numpy
import pandas
def date_range_0(start, end):
    # Every day from start to end, inclusive
    dates = [start + datetime.timedelta(days=i)
             for i in range((end - start).days + 1)]
    return numpy.array(dates)
def date_range_1(start, days):
    # days should be an integer: the number of consecutive days to return
    return date_range_0(start, start + datetime.timedelta(days - 1))
x = date_range_1(datetime.datetime(2015, 5, 17), 4)
The output, x, is a simple array of datetimes:
array([datetime.datetime(2015, 5, 17, 0, 0),
datetime.datetime(2015, 5, 18, 0, 0),
datetime.datetime(2015, 5, 19, 0, 0),
datetime.datetime(2015, 5, 20, 0, 0)], dtype=object)
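As a side note, pandas has a built-in that covers the same ground as the helper above. The following is only a sketch, not part of the original question: it re-implements date_range_1 in a shortened form (an assumption made to keep the example self-contained) and compares it with pandas.date_range, which returns a DatetimeIndex with a datetime64 dtype instead of an object array.

```python
import datetime
import numpy
import pandas

def date_range_1(start, days):
    # Shortened stand-in for the question's helper: `days` consecutive days from `start`
    return numpy.array([start + datetime.timedelta(days=i) for i in range(days)])

start = datetime.datetime(2015, 5, 17)
x = date_range_1(start, 4)

# pandas.date_range builds the same days as a DatetimeIndex
idx = pandas.date_range(start=start, periods=4, freq="D")

print(x.dtype)    # object: an array of Python datetime objects
print(idx.dtype)  # a datetime64 dtype

# The two hold the same calendar days
same_days = list(idx.to_pydatetime()) == list(x)
print(same_days)
```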
Then I learned about the groupby function from http://blog.csdn.net/youngbit007/article/details/54288603. I tried one example from that site and it works fine:
df = pandas.DataFrame({'key1':date_range_1(datetime.datetime(2015, 1, 17),5),
'key2': [2015001,2015001,2015001,2015001,2015001],
'data1': 1+0.1*numpy.arange(1,6)
})
df
gives
data1 key1 key2
0 1.1 2015-01-17 2015001
1 1.2 2015-01-18 2015001
2 1.3 2015-01-19 2015001
3 1.4 2015-01-20 2015001
4 1.5 2015-01-21 2015001
and
grouped=df['data1'].groupby(df['key2'])
grouped.mean()
gives
key2
2015001 1.3
Name: data1, dtype: float64
Then I tried my own example:
datedat = numpy.array([date_range_1(datetime.datetime(2015, 1, 17), 5),
                       1 + 0.1 * numpy.arange(1, 6)]).T
months = [day.month for day in datedat[:, 0]]
years = [day.year for day in datedat[:, 0]]
datedatF = pandas.DataFrame({'key1': datedat[:, 0],
                             'key2': list(numpy.array(years) * 1000 + numpy.array(months)),
                             'data1': datedat[:, 1]})
datedatF
which produces
data1 key1 key2
0 1.1 2015-01-17 2015001
1 1.2 2015-01-18 2015001
2 1.3 2015-01-19 2015001
3 1.4 2015-01-20 2015001
4 1.5 2015-01-21 2015001
Note that this looks exactly the same as the table above. So far so good. Then I ran:
grouped2=datedatF['data1'].groupby(datedatF['key2'])
grouped2.mean()
and it raises this:
---------------------------------------------------------------------------
DataError Traceback (most recent call last)
<ipython-input-170-f0d2bc225b88> in <module>()
1 grouped2=datedatF['data1'].groupby(datedatF['key2'])
----> 2 grouped2.mean()
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in mean(self, *args, **kwargs)
1017 nv.validate_groupby_func('mean', args, kwargs)
1018 try:
-> 1019 return self._cython_agg_general('mean')
1020 except GroupByError:
1021 raise
/root/anaconda3/lib/python3.6/site-packages/pandas/core/groupby.py in _cython_agg_general(self, how, numeric_only)
806
807 if len(output) == 0:
--> 808 raise DataError('No numeric types to aggregate')
809
810 return self._wrap_aggregated_output(output, names)
DataError: No numeric types to aggregate
Oh... what did I do wrong? Why can't I take the mean of the second pandas.DataFrame? It looks exactly the same as the successful example!
Answered by YOBEN_S
The dtype of data1 in your DataFrame is object, so you need to convert it with pd.to_numeric:
datedatF.dtypes
Out[39]:
data1 object
key1 datetime64[ns]
key2 int64
dtype: object
grouped2=pd.to_numeric(datedatF['data1']).groupby(datedatF['key2'])
grouped2.mean()
Out[41]:
key2
2015001 1.3
Name: data1, dtype: float64
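A likely explanation for where the object dtype comes from: the question stacks datetimes and floats into a single numpy array, and numpy then has to fall back to dtype=object for the whole array. The sketch below reproduces that under simplified assumptions; the names dates, values, frame and converted are illustrative and not from the original post.

```python
import datetime
import numpy
import pandas

dates = numpy.array([datetime.datetime(2015, 1, 17) + datetime.timedelta(days=i)
                     for i in range(5)])
values = 1 + 0.1 * numpy.arange(1, 6)

# Stacking datetimes and floats in one array forces dtype=object
datedat = numpy.array([dates, values]).T
print(values.dtype)         # float64
print(datedat.dtype)        # object
print(datedat[:, 1].dtype)  # object, although every element is a float

# The object dtype carries over into the DataFrame column
frame = pandas.DataFrame({'data1': datedat[:, 1], 'key2': [2015001] * 5})
print(frame['data1'].dtype)  # object

# pd.to_numeric restores a numeric dtype, after which mean() works
converted = pandas.to_numeric(frame['data1'])
print(converted.dtype)       # float64
result = converted.groupby(frame['key2']).mean()
print(result)
```

Keeping the dates and the values in separate arrays, and passing them to pandas.DataFrame directly, would avoid the object column in the first place.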
Answered by MaxU
Your data1 column is of object (string) dtype:
In [396]: datedatF.dtypes
Out[396]:
data1 object # <--- NOTE!
key1 datetime64[ns]
key2 int64
dtype: object
So try this:
In [397]: datedatF.assign(data1=pd.to_numeric(datedatF['data1'], errors='coerce')) \
.groupby('key2')['data1'].mean()
Out[397]:
key2
2015001 1.3
Name: data1, dtype: float64
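To illustrate what errors='coerce' changes, here is a small sketch on made-up data (the series raw is hypothetical and not from the question). With the default errors='raise', a value that cannot be parsed raises an exception; with errors='coerce' it becomes NaN, which mean() then skips.

```python
import pandas

# Hypothetical object column mixing a numeric string, a float and an unparseable value
raw = pandas.Series(['1.1', 1.2, 'n/a'], dtype=object)

# The default (errors='raise') refuses the unparseable entry
try:
    pandas.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)

# errors='coerce' turns the unparseable entry into NaN instead
coerced = pandas.to_numeric(raw, errors='coerce')
print(coerced.dtype)            # float64
print(coerced.isna().tolist())  # [False, False, True]

# mean() skips NaN by default, so only 1.1 and 1.2 are averaged
average = coerced.mean()
print(average)
```

In the question's data every value is already a valid number, so coercion is only a safety net there; on messier data it can silently hide bad rows, which is worth checking with isna().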
Answered by sameer_nubia
Group by
dictgrp={'Company':'Goog Goog msft msft fb fb'.split(),
'Person':'Sam Charlie amy vanessa carl sarah'.split(),
'Sales':'200 130 340 124 243 350'.split()}
df4=pd.DataFrame(data=dictgrp)
print(df4)
Company Person Sales
0 Goog Sam 200
1 Goog Charlie 130
2 msft amy 340
3 msft vanessa 124
4 fb carl 243
5 fb sarah 350
grpdf=pd.to_numeric(df4['Sales']).groupby(df4['Company'])
print(grpdf.mean())
Company
Goog 165.0
fb 296.5
msft 232.0
Name: Sales, dtype: float64
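A variation on the example above, offered as a sketch rather than as the answerer's method: convert the Sales column once and store it back in the DataFrame, so that later groupby calls can refer to columns by name. The variable means is illustrative.

```python
import pandas as pd

df4 = pd.DataFrame({
    'Company': 'Goog Goog msft msft fb fb'.split(),
    'Person': 'Sam Charlie amy vanessa carl sarah'.split(),
    'Sales': '200 130 340 124 243 350'.split(),  # strings, not numbers
})

# Convert the string column once and keep the numeric version in the frame
df4['Sales'] = pd.to_numeric(df4['Sales'])

# Group by column name and average the numeric column
means = df4.groupby('Company')['Sales'].mean()
print(means)
```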