Getting the mean in Pandas with GroupBy - DataError: No numeric types to aggregate

Question

Asked by Max Song

I know that there are numerous questions about this, like "Getting daily averages with pandas" and "How get monthly mean in pandas using groupby", but I'm getting a weird error.

Simple data set, with one index column (type timestamp) and one value column. Would like to get the monthly mean of the data.

In [76]: df.head()
Out[76]: 
                          A
2008-01-02                1
2008-01-03                2
2008-01-04                3
2008-01-07                4
2008-01-08                5

However, when I group by month, I get only the groups of the index and not of the values:

In [74]: df.head().groupby(lambda x: x.month).groups
Out[74]: 
{1: [Timestamp('2008-01-02 00:00:00'),
  Timestamp('2008-01-03 00:00:00'),
  Timestamp('2008-01-04 00:00:00'),
  Timestamp('2008-01-07 00:00:00'),
  Timestamp('2008-01-08 00:00:00')]}

Attempts to take the mean result in an error:

I have tried both df.head().resample("M", how='mean') and df.head().groupby(lambda x: x.month).mean()

and get the error: DataError: No numeric types to aggregate

In [75]: df.resample("M", how='mean')
---------------------------------------------------------------------------
DataError                                 Traceback (most recent call last)
<ipython-input-75-79dc1a060ba4> in <module>()
----> 1 df.resample("M", how='mean')

/usr/local/lib/python2.7/site-packages/pandas/core/generic.pyc in resample(self, rule, how, axis, fill_method, closed, label, convention, kind, loffset, limit, base)
   2878                               fill_method=fill_method, convention=convention,
   2879                               limit=limit, base=base)
-> 2880         return sampler.resample(self).__finalize__(self)
   2881 
   2882     def first(self, offset):

/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in resample(self, obj)
     82 
     83         if isinstance(ax, DatetimeIndex):
---> 84             rs = self._resample_timestamps()
     85         elif isinstance(ax, PeriodIndex):
     86             offset = to_offset(self.freq)

/usr/local/lib/python2.7/site-packages/pandas/tseries/resample.pyc in _resample_timestamps(self)
    286             # Irregular data, have to use groupby
    287             grouped = obj.groupby(grouper, axis=self.axis)
--> 288             result = grouped.aggregate(self._agg_method)
    289 
    290             if self.fill_method is not None:

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in aggregate(self, arg, *args, **kwargs)
   2436     def aggregate(self, arg, *args, **kwargs):
   2437         if isinstance(arg, compat.string_types):
-> 2438             return getattr(self, arg)(*args, **kwargs)
   2439 
   2440         result = OrderedDict()

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in mean(self)
    664         """
    665         try:
--> 666             return self._cython_agg_general('mean')
    667         except GroupByError:
    668             raise

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_general(self, how, numeric_only)
   2356 
   2357     def _cython_agg_general(self, how, numeric_only=True):
-> 2358         new_items, new_blocks = self._cython_agg_blocks(how, numeric_only=numeric_only)
   2359         return self._wrap_agged_blocks(new_items, new_blocks)
   2360 

/usr/local/lib/python2.7/site-packages/pandas/core/groupby.pyc in _cython_agg_blocks(self, how, numeric_only)
   2406 
   2407         if len(new_blocks) == 0:
-> 2408             raise DataError('No numeric types to aggregate')
   2409 
   2410         return data.items, new_blocks

DataError: No numeric types to aggregate

Answer 1

Answered by FooBar

Yeah, you should try coercing A to numeric with something like df['A'] = df['A'].astype(int). It might also be worth checking whether anything in the initial data read-in caused the column to be object instead of numeric.
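
As a rough sketch of that suggestion: the snippet below assumes the values in A arrived as strings, which would give the column dtype object. The sample frame is made up to mirror the question, and pd.to_numeric is swapped in for astype(int) only because it can turn unparseable entries into NaN instead of raising. Recent pandas releases no longer accept the how= argument to resample, so the monthly mean is taken with a plain groupby on the index month.

```python
import pandas as pd

# Hypothetical data mirroring the question: the values are stored as
# strings, so column A has dtype object rather than a numeric dtype.
idx = pd.to_datetime(
    ["2008-01-02", "2008-01-03", "2008-01-04", "2008-01-07", "2008-01-08"]
)
df = pd.DataFrame({"A": ["1", "2", "3", "4", "5"]}, index=idx)

# Inspect the dtypes first; an object column is the likely culprit.
print(df.dtypes)

# Coerce to numeric. errors="coerce" converts anything unparseable to NaN.
df["A"] = pd.to_numeric(df["A"], errors="coerce")

# With a numeric column, grouping by the month of the index and taking
# the mean works as expected.
monthly_mean = df.groupby(df.index.month).mean()
print(monthly_mean)
```

If the coercion introduces NaN values, that usually points at stray text in the source file, which is the read-in problem the answer mentions.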
