pandas: error when computing the mode of a DataFrameGroupBy object

Question

Asked by user2314737

I have a dataframe with a Date column. I group the data by year and can compute the mean and median, but how do I compute the mode? Here is the error I get:

>>> np.random.seed(0)
>>> rng = pd.date_range('2010-01-01', periods=10, freq='2M')
>>> df = pd.DataFrame({ 'Date': rng, 'Val': np.random.random_integers(0,100,size=10) })
>>> df
        Date  Val
0 2010-01-31   44
1 2010-03-31   47
2 2010-05-31   64
3 2010-07-31   67
4 2010-09-30   67
5 2010-11-30    9
6 2011-01-31   83
7 2011-03-31   21
8 2011-05-31   36
9 2011-07-31   87
>>> df.groupby(pd.Grouper(key='Date',freq='A')).mean()
                  Val
Date                 
2010-12-31  49.666667
2011-12-31  56.750000
>>> df.groupby(pd.Grouper(key='Date',freq='A')).median()
             Val
Date            
2010-12-31  55.5
2011-12-31  59.5
>>> df.groupby(pd.Grouper(key='Date',freq='A')).mode()

Traceback (most recent call last):
  File "<pyshell#109>", line 1, in <module>
    df.groupby(pd.Grouper(key='Date',freq='A')).mode()
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 554, in __getattr__
    return self._make_wrapper(attr)
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 571, in _make_wrapper
    raise AttributeError(msg)
AttributeError: Cannot access callable attribute 'mode' of 'DataFrameGroupBy' objects, try using the 'apply' method

Answer 1

Accepted answer by piRSquared

Use np.unique with the return_counts parameter.
Use argmax on the counts array to pick the corresponding value from the unique array.
Use np.apply_along_axis for a custom function mode.

def mode(a):
    u, c = np.unique(a, return_counts=True)
    return u[c.argmax()]

df.groupby(pd.Grouper(key='Date',freq='A')).Val.apply(mode)

Date
2010-12-31    67
2011-12-31    21
Freq: A-DEC, Name: Val, dtype: int64
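
The 2011 result above may look surprising, since every value in that year occurs exactly once. A minimal sketch of why 21 comes out, assuming only NumPy and the same mode function: np.unique returns its values sorted, and argmax returns the first maximum, so a tie resolves to the smallest value. The variable names below are illustrative.

```python
import numpy as np

def mode(a):
    # Sorted unique values and how often each one occurs
    u, c = np.unique(a, return_counts=True)
    # argmax returns the first position of the largest count
    return u[c.argmax()]

vals_2010 = np.array([44, 47, 64, 67, 67, 9])   # 67 occurs twice
vals_2011 = np.array([83, 21, 36, 87])          # every value occurs once

result_clear = mode(vals_2010)  # the single most frequent value
result_tie = mode(vals_2011)    # tie: the smallest value wins
```

If a different tie-breaking rule is needed, the function has to be adapted accordingly.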

Answer 2

Answer by 3novak

mode isn't a built-in function that's automatically compatible with pandas groupby objects. You could use the scipy.stats module. This feels a little clunky, though.

from scipy import stats

df.groupby(pd.Grouper(key='Date',freq='A')).apply(stats.mode)

Alternatively, you could use the value_counts() function on each group and take the first index value returned. This is the route I would go.

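# value_counts() sorts by frequency, so the first index entry is the most frequent value
df.groupby(pd.Grouper(key='Date', freq='A')).Val.apply(lambda s: s.value_counts().index[0])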
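
A self-contained sketch of the value_counts() idea, assuming the question's data typed out by hand and grouping on the calendar year via .dt.year instead of pd.Grouper. The helper name most_frequent is hypothetical.

```python
import pandas as pd

# The question's data, written out explicitly
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2010-01-31", "2010-03-31", "2010-05-31", "2010-07-31", "2010-09-30",
        "2010-11-30", "2011-01-31", "2011-03-31", "2011-05-31", "2011-07-31",
    ]),
    "Val": [44, 47, 64, 67, 67, 9, 83, 21, 36, 87],
})

def most_frequent(s):
    # value_counts() orders entries by descending count,
    # so the first index label is a most frequent value
    return s.value_counts().index[0]

# Group by calendar year and take the most frequent Val in each group
by_year = df.groupby(df["Date"].dt.year)["Val"].apply(most_frequent)
```

For 2010 this gives 67. In 2011 every value occurs once, so which value is reported depends on how value_counts() orders ties; do not rely on a particular one.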

Answer 3

Answer by Vaid?tas Iv??ka

mode is problematic, as others have mentioned. However, a trivial lambda function can be applied to a DataFrameGroupBy object, just as the AttributeError suggests (with no ugly slicing or anything else):

df.groupby(grouping_column)[list(pivotable_columns)].apply(lambda x: x.mode())
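
One thing to keep in mind with this approach is that Series.mode() returns every tied value, so a group can produce more than one row. A minimal sketch on the question's data, assuming grouping by calendar year via .dt.year; the names year, all_modes and first_mode are illustrative.

```python
import pandas as pd

# The question's data, written out explicitly
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2010-01-31", "2010-03-31", "2010-05-31", "2010-07-31", "2010-09-30",
        "2010-11-30", "2011-01-31", "2011-03-31", "2011-05-31", "2011-07-31",
    ]),
    "Val": [44, 47, 64, 67, 67, 9, 83, 21, 36, 87],
})

year = df["Date"].dt.year.rename("Year")

# Series.mode() returns all modes, sorted; groups with ties yield several rows
all_modes = df.groupby(year)["Val"].apply(lambda s: s.mode())

# To get exactly one value per group, keep the first (smallest) mode
first_mode = df.groupby(year)["Val"].agg(lambda s: s.mode().iloc[0])
```

Here all_modes has one entry for 2010 and four for 2011, while first_mode has exactly one value per year.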
