Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/41430896/
Error when computing mode of a DataFrameGroupBy object
Asked by user2314737
I have a dataframe with a Date column. I group the data by year and can compute the mean and median, but how do I compute the mode? Here is the error I get:
>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> rng = pd.date_range('2010-01-01', periods=10, freq='2M')
>>> df = pd.DataFrame({ 'Date': rng, 'Val': np.random.random_integers(0,100,size=10) })
>>> df
        Date  Val
0 2010-01-31   44
1 2010-03-31   47
2 2010-05-31   64
3 2010-07-31   67
4 2010-09-30   67
5 2010-11-30    9
6 2011-01-31   83
7 2011-03-31   21
8 2011-05-31   36
9 2011-07-31   87
>>> df.groupby(pd.Grouper(key='Date',freq='A')).mean()
                  Val
Date
2010-12-31  49.666667
2011-12-31  56.750000
>>> df.groupby(pd.Grouper(key='Date',freq='A')).median()
             Val
Date
2010-12-31  55.5
2011-12-31  59.5
>>> df.groupby(pd.Grouper(key='Date',freq='A')).mode()
Traceback (most recent call last):
  File "<pyshell#109>", line 1, in <module>
    df.groupby(pd.Grouper(key='Date',freq='A')).mode()
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 554, in __getattr__
    return self._make_wrapper(attr)
  File "C:\Python27\lib\site-packages\pandas\core\groupby.py", line 571, in _make_wrapper
    raise AttributeError(msg)
AttributeError: Cannot access callable attribute 'mode' of 'DataFrameGroupBy' objects, try using the 'apply' method
Accepted answer by piRSquared
- use np.unique with the return_counts parameter.
- use argmax on the counts array to get the value from the unique array.
- use np.apply_along_axis for a custom function mode
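def mode(a):
    # u holds the sorted unique values, c the count of each one
    u, c = np.unique(a, return_counts=True)
    # argmax picks the first maximum, so ties go to the smallest value
    return u[c.argmax()]

df.groupby(pd.Grouper(key='Date',freq='A')).Val.apply(mode)

Date
2010-12-31    67
2011-12-31    21
Freq: A-DEC, Name: Val, dtype: int64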
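For reference, the np.unique approach can be reproduced end to end roughly as follows. This is a minimal sketch, not the answerer's exact code: the dates and values are typed in by hand from the question's output rather than regenerated with random_integers, and the grouping uses df['Date'].dt.year instead of pd.Grouper, an assumption made here so the example does not depend on frequency aliases.

```python
import numpy as np
import pandas as pd

def mode(a):
    # Sorted unique values and the number of times each occurs
    u, c = np.unique(a, return_counts=True)
    # argmax returns the first maximum, so ties resolve to the smallest value
    return u[c.argmax()]

# Data copied by hand from the question's printed dataframe
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2010-01-31', '2010-03-31', '2010-05-31', '2010-07-31', '2010-09-30',
        '2010-11-30', '2011-01-31', '2011-03-31', '2011-05-31', '2011-07-31',
    ]),
    'Val': [44, 47, 64, 67, 67, 9, 83, 21, 36, 87],
})

# Assumption: grouping by calendar year stands in for pd.Grouper(freq='A')
result = df.groupby(df['Date'].dt.year)['Val'].apply(mode)
print(result)
```

Every 2011 value occurs exactly once, so the function falls back to the smallest one (21), which is why the second row of the answer's output is 21 rather than a "real" mode.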
Answer by 3novak
mode isn't a built-in function that's automatically compatible with pandas groupby objects. You could use the scipy.stats module. This feels a little clunky, though.
from scipy import stats
df.groupby(pd.Grouper(key='Date',freq='A')).apply(stats.mode)
Alternatively, you could use the value_counts() function and take the first index value returned. This is the route I would go.
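# Within each group, take the first index label of value_counts(), i.e. the most frequent value
df.groupby(pd.Grouper(key='Date', freq='A')).Val.apply(lambda s: s.value_counts().index[0])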
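The value_counts() idea can be sketched per group as below. This is an illustrative version under stated assumptions, not the answerer's code: the helper name most_frequent is hypothetical, the data is copied by hand from the question, and grouping by df['Date'].dt.year is used here in place of pd.Grouper.

```python
import pandas as pd

# Data copied by hand from the question's printed dataframe
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2010-01-31', '2010-03-31', '2010-05-31', '2010-07-31', '2010-09-30',
        '2010-11-30', '2011-01-31', '2011-03-31', '2011-05-31', '2011-07-31',
    ]),
    'Val': [44, 47, 64, 67, 67, 9, 83, 21, 36, 87],
})

def most_frequent(s):
    # value_counts() sorts by count in descending order,
    # so the first index label is one of the most frequent values
    return s.value_counts().index[0]

# Assumption: grouping by calendar year stands in for pd.Grouper(freq='A')
result = df.groupby(df['Date'].dt.year)['Val'].apply(most_frequent)
print(result)
```

When several values tie for the highest count, as in 2011 here, which of them comes first is up to value_counts(), so this sketch should not be relied on for a particular tie-breaking order.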
Answer by Vaid?tas Iv??ka
mode is problematic, as others have mentioned. However, a trivial lambda function can be applied to a DataFrameGroupBy object, just as the AttributeError suggests (and it involves no ugly slicing or anything else):
df.groupby(grouping_column)[list(pivotable_columns)].apply(lambda x: x.mode())
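Applied to the question's data, the lambda approach might look like the sketch below. The data is copied by hand from the question, grouping by df['Date'].dt.year is an assumption used here in place of pd.Grouper, and the variable names by_year, all_modes and one_mode are illustrative only.

```python
import pandas as pd

# Data copied by hand from the question's printed dataframe
df = pd.DataFrame({
    'Date': pd.to_datetime([
        '2010-01-31', '2010-03-31', '2010-05-31', '2010-07-31', '2010-09-30',
        '2010-11-30', '2011-01-31', '2011-03-31', '2011-05-31', '2011-07-31',
    ]),
    'Val': [44, 47, 64, 67, 67, 9, 83, 21, 36, 87],
})

# Assumption: grouping by calendar year stands in for pd.Grouper(freq='A')
by_year = df.groupby(df['Date'].dt.year)

# DataFrame.mode() keeps every tied value, so a group can produce several rows
all_modes = by_year[['Val']].apply(lambda x: x.mode())

# Series.mode() returns the tied values sorted; taking the first gives one value per group
one_mode = by_year['Val'].agg(lambda s: s.mode().iloc[0])

print(all_modes)
print(one_mode)
```

The difference matters for 2011, where all four values occur once: all_modes lists each of them, while one_mode keeps only the smallest.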