pandas SpecificationError 的解决方案:同时 agg() 和 groupby() 不支持嵌套重命名器

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/60229375/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 06:25:28  来源:igfitidea点击:

Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()

pythonpandasaggregate

提问by akshay jindal

def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, data[col3].values)
    p2 = plt.bar(ind, data[col2].values)

    plt.ylabel('Projects')
    plt.title('Number of projects aproved vs rejected')
    plt.xticks(ind, list(data[xtick].values))
    plt.legend((p1[0], p2[0]), ('total', 'accepted'))
    plt.show()

def univariate_barplots(data, col1, col2='project_is_approved', top=False):
    # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
    temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()

    # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

    temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

    temp.sort_values(by=['total'],inplace=True, ascending=False)

    if top:
        temp = temp[0:top]

    stack_plot(temp, xtick=col1, col2=col2, col3='total')
    print(temp.head(5))
    print("="*50)
    print(temp.tail(5))

univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

Error:

错误:

SpecificationError                        Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
      4 
      5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
      7     print (temp['total'].head(2))
      8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    251             # but not the class list / tuple itself.
    252             func = _maybe_mangle_lambdas(func)
--> 253             ret = self._aggregate_multiple_funcs(func)
    254             if relabeling:
    255                 ret.columns = columns

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: **nested renamer is not supported**

回答by Kartikay Khanna

change

改变

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

to

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

reason: in new pandas version named aggregation is the recommended replacement for the deprecated “dict-of-dicts” approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).

原因:在新的 Pandas 版本中,命名聚合是已弃用的“dict-of-dicts”方法的推荐替代品,用于命名列特定聚合的输出(重命名时使用字典弃用 groupby.agg())。

source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

来源:https: //pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html

回答by tsorn

This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:

如果数据框中不存在聚合函数 dict 中指定的列,也会发生此错误:

In [39]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean'})
Out[39]: 
   B
A   
1  2

In [40]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean', 'non-existing': 'mean'})
Out[40]:
...
SpecificationError: nested renamer is not supported

回答by Y K

I have got the similar issue as @akshay jindal, but I check the documentation as suggested by @artikay Khanna, the problem solved, some functions has been adjusted, the old is deprecated. Here is the code warning provided per last time execute.

我遇到了与@akshay jindal 类似的问题,但我按照@artikay Khanna 的建议查看了文档,问题已解决,部分功能已调整,旧功能已弃用。这是上次执行时提供的代码警告。

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use                 named aggregation instead.

    >>> grouper.agg(name_1=func_1, name_2=func_2)

  """Entry point for launching an IPython kernel.

Therefore, I will suggest try

因此,我建议尝试

grouper.agg(name_1=func_1, name_2=func_2)

Hope this will help

希望这会有所帮助

回答by kait

Do you get the same error if you change

如果你改变,你会得到同样的错误吗

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

to

temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']

回答by Rishi

I have tried alll the solutions and turned out to be the error with the name. If your column name has some inbuilt keywords such as "in", "is",etc., It is throwing error. In my case, My column name is "Points in Polygon" and I have resolved the issue by renaming the column to "Points"

我已经尝试了所有解决方案,结果是名称错误。如果您的列名有一些内置关键字,例如“in”、“is”等,则会引发错误。就我而言,我的列名称是“多边形中的点”,我通过将列重命名为“点”解决了该问题