pandas SpecificationError: nested renamer is not supported when using agg() with groupby()

Question

Asked by akshay jindal

def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
    ind = np.arange(data.shape[0])

    plt.figure(figsize=(20,5))
    p1 = plt.bar(ind, data[col3].values)
    p2 = plt.bar(ind, data[col2].values)

    plt.ylabel('Projects')
    plt.title('Number of projects approved vs rejected')
    plt.xticks(ind, list(data[xtick].values))
    plt.legend((p1[0], p2[0]), ('total', 'accepted'))
    plt.show()

def univariate_barplots(data, col1, col2='project_is_approved', top=False):
    # Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
    temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()

    # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
    temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

    temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

    temp.sort_values(by=['total'],inplace=True, ascending=False)

    if top:
        temp = temp[0:top]

    stack_plot(temp, xtick=col1, col2=col2, col3='total')
    print(temp.head(5))
    print("="*50)
    print(temp.tail(5))

univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

Error:

SpecificationError                        Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)

<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
      4 
      5     # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6     temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
      7     print (temp['total'].head(2))
      8     temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
    251             # but not the class list / tuple itself.
    252             func = _maybe_mangle_lambdas(func)
--> 253             ret = self._aggregate_multiple_funcs(func)
    254             if relabeling:
    255                 ret.columns = columns

~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
    292             # GH 15931
    293             if isinstance(self._selected_obj, Series):
--> 294                 raise SpecificationError("nested renamer is not supported")
    295 
    296             columns = list(arg.keys())

SpecificationError: nested renamer is not supported

Answer 1

Answered by Kartikay Khanna

Change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']

to

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']

Reason: in newer pandas versions, named aggregation (introduced in pandas 0.25) is the recommended replacement for the deprecated "dict-of-dicts" approach to naming the output of column-specific aggregations (see "Deprecate groupby.agg() with a dictionary when renaming").

Source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
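
As a rough illustration of the change above, the two keyword-style calls can be combined into a single named aggregation. This is only a sketch: the small `project_data` frame below is a hypothetical stand-in, since the real dataset is not shown in the question.

```python
import pandas as pd

# Hypothetical stand-in for project_data; the real dataset is not shown in the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY", "NY", "NY"],
    "project_is_approved": [1, 0, 1, 1, 0],
})

grouped = project_data.groupby("school_state")["project_is_approved"]

# Named aggregation on a grouped Series: each keyword becomes an output column,
# and its value is the aggregation function applied to the selected column.
temp = grouped.agg(total="count", Avg="mean").reset_index()
temp = temp.sort_values(by="total", ascending=False)
print(temp)
```

Computing both columns in one call avoids building two intermediate DataFrames and indexing each one by name.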

Answer 2

Answered by tsorn

This error also occurs if a column specified in the aggregation function dict does not exist in the DataFrame:

In [39]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean'})
Out[39]: 
   B
A   
1  2

In [40]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean', 'non-existing': 'mean'})
Out[40]:
...
SpecificationError: nested renamer is not supported
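
One way to guard against this case is to check the dict keys against the DataFrame's columns before aggregating. The helper below, `checked_agg`, is a hypothetical name introduced for this sketch and is not part of pandas.

```python
import pandas as pd

def checked_agg(df, by, spec):
    """Group df by `by` and aggregate with `spec`, failing early on unknown columns.

    `checked_agg` is a hypothetical helper, not a pandas function.
    """
    missing = [col for col in spec if col not in df.columns]
    if missing:
        raise KeyError(f"columns not found in DataFrame: {missing}")
    return df.groupby(by).agg(spec)

df = pd.DataFrame([[1, 2]], columns=["A", "B"])
result = checked_agg(df, "A", {"B": "mean"})
print(result)

try:
    checked_agg(df, "A", {"B": "mean", "non-existing": "mean"})
except KeyError as exc:
    # The message names the missing column instead of mentioning a nested renamer.
    print(exc)
```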

Answer 3

Answered by Y K

I had a similar issue to @akshay jindal. After checking the documentation as suggested by @Kartikay Khanna, the problem was solved: some functions have been adjusted and the old usage is deprecated. Here is the warning shown the last time I ran the code:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use                 named aggregation instead.

    >>> grouper.agg(name_1=func_1, name_2=func_2)

  """Entry point for launching an IPython kernel.

Therefore, I suggest trying:

grouper.agg(name_1=func_1, name_2=func_2)
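
A concrete version of that pattern might look like the following. The data and the `value_range` function are hypothetical, chosen only to show that each function can be either a string name or a callable.

```python
import pandas as pd

# Hypothetical data for illustration only.
scores = pd.DataFrame({
    "team": ["a", "a", "b", "b", "b"],
    "points": [3, 5, 2, 8, 4],
})

def value_range(values):
    """Spread between the largest and smallest value in a group."""
    return values.max() - values.min()

grouper = scores.groupby("team")["points"]

# name_1=func_1, name_2=func_2: the keywords become the output column names.
summary = grouper.agg(highest="max", spread=value_range)
print(summary)
```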

Hope this helps.

Answer 4

Answered by kait

Do you get the same error if you change

temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']

to

temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']
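
For comparison, the `name=(column, function)` tuple form is, as far as the pandas documentation describes it, meant for a grouped DataFrame, where the first tuple element names an existing input column. A sketch under that assumption, using a hypothetical stand-in for `project_data`:

```python
import pandas as pd

# Hypothetical stand-in for project_data; the real dataset is not shown in the question.
project_data = pd.DataFrame({
    "school_state": ["CA", "CA", "NY"],
    "project_is_approved": [1, 0, 1],
})

# No column is selected before agg(), so each keyword takes an
# (input column, aggregation function) tuple.
totals = project_data.groupby("school_state").agg(
    total=("project_is_approved", "count"),
    Avg=("project_is_approved", "mean"),
).reset_index()
print(totals)
```

When a single column has already been selected with `[col2]`, the plain keyword form (`total='count'`) is the one that applies.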

Answer 5

Answered by Rishi

I tried all the solutions, and the problem turned out to be the column name. If your column name contains built-in keywords such as "in" or "is", an error is thrown. In my case, the column was named "Points in Polygon", and I resolved the issue by renaming it to "Points".
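
If renaming the column is not an option, a name with spaces can still be used with named aggregation, because it is passed as a string rather than as a Python keyword. The sketch below uses hypothetical data and does not attempt to reproduce the exact failure described above.

```python
import pandas as pd

# Hypothetical data; "Points in Polygon" mirrors the column name from the answer.
df = pd.DataFrame({
    "region": ["north", "north", "south"],
    "Points in Polygon": [4, 6, 5],
})

# The input column is a string inside the tuple, so spaces are allowed there.
by_region = df.groupby("region").agg(points=("Points in Polygon", "sum"))

# An output name that is not a valid identifier can be supplied by unpacking a dict.
spec = {"Points in Polygon": ("Points in Polygon", "sum")}
same_name = df.groupby("region").agg(**spec)

print(by_region)
print(same_name)
```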
