pandas SpecificationError 的解决方案:同时 agg() 和 groupby() 不支持嵌套重命名器
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/60229375/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Solution for SpecificationError: nested renamer is not supported while agg() along with groupby()
提问by akshay jindal
def stack_plot(data, xtick, col2='project_is_approved', col3='total'):
ind = np.arange(data.shape[0])
plt.figure(figsize=(20,5))
p1 = plt.bar(ind, data[col3].values)
p2 = plt.bar(ind, data[col2].values)
plt.ylabel('Projects')
plt.title('Number of projects aproved vs rejected')
plt.xticks(ind, list(data[xtick].values))
plt.legend((p1[0], p2[0]), ('total', 'accepted'))
plt.show()
def univariate_barplots(data, col1, col2='project_is_approved', top=False):
# Count number of zeros in dataframe python: https://stackoverflow.com/a/51540521/4084039
temp = pd.DataFrame(project_data.groupby(col1)[col2].agg(lambda x: x.eq(1).sum())).reset_index()
# Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
temp.sort_values(by=['total'],inplace=True, ascending=False)
if top:
temp = temp[0:top]
stack_plot(temp, xtick=col1, col2=col2, col3='total')
print(temp.head(5))
print("="*50)
print(temp.tail(5))
univariate_barplots(project_data, 'school_state', 'project_is_approved', False)
Error:
错误:
SpecificationError Traceback (most recent call last)
<ipython-input-21-2cace8f16608> in <module>()
----> 1 univariate_barplots(project_data, 'school_state', 'project_is_approved', False)
<ipython-input-20-856fcc83737b> in univariate_barplots(data, col1, col2, top)
4
5 # Pandas dataframe grouby count: https://stackoverflow.com/a/19385591/4084039
----> 6 temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
7 print (temp['total'].head(2))
8 temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in aggregate(self, func, *args, **kwargs)
251 # but not the class list / tuple itself.
252 func = _maybe_mangle_lambdas(func)
--> 253 ret = self._aggregate_multiple_funcs(func)
254 if relabeling:
255 ret.columns = columns
~\AppData\Roaming\Python\Python36\site-packages\pandas\core\groupby\generic.py in _aggregate_multiple_funcs(self, arg)
292 # GH 15931
293 if isinstance(self._selected_obj, Series):
--> 294 raise SpecificationError("nested renamer is not supported")
295
296 columns = list(arg.keys())
SpecificationError: **nested renamer is not supported**
回答by Kartikay Khanna
change
改变
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'Avg':'mean'})).reset_index()['Avg']
to
到
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(total='count')).reset_index()['total']
temp['Avg'] = pd.DataFrame(project_data.groupby(col1)[col2].agg(Avg='mean')).reset_index()['Avg']
reason: in new pandas version named aggregation is the recommended replacement for the deprecated “dict-of-dicts” approach to naming the output of column-specific aggregations (Deprecate groupby.agg() with a dictionary when renaming).
原因:在新的 Pandas 版本中,命名聚合是已弃用的“dict-of-dicts”方法的推荐替代品,用于命名列特定聚合的输出(重命名时使用字典弃用 groupby.agg())。
source: https://pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
来源:https: //pandas.pydata.org/pandas-docs/stable/whatsnew/v0.25.0.html
回答by tsorn
This error also happens if a column specified in the aggregation function dict does not exist in the dataframe:
如果数据框中不存在聚合函数 dict 中指定的列,也会发生此错误:
In [39]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean'})
Out[39]:
B
A
1 2
In [40]: pd.DataFrame([[1, 2]], columns=['A', 'B']).groupby('A').agg({'B': 'mean', 'non-existing': 'mean'})
Out[40]:
...
SpecificationError: nested renamer is not supported
回答by Y K
I have got the similar issue as @akshay jindal, but I check the documentation as suggested by @artikay Khanna, the problem solved, some functions has been adjusted, the old is deprecated. Here is the code warning provided per last time execute.
我遇到了与@akshay jindal 类似的问题,但我按照@artikay Khanna 的建议查看了文档,问题已解决,部分功能已调整,旧功能已弃用。这是上次执行时提供的代码警告。
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:1: FutureWarning: using a dict on a Series for aggregation
is deprecated and will be removed in a future version. Use named aggregation instead.
>>> grouper.agg(name_1=func_1, name_2=func_2)
"""Entry point for launching an IPython kernel.
Therefore, I will suggest try
因此,我建议尝试
grouper.agg(name_1=func_1, name_2=func_2)
Hope this will help
希望这会有所帮助
回答by kait
Do you get the same error if you change
如果你改变,你会得到同样的错误吗
temp['total'] = pd.DataFrame(project_data.groupby(col1)[col2].agg({'total':'count'})).reset_index()['total']
to
到
temp['total'] = project_data.groupby(col1)[col2].agg(total=('total','count')).reset_index()['total']
回答by Rishi
I have tried alll the solutions and turned out to be the error with the name. If your column name has some inbuilt keywords such as "in", "is",etc., It is throwing error. In my case, My column name is "Points in Polygon" and I have resolved the issue by renaming the column to "Points"
我已经尝试了所有解决方案,结果是名称错误。如果您的列名有一些内置关键字,例如“in”、“is”等,则会引发错误。就我而言,我的列名称是“多边形中的点”,我通过将列重命名为“点”解决了该问题