Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/51820994/

'GroupedData' object has no attribute 'show' when doing pivot in spark dataframe

Tags: python, pandas, apache-spark, dataframe, pyspark

Asked by Nabih Bawazir

I want to pivot a Spark DataFrame. I referred to the PySpark documentation, and based on the pivot function, the clue is .groupBy('name').pivot('name', values=None). Here's my dataset:

 In[75]:  spDF.show()
 Out[75]:

+-----------+-----------+
|customer_id|       name|
+-----------+-----------+
|      25620| MCDonnalds|
|      25620|  STARBUCKS|
|      25620|        nan|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|DUNKINDONUT|
|      25620|   LOTTERIA|
|      25620|        nan|
|      25620| MCDonnalds|
|      25620|DUNKINDONUT|
|      25620|DUNKINDONUT|
|      25620|        nan|
|      25620|        nan|
|      25620|        nan|
|      25620|        nan|
|      25620|   LOTTERIA|
|      25620|   LOTTERIA|
|      25620|  STARBUCKS|
+-----------+-----------+
only showing top 20 rows

Then I try to pivot the table on name:

In [96]:
spDF.groupBy('name').pivot('name', values=None)
Out[96]:
<pyspark.sql.group.GroupedData at 0x7f0ad03750f0>

And when I try to show the result:

In [98]:
spDF.groupBy('name').pivot('name', values=None).show()
Out [98]:

    ---------------------------------------------------------------------------
AttributeError                       Traceback (most recent call last)
<ipython-input-98-94354082e956> in <module>()
----> 1 spDF.groupBy('name').pivot('name', values=None).show()
AttributeError: 'GroupedData' object has no attribute 'show'

I don't know why 'GroupedData' can't be shown. What should I do to solve the issue?

Answered by chromaerror

The pivot() method returns a GroupedData object, just like groupBy(). You cannot use show() on a GroupedData object without first applying an aggregate function (such as sum() or even count()) to it.
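
A minimal sketch of that fix, assuming the goal is to count how often each name occurs per customer_id (the question does not say which aggregate is wanted). The helper name pivot_counts, the demo() function, the local[1] master setting, and the trimmed sample rows are assumptions made for illustration, not part of the original question or answer.

```python
def pivot_counts(df, group_col, pivot_col):
    """Count rows per (group_col, pivot_col) pair, one output column per pivot value."""
    # groupBy(...).pivot(...) is still a GroupedData object, which has no show().
    grouped = df.groupBy(group_col).pivot(pivot_col)
    # Applying an aggregate (here count()) turns it back into a DataFrame.
    return grouped.count()


def demo():
    # Imported lazily so pivot_counts can be loaded without a Spark installation.
    from pyspark.sql import SparkSession

    # Assumption: a throwaway local session is acceptable for trying this out.
    spark = SparkSession.builder.master("local[1]").appName("pivot-demo").getOrCreate()

    # A few rows shaped like the question's dataset (hypothetical subset).
    rows = [
        (25620, "MCDonnalds"),
        (25620, "STARBUCKS"),
        (25620, "MCDonnalds"),
        (25620, "DUNKINDONUT"),
        (25620, "LOTTERIA"),
    ]
    spDF = spark.createDataFrame(rows, ["customer_id", "name"])

    # show() works here because count() has already produced a DataFrame.
    pivot_counts(spDF, "customer_id", "name").show()
    spark.stop()
```

Calling demo() in an environment with PySpark installed should print one row per customer_id with one count column per distinct name. Swapping count() for another aggregate such as sum() on a numeric column follows the same pattern.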

See this article for more information.
