pandas 如何使用 matplotlib 绘制 pyspark sql 结果

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/45003301/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:57:40  来源:igfitidea点击:

How to use matplotlib to plot pyspark sql results

pythonpandasmatplotlibpyspark-sql

提问by HasanDange

I am new to pyspark. I want to plot the result using matplotlib, but not sure which function to use. I searched for a way to convert sql result to pandas and then use plot.

我是 pyspark 的新手。我想使用 matplotlib 绘制结果,但不确定要使用哪个函数。我搜索了一种将sql结果转换为pandas然后使用plot的方法。

采纳答案by HasanDange

Hi Team I have found the solution for this. I converted sql dataframe to pandas dataframe and then I was able to plot the graphs. below is the sample code.from

嗨团队我已经找到了解决方案。我将 sql 数据框转换为 Pandas 数据框,然后我就可以绘制图形了。下面是示例代码。来自

pyspark.sql import Row
from pyspark.sql import HiveContext
import pyspark
from IPython.display import display
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline 
sc = pyspark.SparkContext()
sqlContext = HiveContext(sc)
test_list = [(1, 'hasan'),(2, 'nana'),(3, 'dad'),(4, 'mon')]
rdd = sc.parallelize(test_list)
people = rdd.map(lambda x: Row(id=int(x[0]), name=x[1]))
schemaPeople = sqlContext.createDataFrame(people)
# Register it as a temp table
sqlContext.registerDataFrameAsTable(schemaPeople, "test_table")
df1=sqlContext.sql("Select * from test_table")
pdf1=df1.toPandas()
pdf1.plot(kind='barh',x='name',y='id',colormap='winter_r')