Python AttributeError: 'SparkContext' object has no attribute 'createDataFrame' using Spark 1.6
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/40651003/
AttributeError: 'SparkContext' object has no attribute 'createDataFrame' using Spark 1.6
Asked by pr338
Previous questions asking about this error have answers saying all you need to do is update your version of Spark. I just deleted my earlier version of Spark and installed Spark 1.6.3 built for Hadoop 2.6.0.
I tried this:
s_df = sc.createDataFrame(pandas_df)
And got this error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-8-4e8b3fc80a02> in <module>()
1 #creating a spark dataframe from the pandas dataframe
----> 2 s_df = sc.createDataFrame(pandas_df)
AttributeError: 'SparkContext' object has no attribute 'createDataFrame'
Does anyone know why? I tried deleting and reinstalling the same 1.6 version but it didn't work for me.
Here are my environment variables that I was messing with to get my pyspark to work properly:
PATH="/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin"
export PATH
# Setting PATH for Python 2.7
# The original version is saved in .bash_profile.pysave
PATH="/Library/Frameworks/Python.framework/Versions/2.7/bin:${PATH}"
export PATH
# added by Anaconda installer
export PATH="/Users/pr/anaconda:$PATH"
# path to JAVA_HOME
export JAVA_HOME=$(/usr/libexec/java_home)
#Spark
export SPARK_HOME="/Users/pr/spark" #version 1.6
export PATH=$PATH:$SPARK_HOME/bin
export PYSPARK_SUBMIT_ARGS="--master local[2]"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
Did I maybe need to install Hadoop separately? I skipped that step because I didn't need it for the code I was running.
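For what it's worth, which Spark installation and master the notebook actually picked up can be checked from inside the session itself; a minimal sketch, assuming sc is the SparkContext the pyspark driver creates:

# Confirm the Spark version and master URL the Jupyter kernel is really using.
print(sc.version)   # expected: 1.6.3
print(sc.master)    # expected: local[2], from PYSPARK_SUBMIT_ARGS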
Answered by
SparkContext doesn't have createDataFrame; SQLContext does:
from pyspark.sql import SQLContext

# Wrap the existing SparkContext in a SQLContext, which provides createDataFrame.
sqlContext = SQLContext(sc)
sqlContext.createDataFrame(pandas_df)
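To confirm the conversion works end to end, the resulting DataFrame can be inspected directly; a short sketch with a throwaway pandas DataFrame (the column names here are made up, not from the question), again assuming sc is the notebook's SparkContext:

import pandas as pd
from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)

# A tiny pandas DataFrame just to exercise the conversion.
pandas_df = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})

# createDataFrame lives on SQLContext (or HiveContext), not on SparkContext.
s_df = sqlContext.createDataFrame(pandas_df)
s_df.printSchema()   # id: long, value: string (schema inferred from the pandas dtypes)
s_df.show()          # prints the rows as a table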