Importing pyspark in a Python shell

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/23256536/
Asked by Glenn Strycker
This is a copy of someone else's question on another forum that was never answered, so I thought I'd re-ask it here, as I have the same issue. (See http://geekple.com/blogs/feeds/Xgzu7/posts/351703064084736)
I have Spark installed properly on my machine and am able to run python programs with the pyspark modules without error when using ./bin/pyspark as my python interpreter.
However, when I run the regular Python shell and try to import pyspark modules, I get this error:
from pyspark import SparkContext
and it says
"No module named pyspark".
How can I fix this? Is there an environment variable I need to set to point Python to the pyspark headers/libraries/etc.? If my spark installation is /spark/, which pyspark paths do I need to include? Or can pyspark programs only be run from the pyspark interpreter?
Accepted answer by Glenn Strycker
Turns out that the pyspark bin is LOADING python and automatically loading the correct library paths. Check out $SPARK_HOME/bin/pyspark :
# Add the PySpark classes to the Python path:
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
I added this line to my .bashrc file and the modules are now correctly found!
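A quick way to verify the setup, assuming SPARK_HOME is also exported (for example to /spark/ as in the question), is to open a fresh python shell and check where the module resolves from:

import pyspark
print(pyspark.__file__)   # should point somewhere under $SPARK_HOME/python/pyspark/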
Answer by Peng Zhang 1516540
If it prints an error like this:
ImportError: No module named py4j.java_gateway
Please add $SPARK_HOME/python/build to PYTHONPATH:
export SPARK_HOME=/Users/pzhang/apps/spark-1.1.0-bin-hadoop2.4
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
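A minimal check in a fresh python shell, assuming the two exports above are in place:

from py4j.java_gateway import JavaGateway   # raises the same ImportError if the build path is still missing
from pyspark import SparkContext
print("py4j and pyspark are on the path")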
Answer by jyu
On Mac, I use Homebrew to install Spark (formula "apache-spark"). Then, I set the PYTHONPATH this way so the Python import works:
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.2.0
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH
Replace the "1.2.0" with the actual apache-spark version installed on your Mac.
Answer by dodo
Don't run your .py file as: python filename.py
Instead, use: spark-submit filename.py
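For reference, here is a minimal sketch of such a script (hypothetically saved as filename.py, to match the command above); it runs fine under spark-submit because spark-submit sets up the pyspark paths itself:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("example")
sc = SparkContext(conf=conf)

# a trivial job: count the numbers 0..99
print(sc.parallelize(range(100)).count())

sc.stop()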
Answer by Dawny33
By exporting the SPARK path and the Py4j path, it started to work:
export SPARK_HOME=/usr/local/Cellar/apache-spark/1.5.1
export PYTHONPATH=$SPARK_HOME/libexec/python:$SPARK_HOME/libexec/python/build:$PYTHONPATH
PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
So, if you don't want to type these every time you want to fire up the Python shell, you might want to add them to your .bashrc file.
Answer by Suresh2692
Here is a simple method (if you don't care about how it works!!!)
Use findspark

pip install findspark

Go to your python shell:

import findspark
findspark.init()

Import the necessary modules:

from pyspark import SparkContext
from pyspark import SparkConf

Done!!!
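Putting those steps together, a minimal sketch of a session (assuming SPARK_HOME is set; otherwise pass the installation path to findspark.init() explicitly):

import findspark
findspark.init()   # e.g. findspark.init("/spark") if SPARK_HOME is not set

from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("findspark-example")
sc = SparkContext(conf=conf)
print(sc.parallelize([1, 2, 3]).sum())   # prints 6
sc.stop()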
Answer by Patrick
I got this error because the python script I was trying to submit was called pyspark.py (facepalm). The fix was to set my PYTHONPATH as recommended above, rename the script to pyspark_test.py, and delete the pyspark.pyc that had been created from the script's original name; that cleared the error up.
Answer by Sreesankar
In the case of DSE (DataStax Cassandra & Spark), the following location needs to be added to PYTHONPATH:
export PYTHONPATH=/usr/share/dse/resources/spark/python:$PYTHONPATH
Then use dse pyspark to get the modules on the path.
dse pyspark
Answer by tjb305
I had this same problem and would add one thing to the proposed solutions above. When using Homebrew on Mac OS X to install Spark, you will need to correct the py4j path address to include libexec in the path (remembering to change the py4j version to the one you have):
PYTHONPATH=$SPARK_HOME/libexec/python/lib/py4j-0.9-src.zip:$PYTHONPATH
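If you would rather not hardcode the py4j version, a small sketch along the same lines (assuming SPARK_HOME points at the libexec directory of the Homebrew install) locates the zip at runtime and appends it to sys.path instead:

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]
# pick up whatever py4j-*-src.zip ships with this Spark installation
py4j_zip = glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0]
sys.path.append(os.path.join(spark_home, "python"))
sys.path.append(py4j_zip)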
Answer by Karang
To get rid of ImportError: No module named py4j.java_gateway, you need to add the following lines:
import os
import sys

# point these at your local Spark installation (adjust the drive, path, and version as needed)
os.environ['SPARK_HOME'] = "D:\\python\\spark-1.4.1-bin-hadoop2.4"
sys.path.append("D:\\python\\spark-1.4.1-bin-hadoop2.4\\python")
sys.path.append("D:\\python\\spark-1.4.1-bin-hadoop2.4\\python\\lib\\py4j-0.8.2.1-src.zip")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("success")
except ImportError as e:
    print("error importing spark modules", e)
    sys.exit(1)