No module name pyspark error

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/34302314/
Asked by BetterEveryDay
This is the exact code from a tutorial I'm following. My classmate didn't get this error with the same code:
ImportError Traceback (most recent call last)
<ipython-input-1-c6e1bed850ab> in <module>()
----> 1 from pyspark import SparkContext
2 sc = SparkContext('local', 'Exam_3')
3
4 from pyspark.sql import SQLContext
5 sqlContext = SQLContext(sc)
ImportError: No module named pyspark
This is the code:
from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
data = sc.textFile("exam3")
parsedData = data.map(lambda line: [float(x) for x in line.split(',')])
retail = sqlContext.createDataFrame(parsedData,
    ['category_name', 'product_id', 'product_name', 'product_price'])
retail.registerTempTable("exam3")
print parsedData.take(3)
Answered by Nathaniel Ford
You don't have pyspark installed in a place available to the python installation you're using. To confirm this, on your command-line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'
If you see the No module named 'pyspark' ImportError, you need to install that library. Quit the REPL and type:
pip install pyspark
Then re-enter the REPL to confirm it works:
$ python
Python 3.5.0 (default, Dec 3 2015, 09:58:14)
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>
As a note, it is critical that your virtual environment is activated. When in the directory of your virtual environment:
$ source bin/activate
These instructions are for a Unix-based machine and will vary for Windows.
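If the import still fails after pip install, the package may have been installed for a different interpreter than the one your REPL is running. A minimal sketch to check, using only the standard library:

import sys
# Which interpreter is actually running? It should live inside your virtualenv.
print(sys.executable)
# The directories searched for modules; your virtualenv's site-packages
# directory should appear here when the environment is activated.
print(sys.path)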
Answered by DavidWayne
You can use findspark to make Spark accessible at run time. Typically findspark will find the directory where you have installed Spark, but if it is installed in a non-standard location, you can point it to the correct directory. Once you have installed findspark, if Spark is installed at /path/to/spark_home, just put
import findspark
findspark.init('/path/to/spark_home')
at the very top of your script/notebook and you should now be able to access the pyspark module.
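Putting the pieces together, a minimal sketch of a full script using this approach (/path/to/spark_home is a placeholder for your actual Spark directory):

import findspark
# Point findspark at the Spark installation before importing pyspark.
findspark.init('/path/to/spark_home')

from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')  # the import now resolves
print(sc.version)
sc.stop()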
Answered by Harvey
Just use:
import findspark
findspark.init()
import pyspark # only run after findspark.init()
If you don't have the findspark module, install it with:
python -m pip install findspark
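When called with no argument, findspark.init() looks for Spark in common locations and honors the SPARK_HOME environment variable. A sketch of setting that variable from Python before initializing (/opt/spark is a hypothetical path, not from the original answer):

import os
# Hypothetical location; replace with your actual Spark installation.
os.environ.setdefault('SPARK_HOME', '/opt/spark')

import findspark
findspark.init()  # picks up the SPARK_HOME set above

import pyspark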
Answered by kepy97
Here is the latest solution that worked for me, for Mac users only. I installed pyspark through pip install pyspark, but it didn't work when I executed pyspark in the terminal or even ran import pyspark in Python, even though I checked that pyspark was already installed on my laptop.
In the end, I found the solution: you just need to add a few environment variables to your bash profile file.
Follow these steps:
1) Type the following in a terminal window to go to your home folder.
cd ~
2) Then run the following to create a .bash_profile. (You may skip this if it already exists.)
touch .bash_profile
3) Open the file for editing: open -e .bash_profile
Then add the following variables.
# Pick the newest Spark version installed under Homebrew's Cellar
export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
# Make the bundled pyspark and py4j importable from any Python session
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH
Note: you need to change the py4j-x.x.x-src.zip version number in the last line to match the file actually present under $SPARK_HOME/python/lib.
4) Once all these variables are assigned, save and close .bash_profile. Then type the following command to reload the file.
. .bash_profile
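To confirm the change took effect, one quick sanity check (a sketch; open a new terminal first so the profile is re-read, then start python):

import pyspark
# If the PYTHONPATH entries above are correct, the import succeeds and
# this prints the Spark version provided by the Homebrew installation.
print(pyspark.__version__)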