Python: No module named pyspark error

Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/34302314/

python, pyspark

Asked by BetterEveryDay

This is the exact code from a tutorial I'm following. My classmate didn't get this error with the same code:

ImportError                                Traceback (most recent call last)

<ipython-input-1-c6e1bed850ab> in <module>()
----> 1 from pyspark import SparkContext
      2 sc = SparkContext('local', 'Exam_3')
      3 
      4 from pyspark.sql import SQLContext
      5 sqlContext = SQLContext(sc)

ImportError: No module named pyspark

This is the code:

from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')
from pyspark.sql import SQLContext    
sqlContext = SQLContext(sc)
data = sc.textFile("exam3")
parsedData = data.map(lambda line: [float(x) for x in line.split(',')])
retail = sqlContext.createDataFrame(parsedData, 
     ['category_name','product_id', 'product_name', 'product_price'])
retail.registerTempTable("exam3")
print(parsedData.take(3))

Answered by Nathaniel Ford

You don't have pyspark installed in a place available to the Python installation you're using. To confirm this, on your command line terminal, with your virtualenv activated, enter your REPL (python) and type import pyspark:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'pyspark'

If you see the No module named 'pyspark' ImportError, you need to install that library. Quit the REPL and type:

pip install pyspark
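
If you have multiple Python installations, it can also help to invoke pip through the exact interpreter you plan to use, so the package lands in the right environment (a general precaution, not part of the original answer):

python -m pip install pyspark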

Then re-enter the REPL to confirm it works:

$ python
Python 3.5.0 (default, Dec  3 2015, 09:58:14) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.1.76)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>>

As a note, it is critical your virtual environment is activated. When in the directory of your virtual environment:

$ source bin/activate

These instructions are for a unix-based machine, and will vary for Windows.

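For reference (not part of the original answer), activating a virtual environment on Windows typically looks like this instead, run from the environment's directory:

Scripts\activate.bat     (cmd.exe)
.\Scripts\Activate.ps1   (PowerShell)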

Answered by DavidWayne

You can use findspark to make Spark accessible at run time. Typically findspark will find the directory where you have installed Spark, but if it is installed in a non-standard location, you can point it to the correct directory. Once you have installed findspark, if Spark is installed at /path/to/spark_home, just put

import findspark
findspark.init('/path/to/spark_home')

at the very top of your script/notebook and you should now be able to access the pyspark module.

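Putting it together, the top of the script/notebook from the question might look like this (a sketch; /path/to/spark_home is a placeholder for your actual Spark directory):

import findspark
findspark.init('/path/to/spark_home')  # must run before importing pyspark

from pyspark import SparkContext
sc = SparkContext('local', 'Exam_3')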

Answered by Harvey

Just use:

import findspark
findspark.init()

import pyspark # only run after findspark.init()

If you don't have the findspark module, install it with:

python -m pip install findspark
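
As a quick sanity check (an optional step, not from the original answer), you can confirm in one command that findspark locates Spark and that pyspark then imports, assuming Spark is installed somewhere findspark can discover:

python -c "import findspark; findspark.init(); import pyspark; print(pyspark.__version__)"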

Answered by kepy97

Here is the latest solution that worked for me, for MAC users only. I installed pyspark through pip install pyspark, but it didn't work when I executed pyspark in the terminal or even ran import pyspark in Python. I checked that pyspark was already installed on my laptop.

In the end, I found the solution: you just need to add a few environment variables to your bash profile file.

Follow these steps:

1) Type the following in a terminal window to go to your home folder.

cd ~

2) Then run the following to create a .bash_profile. (You may skip this if it already exists.)

touch .bash_profile

3) Open the file in a text editor:

open -e .bash_profile

Then add the following variables.

# pick the newest Spark version installed by Homebrew
export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
# point SPARK_HOME at that installation
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
# put Spark's Python bindings and its bundled py4j on the Python path
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH

Note: you need to change the py4j-x.x.x-src.zip version number in the last line to match the file bundled with your Spark installation.
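
To find the exact file name on your machine (an optional check, not part of the original answer), list Spark's bundled py4j archive once SPARK_HOME is set, or substitute the path directly:

ls $SPARK_HOME/python/lib/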

4) Once all these variables are assigned, save and close .bash_profile. Then type the following command to reload the file.

. .bash_profile

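
After reloading, open a new Python session to confirm the import now works (assuming the steps above completed without errors):

$ python
>>> import pyspark
>>>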