Python: How to import pyspark in Anaconda
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must likewise follow the CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/33814005/
How to import pyspark in anaconda
Asked by farhawa
I am trying to import and use pyspark with Anaconda.
After installing Spark and setting the $SPARK_HOME variable, I tried:
$ pip install pyspark
This won't work (of course), because I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that to do that I need to set $PYTHONPATH, while Anaconda doesn't use that environment variable.
I tried copying the contents of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but that didn't work.
Is there any solution to use pyspark with Anaconda?
Answered by zero323
You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
or
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python
When you use $SPARK_HOME/bin/pyspark or $SPARK_HOME/bin/spark-submit, it will choose the correct environment. Just remember that PySpark has to use the same Python version on all machines.
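If you are unsure which interpreters a job would pick up, a quick stdlib-only check is possible. This is just a sketch: it inspects the environment the same way the launcher scripts do, but does not start Spark itself.

```python
import os
import sys

# bin/pyspark honors PYSPARK_DRIVER_PYTHON for the driver and
# PYSPARK_PYTHON for the workers; when a variable is unset, fall
# back to the current interpreter, which is what Spark defaults to.
driver = os.environ.get("PYSPARK_DRIVER_PYTHON", sys.executable)
worker = os.environ.get("PYSPARK_PYTHON", sys.executable)
print("driver:", driver)
print("worker:", worker)
```

Running this inside an activated conda environment confirms which binaries will actually be used before you submit a job.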
On a side note, using PYTHONPATH should work just fine, even if it is not recommended.
Answered by PC3SQ
I don't believe you need to, or even can, install pyspark as a module. Instead, I extended $PYTHONPATH in my ~/.bash_profile as follows:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH
After that, I was able to import pyspark as ps. Hope that works for you too.
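The same effect can also be achieved at runtime, without touching ~/.bash_profile, by extending sys.path before the import. This is a minimal sketch assuming $SPARK_HOME points at an unpacked Spark distribution; the /opt/spark fallback is only an example path.

```python
import glob
import os
import sys

# Prepend Spark's bundled Python sources to the import path,
# mirroring what the PYTHONPATH export above does.
spark_home = os.environ.get("SPARK_HOME", "/opt/spark")
sys.path.insert(0, os.path.join(spark_home, "python"))

# Py4J ships inside Spark as a versioned source zip (e.g.
# py4j-0.10.4-src.zip); glob for it instead of hard-coding a version.
for py4j_zip in glob.glob(
        os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")):
    sys.path.insert(0, py4j_zip)
```

This is essentially what the findspark package automates.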
Answered by Tom Whittaker
Here is the complete set of environment variables I had to put in my .bashrc to get this to work in both scripts and notebooks:
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PYLIB=/opt/spark-2.1.0-bin-hadoop2.7/python/lib
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
Answered by Tshilidzi Mudau
Perhaps this can help someone. According to the Anaconda documentation, you install findspark as follows:
conda install -c conda-forge findspark
Only after installing it as shown above was I able to import findspark. No export statements required.
Answered by mewa6
This may have only become possible recently, but I used the following and it worked perfectly. After this, I am able to 'import pyspark as ps' and use it with no problems.
conda install -c conda-forge pyspark
Answered by Jerrold110
Hey there, you could try running these lines in the Anaconda PowerShell instead. Straight from https://anaconda.org/conda-forge/findspark:
To install this package with conda, run one of the following:
conda install -c conda-forge findspark
conda install -c conda-forge/label/gcc7 findspark
conda install -c conda-forge/label/cf201901 findspark