How to import pyspark in anaconda

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must distribute it under the same license and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/33814005/

Tags: python, apache-spark, anaconda, pyspark

Asked by farhawa

I am trying to import and use pyspark with anaconda.

After installing Spark and setting the $SPARK_HOME variable, I tried:

$ pip install pyspark

This won't work (of course) because I discovered that I need to tell Python to look for pyspark under $SPARK_HOME/python/. The problem is that to do that, I need to set $PYTHONPATH, but Anaconda doesn't use that environment variable.

I tried to copy the content of $SPARK_HOME/python/ to ANACONDA_HOME/lib/python2.7/site-packages/, but it didn't work.

Is there any way to use pyspark in Anaconda?

Answered by zero323

You can simply set the PYSPARK_DRIVER_PYTHON and PYSPARK_PYTHON environment variables to use either the root Anaconda Python or a specific Anaconda environment. For example:

export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

or

export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/envs/foo/bin/ipython 
export PYSPARK_PYTHON=$ANACONDA_ROOT/envs/foo/bin/python 

When you use $SPARK_HOME/bin/pyspark / $SPARK_HOME/bin/spark-submit, it will choose the correct environment. Just remember that PySpark has to use the same Python version on all machines.

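A quick way to check that last requirement holds (a minimal sketch, assuming pyspark is already importable; the master and app name here are arbitrary): compare the driver's Python version with the one the executors actually run.

import sys
from pyspark import SparkContext

sc = SparkContext(master="local[2]", appName="version-check")
driver = tuple(sys.version_info)[:3]
# Run a one-element job so an executor reports its own interpreter version
executor = sc.parallelize([0], 1).map(
    lambda _: tuple(__import__("sys").version_info)[:3]
).first()
print("driver:", driver, "executor:", executor)  # these should match
sc.stop()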

On a side note, using PYTHONPATH should work just fine, even if it is not recommended.

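If you would rather not export PYTHONPATH at all, the same effect can be achieved at runtime from inside Python. A minimal sketch, assuming $SPARK_HOME is set (this is essentially what the findspark package, shown in a later answer, automates):

import glob
import os
import sys

spark_home = os.environ["SPARK_HOME"]
# Make Spark's Python bindings importable
sys.path.insert(0, os.path.join(spark_home, "python"))
# pyspark also needs Spark's bundled py4j; the zip name varies by Spark version
sys.path.insert(0, glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip"))[0])

import pyspark  # now resolvable without touching PYTHONPATH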

Answered by PC3SQ

I don't believe that you need to (or even can) install pyspark as a module. Instead, I extended the $PYTHONPATH in my ~/.bash_profile as follows:

export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/build:$PYTHONPATH

After that, I was able to import pyspark as ps. Hope that works for you too.

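If you want to confirm the import works end to end after reloading the shell (source ~/.bash_profile), a throwaway local job is enough (a sketch; note that on newer Spark builds you may also need the py4j zip from $SPARK_HOME/python/lib on $PYTHONPATH):

import pyspark as ps

# Spin up a local context and run a trivial job
sc = ps.SparkContext(master="local[1]", appName="smoke-test")
print(sc.parallelize(range(10)).sum())  # expect 45
sc.stop()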

Answered by Tom Whittaker

Here is the complete set of environment variables I had to put in my .bashrc to get this to work in both scripts and the notebook:

# Use Anaconda's Python (and IPython as the interactive driver) for PySpark
export ANACONDA_ROOT=~/anaconda2
export PYSPARK_DRIVER_PYTHON=$ANACONDA_ROOT/bin/ipython
export PYSPARK_PYTHON=$ANACONDA_ROOT/bin/python

# Where Spark is installed
export SPARK_HOME=/opt/spark-2.1.0-bin-hadoop2.7
export PYLIB=/opt/spark-2.1.0-bin-hadoop2.7/python/lib

# Put Spark's Python bindings and the bundled py4j on the module search path
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.4-src.zip:$PYTHONPATH
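
One caveat: the py4j-0.10.4 zip above is specific to Spark 2.1.0. A quick sketch to find the right file name for whatever Spark version you have installed (assuming SPARK_HOME is set):

import glob
import os

# List the py4j zip(s) bundled with this Spark install
print(glob.glob(os.path.join(os.environ["SPARK_HOME"], "python", "lib", "py4j-*-src.zip")))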

Answered by Tshilidzi Mudau

Perhaps this can help someone. According to the Anaconda documentation, you install findspark as follows:

conda install -c conda-forge findspark 

It was only after installing it as shown above that I was able to import findspark. No export statements required.

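For completeness, findspark is used like this in a script or notebook: call init() before importing pyspark (if SPARK_HOME is not set, you can pass the Spark installation path to init() explicitly):

import findspark

findspark.init()  # locates Spark via SPARK_HOME and patches sys.path

import pyspark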

Answered by mewa6

This may have only become possible recently, but I used the following and it worked perfectly. After this, I am able to 'import pyspark as ps' and use it with no problems.

conda install -c conda-forge pyspark

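A quick sanity check after the conda install (a minimal sketch; the conda-forge pyspark package ships the Spark jars with it, so plain local mode needs no separate Spark download):

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[1]").appName("conda-check").getOrCreate()
print(spark.version)
spark.stop()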

Answered by Jerrold110

Hey there, you could try running these lines in the Anaconda PowerShell instead. Straight from https://anaconda.org/conda-forge/findspark

To install this package with conda, run one of the following:
conda install -c conda-forge findspark
conda install -c conda-forge/label/gcc7 findspark
conda install -c conda-forge/label/cf201901 findspark