Environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/48260412/

python, python-3.x, apache-spark, pyspark

Asked by Akash Kumar

I installed pyspark recently and it was installed correctly. When I run the following simple program in Python, I get an error.

>>> from pyspark import SparkContext
>>> sc = SparkContext()
>>> data = range(1,1000)
>>> rdd = sc.parallelize(data)
>>> rdd.collect()

While running the last line I get an error whose key line seems to be:

[Stage 0:>                                                          (0 + 0) / 4]18/01/15 14:36:32 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/pyspark/python/lib/pyspark.zip/pyspark/worker.py", line 123, in main
    ("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I have the following variables in .bashrc

export SPARK_HOME=/opt/spark
export PYTHONPATH=$SPARK_HOME/python3

I am using Python 3.

Answered by Alex

You should set the following environment variables in $SPARK_HOME/conf/spark-env.sh:

export PYSPARK_PYTHON=/usr/bin/python
export PYSPARK_DRIVER_PYTHON=/usr/bin/python

If spark-env.sh doesn't exist, you can rename spark-env.sh.template.

Answered by buxizhizhoum

By the way, if you use PyCharm, you can add PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON to the run/debug configuration's environment variables, as shown in the screenshot in the original answer.

Answered by James Chang

Just run the code below at the very beginning of your code. I am using Python 3.7. You might need to run locate python3.7 to get your Python path.

import os
os.environ["PYSPARK_PYTHON"] = "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"
os.environ["PYSPARK_DRIVER_PYTHON"] = "/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7"

Answered by igorkf

I'm using Jupyter Notebook to study PySpark, and that's what worked for me.
Find where python3 is installed by running, in a terminal:

which python3

Here it points to /usr/bin/python3.
Now, at the beginning of the notebook (or .py script), do:

import os

# Set spark environments
os.environ['PYSPARK_PYTHON'] = '/usr/bin/python3'
os.environ['PYSPARK_DRIVER_PYTHON'] = '/usr/bin/python3'

Restart your notebook session and it should work!
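
If you'd rather not hard-code the interpreter path, one portable variant (just a sketch, assuming a local single-machine setup) is to point both variables at the interpreter that is running the notebook itself:

import os
import sys

# sys.executable is the interpreter running this notebook/script, so in a
# local setup the driver and the workers end up on the same Python.
os.environ['PYSPARK_PYTHON'] = sys.executable
os.environ['PYSPARK_DRIVER_PYTHON'] = sys.executable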

Answered by Emanuel Fontelles

Apache-Spark 2.4.3 on Archlinux

I've just installed Apache-Spark-2.3.4 from the Apache-Spark website. I'm using the Archlinux distribution, which is simple and lightweight. I installed it and put the apache-spark directory in /opt/apache-spark/, and now it's time to export our environment variables. Remember, I'm using Archlinux, so keep in mind to use your own $JAVA_HOME, for example.

Importing environment variables

echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre' >> /home/user/.bashrc
echo 'export SPARK_HOME=/opt/apache-spark'  >> /home/user/.bashrc
echo 'export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH'  >> /home/user/.bashrc
echo 'export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH'  >> /home/user/.bashrc
source ../.bashrc 

Testing

emanuel@hinton ~ $ echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk/jre' >> /home/emanuel/.bashrc
emanuel@hinton ~ $ echo 'export SPARK_HOME=/opt/apache-spark'  >> /home/emanuel/.bashrc
emanuel@hinton ~ $ echo 'export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH'  >> /home/emanuel/.bashrc
emanuel@hinton ~ $ echo 'export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.7-src.zip:$PYTHONPATH'  >> /home/emanuel/.bashrc
emanuel@hinton ~ $ source .bashrc 
emanuel@hinton ~ $ python
Python 3.7.3 (default, Jun 24 2019, 04:54:02) 
[GCC 9.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyspark
>>> 

Everything works fine once you have correctly exported the environment variables for SparkContext.

Using Apache-Spark on Archlinux via a Docker image

For my purposes I've created a Docker image with python, jupyter-notebook and apache-spark-2.3.4.

Running the image

docker run -ti -p 8888:8888 emanuelfontelles/spark-jupyter

just go to your browser and type

http://localhost:8888/tree

and you will be prompted with an authentication page; go back to the terminal, copy the token, and voilà, you will have an Archlinux container running an Apache-Spark distribution.

Answered by Ruxi Zhang

I got the same issue, and I set both variables in .bash_profile:

export PYSPARK_PYTHON=/usr/local/bin/python3
export PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3

But my problem was still there.

Then I found out, by typing python --version, that my default Python version was Python 2.7.

So I solved the problem by following this page: How to set Python's default version to 3.x on OS X?
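
Once the default interpreter has been changed, a quick sanity check (just a sketch, assuming a local PySpark install) is to compare the Python version the driver sees with the one a worker reports:

import sys
from pyspark import SparkContext

sc = SparkContext()
driver_version = sys.version_info[:2]
# Each worker reports its own interpreter version; if it still differs
# from the driver, this job fails with the same error as in the question.
worker_version = sc.parallelize([0], 1).map(
    lambda _: __import__("sys").version_info[:2]).collect()[0]
print("driver:", driver_version, "worker:", worker_version)
sc.stop()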

Answered by Eric Cheng

I tried two methods for this question; the method in the picture (a run-configuration screenshot in the original answer) works.

Add environment variables:

PYSPARK_PYTHON=/usr/local/bin/python3.7;PYSPARK_DRIVER_PYTHON=/usr/local/bin/python3.7;PYTHONUNBUFFERED=1