How to specify the version of Python for spark-submit to use?
Original URL: http://stackoverflow.com/questions/29972565/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to specify the version of Python for spark-submit to use?
Asked by A7med
I have two versions of Python. When I launch a Spark application using spark-submit, the application uses the default version of Python. But I want to use the other one. How do I specify the version of Python for spark-submit to use?
Answered by A7med
You can either specify the version of Python by listing the path to your install in a shebang line in your script:
myfile.py:
#!/full/path/to/specific/python2.7
or by calling it on the command line without a shebang line in your script:
/full/path/to/specific/python2.7 myfile.py
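Note that for the shebang approach, the script also needs the execute bit set before it can be run directly:

chmod +x myfile.py
./myfile.py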
However, I'd recommend looking into Python's excellent virtual environments, which allow you to create a separate "environment" for each version of Python. Virtual environments work, more or less, by handling all of the path resolution for you once activated, allowing you to simply type python myfile.py without worrying about conflicting dependencies or knowing the full path to a specific version of Python.
Click here for an excellent guide to getting started with virtual environments, or [here] for the Python 3 official documentation.
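As a minimal sketch of that workflow, assuming the virtualenv tool is installed and reusing the placeholder interpreter path from above:

# create an environment bound to a specific interpreter
virtualenv -p /full/path/to/specific/python2.7 venv
# activate it; "python" now resolves to that interpreter
source venv/bin/activate
python myfile.py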
If you do not have access to the nodes and you're running this using PySpark, you can specify the Python version in your spark-env.sh:
Spark_Install_Dir/conf/spark-env.sh:
PYSPARK_PYTHON=/full/path/to/python_executable/eg/python2.7
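For completeness, a slightly fuller conf/spark-env.sh sketch; the path is a placeholder, and PYSPARK_DRIVER_PYTHON (a standard Spark variable not mentioned in this answer) pins the driver-side interpreter to match the workers:

# Python used by the workers/executors
export PYSPARK_PYTHON=/full/path/to/python_executable/eg/python2.7
# Python used by the driver; keeping both identical avoids version-mismatch errors
export PYSPARK_DRIVER_PYTHON=/full/path/to/python_executable/eg/python2.7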
Answered by Benjamin Rowell
You can set the PYSPARK_PYTHON variable in conf/spark-env.sh (in Spark's installation directory) to the absolute path of the desired Python executable.
The Spark distribution contains spark-env.sh.template (spark-env.cmd.template on Windows) by default. It must be renamed to spark-env.sh (spark-env.cmd) first.
For example, if the Python executable is installed under /opt/anaconda3/bin/python3:
PYSPARK_PYTHON='/opt/anaconda3/bin/python3'
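To confirm which interpreter the driver actually picks up, a quick sketch (check.py is a hypothetical file name):

cat > check.py <<'EOF'
import sys
print(sys.executable)  # should print /opt/anaconda3/bin/python3
EOF
spark-submit check.py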
Check out the configuration documentation for more information.
Answered by Bruno Faria
In my environment I simply used:
export PYSPARK_PYTHON=python2.7
It worked for me.
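Spelled out as a full sketch (myfile.py is a placeholder name, and the export only affects the current shell session):

export PYSPARK_PYTHON=python2.7
spark-submit myfile.py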
Answered by Gemini Keith
If you want to specify the PYSPARK_MAJOR_PYTHON_VERSION option on the spark-submit command line, you should check this:
http://spark.apache.org/docs/latest/running-on-kubernetes.html
You can search for spark.kubernetes.pyspark.pythonVersion on that page, and you'll find the following content:
spark.kubernetes.pyspark.pythonVersion (default: "2"): This sets the major Python version of the docker image used to run the driver and executor containers. Can either be 2 or 3.
Now, your command should look like:
spark-submit --conf spark.kubernetes.pyspark.pythonVersion=3 ...
It should work.
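For context, a fuller Kubernetes submission sketch; the master URL and container image are placeholders you would replace with your own:

spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<port> \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=<your-pyspark-image> \
  --conf spark.kubernetes.pyspark.pythonVersion=3 \
  myfile.py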