Apache Spark: How to use pyspark with Python 3
Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/30279783/
Asked by tchakravarty
I built Spark 1.4 from the GH development master, and the build went through fine. But when I run bin/pyspark
I get the Python 2.7.9 version. How can I change this?
Answered by rfkortekaas
Have a look at the bin/pyspark file. The shebang line probably points to the 'env' binary, which searches the PATH for the first compatible executable.
You can change python to python3 in the shebang line, replace the env lookup with a hardcoded path to the python3 binary, or execute the script directly with python3 so the shebang line is ignored.
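A sketch of the three options (illustrative only; the exact contents of bin/pyspark vary between Spark versions, and this follows the answer's assumption that the script carries a Python shebang):

# Option 1: switch the env lookup to python3
#!/usr/bin/env python3

# Option 2: hardcode the interpreter path (assuming it lives at /usr/bin/python3)
#!/usr/bin/python3

# Option 3: invoke the interpreter explicitly, so the shebang is ignored
python3 ./bin/pyspark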
Answered by Piotr Migdal
PYSPARK_PYTHON=python3
./bin/pyspark
If you want to run it in an IPython Notebook, write:
PYSPARK_PYTHON=python3
PYSPARK_DRIVER_PYTHON=ipython
PYSPARK_DRIVER_PYTHON_OPTS="notebook"
./bin/pyspark
If python3 is not accessible, you need to pass the path to it instead.
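For example (the path below is illustrative; use whatever which python3 prints on your machine):

PYSPARK_PYTHON=/usr/local/bin/python3 ./bin/pyspark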
Bear in mind that the current documentation (as of 1.4.1) has outdated instructions. Fortunately, it has been patched.
Answered by Rtik88
Just set the environment variable:
export PYSPARK_PYTHON=python3
In case you want this to be a permanent change, add this line to the pyspark script.
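A minimal sketch of what that looks like inside bin/pyspark (the placement is illustrative; it just has to run before the interpreter is resolved):

# near the top of bin/pyspark
export PYSPARK_PYTHON=python3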
Answered by yangh
1. Edit your profile: vim ~/.profile
2. Add this line to the file: export PYSPARK_PYTHON=python3
3. Execute the command: source ~/.profile
4. Run: ./bin/pyspark
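Steps 2 and 3 can also be done in one go from the shell, assuming a POSIX-compatible shell, if you prefer not to open an editor:

echo 'export PYSPARK_PYTHON=python3' >> ~/.profile
source ~/.profile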
Answered by oya163
For Jupyter Notebook, edit the spark-env.sh file from the command line, as shown below:
$ vi $SPARK_HOME/conf/spark-env.sh
Go to the bottom of the file and paste in these lines:
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
Then simply run the following command to start pyspark in a notebook:
$ pyspark
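This assumes $SPARK_HOME/bin is on your PATH; if it is not, invoke the script with its full path instead:

$ $SPARK_HOME/bin/pyspark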