Pyspark: Exception: Java gateway process exited before sending the driver its port number

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/31841509/


Pyspark: Exception: Java gateway process exited before sending the driver its port number

Tags: java, python, macos, apache-spark, pyspark

Asked by mt88

I'm trying to run pyspark on my MacBook Air. When I try to start it up I get the error:

Exception: Java gateway process exited before sending the driver its port number

when sc = SparkContext() is called at startup. I have tried running the following commands:

./bin/pyspark
./bin/spark-shell
export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

to no avail. I have also looked here:

Spark + Python - Java gateway process exited before sending the driver its port number?

but the question has never been answered. Please help! Thanks.
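For reference, a minimal sketch of the startup that raises this exception (assuming pyspark is importable from the local Spark installation; the environment variable and master are the ones tried above):

import os
from pyspark import SparkContext

# The submit args tried above; "pyspark-shell" must be the last token.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master local[2] pyspark-shell"

# This is the call that raises "Java gateway process exited before
# sending the driver its port number" when the JVM cannot be launched.
sc = SparkContext()
print(sc.version)
sc.stop()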

Answered by quax

Had the same issue with my IPython notebook (IPython 3.2.1) on Linux (Ubuntu).

What was missing in my case was setting the master URL in the $PYSPARK_SUBMIT_ARGS environment variable, like this (assuming you use bash):

export PYSPARK_SUBMIT_ARGS="--master spark://<host>:<port>"

e.g.

export PYSPARK_SUBMIT_ARGS="--master spark://192.168.2.40:7077"

You can put this into your .bashrc file. You get the correct URL from the Spark master's log (the location of this log is reported when you start the master with sbin/start-master.sh).
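If you launch pyspark from a notebook rather than from bash, the same setting can be applied from Python before the context is created. A sketch, with the master URL as a placeholder to be replaced by the one reported in your master's log (later answers note that pyspark-shell must also be appended):

import os
from pyspark import SparkContext

# Placeholder master URL -- replace with the spark://<host>:<port> from
# your Spark master's log; "pyspark-shell" must stay at the end.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--master spark://192.168.2.40:7077 pyspark-shell"

sc = SparkContext()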

Answered by Ida

I got the same "Java gateway process exited ... port number" exception even though I set PYSPARK_SUBMIT_ARGS properly. I'm running Spark 1.6 and trying to get pyspark to work with IPython4/Jupyter (OS: Ubuntu as a VM guest).

While I got this exception, I noticed an hs_err_*.log was generated, and it started with:

There is insufficient memory for the Java Runtime Environment to continue. Native memory allocation (malloc) failed to allocate 715849728 bytes for committing reserved memory.

So I increased the memory allocated to my Ubuntu guest via the VirtualBox settings and restarted it. Then the Java gateway exception went away and everything worked out fine.
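Besides giving the VM more RAM, the driver's JVM heap can also be capped so that its allocation fits into the guest. A sketch; the 512m value is an arbitrary example, not a recommendation:

import os
from pyspark import SparkContext

# Limit the driver JVM heap so the native allocation reported in
# hs_err_*.log (about 700 MB here) does not exceed the guest's free memory.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--driver-memory 512m pyspark-shell"

sc = SparkContext()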

Answered by Anup Ash

This should help you.

One solution is adding pyspark-shell to the shell environment variable PYSPARK_SUBMIT_ARGS:

export PYSPARK_SUBMIT_ARGS="--master local[2] pyspark-shell"

There is a change in python/pyspark/java_gateway.py which requires PYSPARK_SUBMIT_ARGS to include pyspark-shell if a PYSPARK_SUBMIT_ARGS variable is set by the user.
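A small sketch of the same idea from Python, appending pyspark-shell only when a PYSPARK_SUBMIT_ARGS value is already set but is missing it:

import os

args = os.environ.get("PYSPARK_SUBMIT_ARGS", "")
# If the variable is set by the user, java_gateway.py expects it to
# end with "pyspark-shell"; append it when it is missing.
if args and not args.rstrip().endswith("pyspark-shell"):
    os.environ["PYSPARK_SUBMIT_ARGS"] = args.rstrip() + " pyspark-shell"

from pyspark import SparkContext
sc = SparkContext()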

Answered by Pim Schaaf

I got the same "Exception: Java gateway process exited before sending the driver its port number" in the Cloudera VM when trying to start IPython with CSV support, due to a syntax error:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10.1.4.0

will throw the error, while:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages com.databricks:spark-csv_2.10:1.4.0

will not.

The difference is the last colon in the last (working) example, separating the Scala version number from the package version number.
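The --packages argument uses standard Maven coordinates, groupId:artifactId:version. A sketch assembling the same working coordinate from Python (values copied from the answer above):

import os
from pyspark import SparkContext

group_id = "com.databricks"
artifact_id = "spark-csv_2.10"   # _2.10 is the Scala version suffix of the artifact
version = "1.4.0"                # the package version, separated by a colon

os.environ["PYSPARK_SUBMIT_ARGS"] = (
    "--packages {}:{}:{} pyspark-shell".format(group_id, artifact_id, version)
)

sc = SparkContext()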

Answered by Josh Terrell

I got this error because I was running low on disk space.

Answered by Old Panda

One possible reason is that JAVA_HOME is not set because Java is not installed.

I encountered the same issue. It says:

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 51.0
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:643)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:296)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
    at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:406)
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/opt/spark/python/pyspark/conf.py", line 104, in __init__
    SparkContext._ensure_initialized()
  File "/opt/spark/python/pyspark/context.py", line 243, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway()
  File "/opt/spark/python/pyspark/java_gateway.py", line 94, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number

at sc = pyspark.SparkConf(). I solved it by running

sudo update-alternatives --config java 

which is from https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-get-on-ubuntu-16-04
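A quick way to check which Java pyspark will pick up before creating the context; a sketch assuming the java binary is on the PATH (JAVA_HOME, if set, takes precedence):

import os
import subprocess

java_home = os.environ.get("JAVA_HOME")
java_bin = os.path.join(java_home, "bin", "java") if java_home else "java"

# "Unsupported major.minor version 51.0" means the Spark launcher class
# needs at least Java 7; print the version the gateway would be started with
# ("java -version" writes to stderr).
out = subprocess.run([java_bin, "-version"],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
print(out.stderr.decode() or out.stdout.decode())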

Answered by Pankaj Kumar

In my case, this error came from a script that had been running fine before, so I figured out that it might be due to my Java update. I had been using Java 1.8, but I accidentally updated to Java 1.9. When I switched back to Java 1.8 the error disappeared and everything ran fine. For those who get this error for the same reason but do not know how to switch back to an older Java version on Ubuntu, run

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

and make the selection for the Java version.

Answered by Coral

Had the same issue; installing Java using the lines below solved the issue!

sudo rm -rf jdk-10.jdk/

Answered by aghd

Worked hours on this. My problem was with the Java 10 installation. I uninstalled it and installed Java 8, and now PySpark works.

Answered by Kiem Nguyen

After spending hours and hours trying many different solutions, I can confirm that the Java 10 SDK causes this error. On Mac, navigate to /Library/Java/JavaVirtualMachines, then run this command to uninstall Java JDK 10 completely:

sudo rm -rf jdk-10.jdk/

After that, download JDK 8 and the problem will be solved.
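On macOS you can also point PySpark at the JDK 8 installation without removing the others. A sketch using the system java_home helper (assumes a 1.8 JDK is installed):

import os
import subprocess

# Ask macOS for the home directory of an installed 1.8 JDK and export it,
# so the gateway is launched with Java 8 instead of Java 10.
java8_home = subprocess.check_output(
    ["/usr/libexec/java_home", "-v", "1.8"]).decode().strip()
os.environ["JAVA_HOME"] = java8_home

from pyspark import SparkContext
sc = SparkContext()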