Java PySpark error - Unsupported class file major version

Warning: this content is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/53583199/


Pyspark error - Unsupported class file major version

java, python, macos, apache-spark, pyspark

Asked by James

I'm trying to install Spark on my Mac. I've used Homebrew to install Spark 2.4.0 and Scala. I've installed PySpark in my anaconda environment and am using PyCharm for development. I've exported to my bash profile:


export SPARK_VERSION=`ls /usr/local/Cellar/apache-spark/ | sort | tail -1`
export SPARK_HOME="/usr/local/Cellar/apache-spark/$SPARK_VERSION/libexec"
export PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH
export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.9-src.zip:$PYTHONPATH

However I'm unable to get it to work.


I suspect, from reading the traceback, that this is due to the Java version. I would really appreciate some help fixing the issue. Please comment if there is any information I could provide that would be helpful beyond the traceback.


I am getting the following error:


Traceback (most recent call last):
  File "<input>", line 4, in <module>
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/pyspark/rdd.py", line 816, in collect
    sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/anaconda3/envs/coda/lib/python3.6/site-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: java.lang.IllegalArgumentException: Unsupported class file major version 55

Accepted answer by OneCricketeer

Until Spark supports Java 11 or higher (which will hopefully be mentioned in the latest documentation when it does), you have to add a flag to set your Java version to Java 8.


As of Spark 2.4.x


Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.4 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x)


On Mac/Unix, see asdf-java for installing different Javas.


On a Mac, I am able to do this in my .bashrc:


export JAVA_HOME=$(/usr/libexec/java_home -v 1.8)

On Windows, check out Chocolatey, but seriously, just use WSL2 or Docker to run Spark.




You can also set this in spark-env.sh rather than setting the variable for your whole profile.


And, of course, this all means you'll need to install Java 8 in addition to your existing Java 11.

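If you prefer to pin the Java version for a single script rather than in your shell profile, a minimal sketch (macOS only; it assumes a Java 8 JDK is installed and reuses the same /usr/libexec/java_home lookup as the export above) is to set JAVA_HOME from Python before any SparkContext is created:

import os
import subprocess

# Resolve the Java 8 home the same way the export above does, and hand it to
# this process before a SparkContext is created, so the gateway JVM uses Java 8.
java8_home = subprocess.check_output(
    ["/usr/libexec/java_home", "-v", "1.8"]
).decode().strip()
os.environ["JAVA_HOME"] = java8_home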

Answer by Chaymae Ahmed

I had the same issue on Windows, and I had added JAVA_HOME to the environment variable path:


JAVA_HOME: C:\Program Files\Java\jdk-11.0.1


Answer by tomasvanoyen

On Windows (Windows 10) you can solve the issue by installing jdk-8u201-windows-x64.exe and resetting the system environment variable to the correct version of the Java JDK:


JAVA_HOME -> C:\Program Files\Java\jdk1.8.0_201.


Don't forget to restart the terminal, otherwise the reset of the environment variable does not kick in.


Answer by Andre Oporto

I ran into this issue when running Jupyter Notebook and Spark using Java 11. I installed and configured for Java 8 using the following steps.


Install Java 8:


$ sudo apt install openjdk-8-jdk

Since I had already installed Java 11, I then set my default Java to version 8 using:


$ sudo update-alternatives --config java

Select Java 8 and then confirm your changes:


$ java -version

Output should be similar to:


openjdk version "1.8.0_191"
OpenJDK Runtime Environment (build 1.8.0_191-8u191-b12-2ubuntu0.18.04.1-b12)
OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)

I'm now able to run Spark successfully in Jupyter Notebook. The steps above were based on the following guide: https://www.digitalocean.com/community/tutorials/how-to-install-java-with-apt-on-ubuntu-18-04

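One caveat (an assumption on my part, not from the original answer): a Jupyter kernel that was started before the switch, or that was not launched from a login shell, may not see the new default. In that case a cell like the sketch below, run before importing pyspark and using the usual Ubuntu OpenJDK 8 path, pins the kernel to Java 8:

import os

# Point just this kernel at Java 8; adjust the path if your JDK lives elsewhere.
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"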

Answer by Ferran

With PyCharm I found that adding the Spark location through findspark and Java 8 with os at the beginning of the script was the easiest solution:


import findspark
import os
spark_location='/opt/spark-2.4.3/' # Set your own
java8_location= '/usr/lib/jvm/java-8-openjdk-amd64' # Set your own
os.environ['JAVA_HOME'] = java8_location
findspark.init(spark_home=spark_location) 
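As a usage example, here is a quick smoke test (a sketch; it assumes pyspark is importable once findspark.init has run) that exercises the same collect() call that failed in the question:

from pyspark.sql import SparkSession

# Build a local session and run a trivial job; if the gateway JVM is still
# launched with Java 11, this is where "Unsupported class file major version"
# would surface again.
spark = SparkSession.builder.master("local[*]").appName("java8-check").getOrCreate()
print(spark.sparkContext.parallelize(range(5)).collect())
spark.stop()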

Answer by SergiyKolesnikov

For Debian 10 'buster' users, the Java 8 JRE is available in the nvidia-openjdk-8-jre package.


Install it with


sudo apt install nvidia-openjdk-8-jre

Then set JAVA_HOME when running pyspark, e.g.:


JAVA_HOME=/usr/lib/jvm/nvidia-java-8-openjdk-amd64/ pyspark

Answer by Rajitha Fernando

The problem here is that PySpark requires Java 8 for some functions. Spark 2.2.1 had problems with Java 9 and beyond. The recommended solution is to install Java 8.


You can install Java 8 specifically, set it as your default Java, and try again.


To install Java 8:


sudo apt install openjdk-8-jdk

To change the default Java version, follow this. You can use the command


 update-java-alternatives --list

to list all available Java versions.


Set a default one by running the command:


sudo update-alternatives --config java

to select the Java version you want. Provide the exact number from the presented list. Then check your Java version with java -version and it should be updated. Set the JAVA_HOME variable as well.


To set JAVA_HOME, you must find the specific Java version and folder. Follow this SO discussion to get a full idea of setting the Java home variable. Since we are going to use Java 8, our folder path is /usr/lib/jvm/java-8-openjdk-amd64/. Just go to the /usr/lib/jvm folder and check which folders are available. Use ls -l to see the folders and their softlinks, since some of these folders can be shortcuts to certain Java versions. Then go to your home directory with cd ~ and edit the bashrc file:


cd ~
gedit .bashrc

Then add the lines below to the file, save, and exit.


## SETTING JAVA HOME
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export PATH=$PATH:$JAVA_HOME/bin

After that, to make what you did take effect, type source ~/.bashrc and run it in the terminal.

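To confirm that JAVA_HOME now points at a Java 8 installation, here is a small check you can run from Python (a sketch; note that java -version prints its banner to stderr):

import os
import subprocess

java_bin = os.path.join(os.environ["JAVA_HOME"], "bin", "java")
# `java -version` writes the version banner to stderr, not stdout.
banner = subprocess.run([java_bin, "-version"], stderr=subprocess.PIPE).stderr.decode()
print(banner)  # should mention version "1.8.0_..." once JAVA_HOME is correct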

Answer by ak6o

Hi, actually, to be sure that you are setting the right SPARK_HOME path, you can use this Python script to locate it: https://github.com/apache/spark/blob/master/python/pyspark/find_spark_home.py


python3 find_spark_home.py 

/usr/local/lib/python3.7/site-packages/pyspark

On my Mac, in the terminal:


vim ~/.bashrc

and add the paths:


export JAVA_HOME=/Library/java/JavaVirtualMachines/adoptopenjdk-8.jdk/contents/Home/

export SPARK_HOME=/usr/local/lib/python3.7/site-packages/pyspark

export PYSPARK_PYTHON=/usr/local/bin/python3

and then, finally, apply the change:


source ~/.bashrc
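To cross-check that a new shell (or your IDE) actually picks up these exports, a tiny sketch that prints what the current interpreter inherits:

import os

# Print the variables exported in ~/.bashrc above as this process sees them.
for var in ("JAVA_HOME", "SPARK_HOME", "PYSPARK_PYTHON"):
    print(var, "=", os.environ.get(var))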

Answer by Tanaji Sutar

This issue occurs due to the Java version you set in the JAVA_HOME environment variable.


Old Java path: /usr/lib/jvm/java-1.11.0-openjdk-amd64


Solution: Set JAVA_HOME to /usr/lib/jvm/java-8-openjdk-amd64


It will work!!!


Note: my error was:


File "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/pyspark/rdd.py", line 816, in collect sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd()) File "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in callFile "/home/tms/myInstallDir/spark-2.4.5-bin-hadoop2.7/python/pyspark/sql/utils.py", line 79, in deco raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace) pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 55'
