Spark application - java.lang.OutOfMemoryError: Java heap space
Disclaimer: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow.
Original question: http://stackoverflow.com/questions/33964053/
Asked by wdz
I am using Spark Standalone on a single machine with 128 GB of memory and 32 cores. The following settings seem relevant to my problem:
spark.storage.memoryFraction 0.35
spark.default.parallelism 50
spark.sql.shuffle.partitions 50
My Spark application loops over 1000 devices. For each device it prepares a feature vector and then calls MLlib's k-means. Around the 25th to 30th iteration of the loop (processing the 25th to 30th device), it fails with "java.lang.OutOfMemoryError: Java heap space".
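The original code is not shown here, but a minimal sketch of the loop described above might look like the following (the device IDs, feature preparation, and k-means parameters are hypothetical placeholders):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Hypothetical reconstruction of the per-device loop described above.
def clusterAllDevices(sc: SparkContext): Unit = {
  for (deviceId <- 1 to 1000) {
    // Prepare this device's feature vectors (placeholder random data here).
    val features = sc.parallelize(Seq.fill(10000)(Vectors.dense(Array.fill(8)(math.random))))
    features.cache()

    // Run MLlib k-means for this device.
    val model = KMeans.train(features, k = 5, maxIterations = 20)

    // Release the cached RDD before the next device; without this, cached
    // blocks and driver-side state can accumulate across iterations.
    features.unpersist()
  }
}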
I tried memoryFraction values from 0.7 down to 0.35, but it didn't help. I also tried setting parallelism/partitions to 200 with no luck. The JVM options are "-Xms25G -Xmx25G -XX:MaxPermSize=512m". My data size is only about 2 GB.
Here is the stack trace:
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2271)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:118)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.ObjectOutputStream$BlockDataOutputStream.write(ObjectOutputStream.java:1841)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at scala.collection.mutable.HashMap$$anonfun$writeObject.apply(HashMap.scala:138)
at scala.collection.mutable.HashMap$$anonfun$writeObject.apply(HashMap.scala:136)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashTable$class.serializeTo(HashTable.scala:125)
at scala.collection.mutable.HashMap.serializeTo(HashMap.scala:40)
at scala.collection.mutable.HashMap.writeObject(HashMap.scala:136)
at sun.reflect.GeneratedMethodAccessor116.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:988)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1495)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1547)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1508)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1431)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1177)
At the beginning the application looks fine, but as it runs longer and processes more and more devices, the Java heap fills up gradually and the memory is not released by the JVM. How can I diagnose and fix such a problem?
Answered by Sumit
Apart from driver and executor memory, I would suggest trying the following options:
- Switch to Kryo serialization - http://spark.apache.org/docs/latest/tuning.html#data-serialization
- Use MEMORY_AND_DISK_SER_2 for RDD persistence (see the sketch after this list).
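A minimal sketch of both suggestions, assuming a plain SparkConf-based setup (the registered class and the sample data below are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.mllib.linalg.Vectors

// Switch to Kryo serialization; registering classes is optional but avoids
// writing full class names with every serialized object.
val conf = new SparkConf()
  .setAppName("device-kmeans")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .registerKryoClasses(Array(classOf[org.apache.spark.mllib.linalg.DenseVector]))
val sc = new SparkContext(conf)

// Persist in serialized form and spill to disk instead of keeping
// deserialized objects on the heap; the _2 suffix replicates each partition twice.
val features = sc.parallelize(Seq.fill(10000)(Vectors.dense(Array.fill(8)(math.random))))
features.persist(StorageLevel.MEMORY_AND_DISK_SER_2)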
Also, it would be good if you could post the code.
Answered by R Sawant
You can always use profiler tools like VisualVM to monitor memory growth. Hopefully you are using a 64-bit JVM and not a 32-bit JVM; a 32-bit process can only use about 2 GB of memory, so the memory settings would essentially be of no use. Hope this helps.
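If attaching VisualVM to the driver process is inconvenient, a minimal helper like the sketch below (a hypothetical addition, not part of the original answer) can log heap usage from inside the application, for example once per device in the loop:

import java.lang.management.ManagementFactory

// Print the current heap usage of the JVM this code runs in (the driver,
// when called from the main program). For diagnosis only.
def logHeap(tag: String): Unit = {
  val heap = ManagementFactory.getMemoryMXBean.getHeapMemoryUsage
  val usedMb = heap.getUsed / (1024.0 * 1024.0)
  val maxMb = heap.getMax / (1024.0 * 1024.0)
  println(f"[$tag] heap used: $usedMb%.1f MB of $maxMb%.1f MB")
}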
Answered by mehmetminanc
JVM options are not sufficient for configuring Spark memory; you also need to set spark.driver.memory (for the driver, obviously) and spark.executor.memory (for the workers). Both default to 1 GB. See this thorough guide for more information. Actually, I urge you to read it: there is a lot of material there, and getting acquainted with it will definitely pay off later on.
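A minimal sketch of how these settings might be applied, assuming the application builds its own SparkConf (the values are illustrative, not recommendations):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("device-kmeans")
  // Memory per executor process; applied when executors are launched.
  .set("spark.executor.memory", "25g")
  // Note: in client mode the driver JVM is already running by the time this
  // line executes, so spark.driver.memory is normally set via spark-submit's
  // --driver-memory flag or in conf/spark-defaults.conf instead.
  .set("spark.driver.memory", "25g")
val sc = new SparkContext(conf)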