Java Spark Driver Memory and Executor Memory

Disclaimer: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/41645679/


Spark Driver Memory and Executor Memory

Tags: java, apache-spark, spark-streaming, spark-submit

Asked by nnc

I am a beginner with Spark and I am running my application to read 14KB of data from a text file, do some transformations and actions (collect, collectAsMap), and save the data to a database.

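For context, the asker's code is not shown; a minimal sketch of the kind of job described (read a small text file, apply transformations, collect the result on the driver, then write it out) might look roughly like the following. The input path, class name, and word-count logic are illustrative assumptions, and the database write is omitted:

```java
import java.util.Arrays;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class Application {
    public static void main(String[] args) {
        // The master URL is supplied by spark-submit (--master local[*] in the question)
        SparkConf conf = new SparkConf().setAppName("small-file-job");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            // Hypothetical input path standing in for the 14KB text file
            JavaRDD<String> lines = sc.textFile("data/input.txt");

            // Some illustrative transformations followed by an action (Spark 2.x Java API)
            JavaPairRDD<String, Integer> counts = lines
                    .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                    .mapToPair(word -> new Tuple2<>(word, 1))
                    .reduceByKey(Integer::sum);

            // collectAsMap() pulls the (small) result back to the driver
            Map<String, Integer> result = counts.collectAsMap();

            // ... write `result` to the database here (e.g. via JDBC), omitted ...
            System.out.println("distinct keys: " + result.size());
        }
    }
}
```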

I am running it locally on my MacBook, which has 16GB of memory and 8 logical cores.


The Java max heap is set to 12G.


Here is the command I use to run the application.


bin/spark-submit --class com.myapp.application --master local[*] --executor-memory 2G --driver-memory 4G /jars/application.jar


I am getting the following warning:


2017-01-13 16:57:31.579 [Executor task launch worker-8hread] WARN org.apache.spark.storage.MemoryStore - Not enough space to cache rdd_57_0 in memory! (computed 26.4 MB so far)

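The warning itself only means that a cached RDD partition did not fit in the storage memory available to the executor. One common mitigation, sketched here under the assumption that the RDD is being cached explicitly (with the default MEMORY_ONLY level, partitions that don't fit are simply dropped and recomputed), is to persist with a storage level that can spill to disk:

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class CacheDemo {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("cache-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("data/input.txt"); // hypothetical path

            // MEMORY_AND_DISK writes partitions that do not fit in memory to disk
            // instead of dropping them, so they are not recomputed later.
            lines.persist(StorageLevel.MEMORY_AND_DISK());

            System.out.println(lines.count());
        }
    }
}
```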

Can anyone guide me on what is going wrong here and how I can improve performance? Also, how do I optimize the shuffle spill? Here is a view of the spill that happens on my local system.

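Neither answer below addresses the shuffle-spill part directly. The usual knobs are the unified memory settings and the parallelism of the shuffle stages; as an illustrative sketch only (the values shown are just the Spark defaults, and the right numbers depend on the job), they can be passed via --conf on the same submit command:

```
bin/spark-submit --class com.myapp.application --master local[*] \
  --conf spark.memory.fraction=0.6 \
  --conf spark.default.parallelism=8 \
  /jars/application.jar
```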

[Screenshot: shuffle spill shown for the local run in the Spark UI]

Answered by Wang

In local mode you don't need to specify a master; using the default arguments is fine. The official website says, "The spark-submit script in Spark's bin directory is used to launch applications on a cluster. It can use all of Spark's supported cluster managers through a uniform interface so you don't have to configure your application specially for each one." So you'd better use spark-submit on a cluster; locally you can use spark-shell.

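In other words, for a purely local experiment this answer suggests trimming the submit command down or working interactively. A sketch along those lines (note that dropping --master assumes spark.master is provided by conf/spark-defaults.conf or by the application's own SparkConf):

```
# Drop the explicit settings and rely on defaults, as this answer suggests
bin/spark-submit --class com.myapp.application /jars/application.jar

# Or explore interactively on a single machine
bin/spark-shell --master local[*]
```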

Answered by Sandeep Singh

Running executors with too much memory often results in excessive garbage collection delays, so assigning more memory is not a good idea. Since you have only 14KB of data, 2GB of executor memory and 4GB of driver memory is more than enough; there is no point in assigning this much. You could run this job with even 100MB of memory and performance would be better than with 2GB.


Driver memory matters more when you run the application in yarn-cluster mode, because there the application master runs the driver. Here you are running your application in local mode, so driver-memory is not necessary. You can remove this configuration from your job.

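For contrast, here is a sketch of a yarn-cluster submission, where --driver-memory does take effect because the driver runs inside the YARN application master (the cluster setup and values are assumptions, not part of the question):

```
bin/spark-submit --class com.myapp.application \
  --master yarn --deploy-mode cluster \
  --driver-memory 2G --executor-memory 2G \
  /jars/application.jar
```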

In your application you have assigned:


Java max heap: 12G
executor-memory: 2G
driver-memory: 4G

Total memory allotment = 16GB, and your MacBook has only 16GB of memory. You have allocated essentially all of your RAM to your Spark application.


This is not good. The operating system itself consumes roughly 1GB of memory, and you might be running other applications that also consume RAM. So you are actually allocating more memory than you have, and this is the root cause of your application throwing the error Not enough space to cache the RDD.


  1. There is no point in assigning a 12GB Java heap. Reduce it to 4GB or less.
  2. Reduce the executor memory to executor-memory 1G or less.
  3. Since you are running locally, remove driver-memory from your configuration (see the adjusted command sketched after this list).
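Putting those three points together, the submit command from the question might be reduced to something like the following. This is a sketch rather than a measured recommendation, and the 12G heap setting should also be lowered wherever it is configured:

```
bin/spark-submit --class com.myapp.application --master local[*] \
  --executor-memory 1G \
  /jars/application.jar
```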

Submit your job. It will run smoothly.


If you are keen to learn more about Spark memory management techniques, refer to this useful article:


Spark on yarn executor resource allocation
