Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not this site). Original question: http://stackoverflow.com/questions/47907561/

Spark Error : executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM

Tags: scala, apache-spark

Asked by Vishal

I am working with the following Spark config:

maxCores = 5
driverMemory = 2g
executorMemory = 17g
executorInstances = 100
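
For reference, a minimal sketch of how these settings might map onto standard Spark configuration keys when building a session. The app name is a placeholder, the maxCores-to-spark.executor.cores mapping is an assumption, and driver memory normally has to be set before the driver JVM starts (e.g. via spark-submit) rather than in code:

    // A hedged sketch, assuming the settings above correspond to the
    // standard Spark configuration keys noted in the comments.
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("PartitionCountJob")              // placeholder name
      .config("spark.executor.cores", "5")       // maxCores (assumed mapping)
      .config("spark.driver.memory", "2g")       // driverMemory
      .config("spark.executor.memory", "17g")    // executorMemory
      .config("spark.executor.instances", "100") // executorInstances
      .getOrCreate()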

Issue: out of 100 executors, my job ends up with only 10 active executors, even though enough memory is available. Even after setting the executor count to 250, only 10 remain active. All I am trying to do is load a multi-partition Hive table and run df.count over it (see the sketch below).

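A hedged sketch of that workload, using the session from the snippet above (or the spark value predefined in spark-shell); the table name is a placeholder, not from the original post:

    // Load a partitioned Hive table and count its rows;
    // "my_db.my_partitioned_table" is a placeholder name.
    val df = spark.table("my_db.my_partitioned_table")
    println(df.count())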

Please help me understand what is causing the executors to be killed:
17/12/20 11:08:21 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL TERM
17/12/20 11:08:21 INFO storage.DiskBlockManager: Shutdown hook called
17/12/20 11:08:21 INFO util.ShutdownHookManager: Shutdown hook called

Not sure why YARN is killing my executors.

Answered by maffe

I faced a similar issue, and investigating the NodeManager logs led me to the root cause. You can access them via the web interface at:

nodeManagerAddress:PORT/logs

The PORT is specified in yarn-site.xml under yarn.nodemanager.webapp.address (default: 8042).

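For reference, the corresponding entry in yarn-site.xml looks roughly like this (the host value here is illustrative):

    <!-- yarn-site.xml: address of the NodeManager web UI -->
    <property>
      <name>yarn.nodemanager.webapp.address</name>
      <value>0.0.0.0:8042</value>
    </property>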

My investigation workflow:

  1. Collect the logs (yarn logs ... command; see the example after this list)
  2. Identify the node and container (in these logs) emitting the error
  3. Search the NodeManager logs by the timestamp of the error for the root cause
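
A hedged example of step 1; the application ID below is a placeholder, and yarn logs requires log aggregation to be enabled:

    # Fetch the aggregated container logs for one application
    # (the application ID is a placeholder).
    yarn logs -applicationId application_1513760000000_0001 > app.log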

By the way, you can access the aggregated collection (XML) of all configuration affecting a node at the same port via:

 nodeManagerAddress:PORT/conf

Answered by JumpMan

I believe this issue has more to do with memory and the dynamic-allocation timeouts at the executor/container level. Make sure you can change the config params at the executor/container level.

One way to resolve this issue is to change this config value, either in your spark-shell session or in your Spark job:

spark.dynamicAllocation.executorIdleTimeout
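
A minimal sketch of raising it when building the session; the 300s value is illustrative, not from the original answer (the default is 60s):

    // Keep idle executors alive longer before dynamic allocation
    // reclaims them; 300s is an illustrative value (default: 60s).
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .config("spark.dynamicAllocation.executorIdleTimeout", "300s")
      .getOrCreate()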

This ticket has more detailed information on how to resolve the issue, and it worked for me: https://jira.apache.org/jira/browse/SPARK-21733