Java 运行Hadoop时如何避免OutOfMemoryException?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/3383402/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to avoid OutOfMemoryException when running Hadoop?
提问 by wlk
I'm running a Hadoop job over 1.5 TB of data, doing a lot of pattern matching. I have several machines with 16GB RAM each, and I always get an OutOfMemoryException on this job with this data (I'm using Hive).
我正在对 1.5 TB 的数据运行一个 Hadoop 作业,其中包含大量模式匹配。我有几台机器,每台 16GB 内存,而这个作业处理这些数据时总是抛出 OutOfMemoryException(我使用的是 Hive)。
I would like to know how to optimally set the HADOOP_HEAPSIZE option in the hadoop-env.sh file so that my job will not fail. Is it even possible to set this option so my jobs won't fail?
我想知道如何在 hadoop-env.sh 文件中以最优方式设置 HADOOP_HEAPSIZE 选项,使我的作业不会失败。甚至有可能通过设置这个选项让作业不失败吗?
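For reference, in classic Hadoop distributions this option lives in conf/hadoop-env.sh and is given in megabytes. A minimal sketch (the 2000 here is only an illustrative value, not a recommendation):

```sh
# conf/hadoop-env.sh
# Maximum heap size, in MB, for the Hadoop daemons (NameNode, DataNode,
# JobTracker, TaskTracker). It does NOT size the per-task child JVMs;
# those are controlled by mapred.child.java.opts.
export HADOOP_HEAPSIZE=2000
```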
When I set HADOOP_HEAPSIZE to 1.5 GB and removed half of the pattern matching from the query, the job ran successfully. So what is this option for, if it doesn't help avoid job failures?
当我将 HADOOP_HEAPSIZE 设置为 1.5 GB 并从查询中删除一半的模式匹配后,作业成功运行了。那么,如果这个选项不能帮助避免作业失败,它到底有什么用?
I meant to do more experimenting with the optimal setup, but since those jobs take more than 10 hours to run, I'm asking for your advice.
我本打算对最优配置做更多试验,但由于这些作业一次要运行 10 小时以上,所以来征求大家的建议。
采纳答案 by Joe Stein
Is the job failing or is your server crashing? If your job is failing because of OutOfMemory on the nodes, you can tweak the maximum number of maps and reducers, and the JVM options for each, so that it never happens. mapred.child.java.opts (the default is -Xmx200m) usually has to be increased based on your data nodes' specific hardware.
是作业失败了,还是你的服务器崩溃了?如果作业是因为节点上的 OutOfMemory 而失败,你可以调整 map 和 reduce 任务数的上限以及每个任务的 JVM 选项,使这种情况不再发生。mapred.child.java.opts(默认为 -Xmx200m)通常需要根据数据节点的具体硬件调大。
http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/
Max tasks can be set up on the NameNode or overridden (and set final) on data nodes that may have different hardware configurations. The max tasks are set up for both mappers and reducers. The calculation is based on CPU (cores), the amount of RAM you have, and the JVM max heap you set in mapred.child.java.opts (the default is -Xmx200m). The DataNode and TaskTracker are each set to 1GB, so for an 8GB machine the mapred.tasktracker.map.tasks.maximum could be set to 7 and the mapred.tasktracker.reduce.tasks.maximum set to 7, with mapred.child.java.opts set to -Xmx400m (assuming 8 cores). Please note these task maximums are constrained mostly by your CPU: if you only have 1 CPU with 1 core, then it is time to get new hardware for your data node, or set the max tasks to 1. If you have 1 CPU with 4 cores, then setting map to 3 and reduce to 3 would be good (saving 1 core for the daemons).
Max tasks 可以在 NameNode 上设置,也可以在硬件配置不同的数据节点上覆盖(并标记为 final)。map 和 reduce 任务都要设置各自的最大任务数。计算时依据 CPU(核数)、内存大小,以及你在 mapred.child.java.opts 中设置的 JVM 最大堆(默认为 -Xmx200m)。DataNode 和 TaskTracker 各占 1GB,因此对于一台 8GB 的机器,可以将 mapred.tasktracker.map.tasks.maximum 设置为 7,将 mapred.tasktracker.reduce.tasks.maximum 设置为 7,并将 mapred.child.java.opts 设置为 -Xmx400m(假设有 8 个核)。请注意,这些任务上限很大程度上取决于 CPU:如果你只有 1 个单核 CPU,那么要么为数据节点购置新硬件,要么把最大任务数设为 1;如果你有 1 个 4 核 CPU,那么把 map 设为 3、reduce 设为 3 比较合适(留 1 个核给守护进程)。
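Putting those numbers together, the corresponding entries in mapred-site.xml on each 8GB, 8-core data node might look like the sketch below (property names and values are taken from the answer above, for classic MR1; tune them to your own hardware):

```xml
<!-- mapred-site.xml on each data node (classic MR1 property names) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value>
  <final>true</final>  <!-- "set final" so job configs cannot override it -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
  <final>true</final>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx400m</value>  <!-- per-task child JVM heap -->
</property>
```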
By default there is only one reducer, and you need to configure mapred.reduce.tasks to be more than one. This value should be somewhere between 0.95 and 1.75 times the number of maximum tasks per node, times the number of data nodes. So if you have 3 data nodes and they are set up with a max of 7 tasks, then configure this between 25 and 36.
默认情况下只有一个 reducer,你需要将 mapred.reduce.tasks 配置为大于 1。该值应介于(每个节点的最大任务数 × 数据节点数)的 0.95 到 1.75 倍之间。因此,如果你有 3 个数据节点,每个节点的最大任务数为 7,则应将其配置在 25 到 36 之间。
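Taken literally, that 0.95 to 1.75 rule of thumb can be computed directly. A small illustrative sketch (for this cluster it yields roughly 20 to 37, in the same ballpark as the 25 to 36 suggested above):

```python
# Rule of thumb from the answer above: mapred.reduce.tasks should be
# 0.95-1.75 x (max reduce tasks per node) x (number of data nodes).
def reduce_task_range(max_tasks_per_node, num_data_nodes):
    slots = max_tasks_per_node * num_data_nodes
    return 0.95 * slots, 1.75 * slots

# Example from the answer: 3 data nodes, 7 reduce slots each.
low, high = reduce_task_range(7, 3)
print(f"set mapred.reduce.tasks somewhere between {low:.0f} and {high:.0f}")
```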
If your server is crashing with OutOfMemory issues, then that is where HADOOP_HEAPSIZE comes in: it applies only to the daemon process's heap (not the execution of tasks).
如果你的服务器本身因 OutOfMemory 问题崩溃,那才是 HADOOP_HEAPSIZE 起作用的地方:它只针对守护进程自身的堆(而不是任务的执行)。
Lastly, if your job is taking that long, you can check whether another good configuration addition, mapred.compress.map.output, would help. Setting this value to true should (balancing the time to compress against the time to transfer) speed up the reducers' copy phase greatly, especially when working with large data sets. Often jobs do take time, but there are also options to tweak to help speed things up =8^)
最后,如果你的作业运行时间确实很长,可以检查另一个值得添加的配置:mapred.compress.map.output。将其设置为 true(在压缩耗时与传输耗时之间取得平衡)可以大大加快 reducer 的拷贝阶段,尤其是在处理大数据集时。作业往往确实需要时间,但也有一些可以调整的选项来帮助提速 =8^)
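A minimal sketch of that setting in mapred-site.xml (or passed per-job); the classic MR1 property name is shown, and the choice of codec is left at the default:

```xml
<!-- mapred-site.xml: compress intermediate map output before the
     shuffle, trading CPU for less network transfer (MR1 name) -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
```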