What does Java "Heap Size" mean for the Hadoop Namenode?

Disclaimer: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA terms and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/22215994/


What does "Heap Size" mean for Hadoop Namenode?

Tags: java, hadoop, mapreduce, heap-memory

Asked by Bohdan

I'm trying to understand whether there is something wrong with my Hadoop cluster. When I go to the cluster summary in the web UI, it says:


Cluster Summary

XXXXXXX files and directories, XXXXXX blocks = 7534776 total.
Heap Size is 1.95 GB / 1.95 GB (100%) 

And I'm concerned about why this Heap Size metric is at 100%.


Could someone please explain how the namenode heap size impacts cluster performance, and whether this needs to be fixed?


Answered by Remus Rusanu

The namenode Web UI shows the values like this:


<h2>Cluster Summary (Heap Size is <%= StringUtils.byteDesc(Runtime.getRuntime().totalMemory()) %>/<%= StringUtils.byteDesc(Runtime.getRuntime().maxMemory()) %>)</h2>

The Runtime documentation describes these as follows (a small sketch using both calls appears after the list):


  • totalMemory(): Returns the total amount of memory in the Java virtual machine.
  • maxMemory(): Returns the maximum amount of memory that the Java virtual machine will attempt to use.
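As a quick illustration, here is a minimal standalone Java sketch (not namenode code; the formatting is a simplified stand-in for Hadoop's StringUtils.byteDesc) that prints the same two figures in the same "total / max (percent)" shape as the cluster summary:

    public class HeapReport {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long total = rt.totalMemory(); // heap currently committed by the JVM
            long max = rt.maxMemory();     // upper bound, normally set via -Xmx
            double gib = 1L << 30;         // byteDesc uses binary units, shown as "GB"
            System.out.printf("Heap Size is %.2f GB / %.2f GB (%d%%)%n",
                    total / gib, max / gib, Math.round(100.0 * total / max));
        }
    }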

Max is going to be the -Xmx parameter from the service start command. The main factor in total memory is the number of blocks in your HDFS cluster. The namenode requires ~150 bytes for each block, plus 16 bytes for each replica, and all of it must be kept in live memory. So a default replication factor of 3 gives you 182 bytes per block, and with 7534776 blocks that comes to about 1.3GB. Add all the other non-file-related memory in use in the namenode, and 1.95GB sounds about right. I would say that your HDFS cluster size requires a bigger namenode: more RAM. If possible, increase the namenode startup -Xmx. If that is already maxed out, you'll need a bigger VM/physical box.

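A rough sketch of that estimate in Java, using the per-block figures quoted above (the class and variable names are illustrative, and the byte counts are the answer's approximations, not exact values):

    public class NamenodeHeapEstimate {
        public static void main(String[] args) {
            long blocks = 7_534_776L;      // block count from the cluster summary
            int perBlock = 150;            // approx. namenode metadata bytes per block
            int perExtraReplica = 16;      // approx. extra bytes per additional replica
            int replication = 3;           // default HDFS replication factor
            // 150 + 2 * 16 = 182 bytes per block at replication 3, as in the answer
            long bytesPerBlock = perBlock + (long) (replication - 1) * perExtraReplica;
            double gb = blocks * bytesPerBlock / (double) (1L << 30);
            System.out.printf("Block metadata alone: ~%.1f GB of namenode heap%n", gb);
            // Everything else the namenode keeps in memory comes on top of this,
            // so a 1.95 GB -Xmx (set e.g. via HADOOP_NAMENODE_OPTS in hadoop-env.sh)
            // is essentially exhausted.
        }
    }

This prints roughly 1.3 GB, matching the estimate above.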

Read The Small Files Problem and HDFS-5711.
