data block size in HDFS, why 64MB?

Note: this page is a translation of a popular StackOverflow question. It is provided under the CC BY-SA 4.0 license; if you use or share it, you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/19473772/
Asked by dykw
The default data block size of HDFS/Hadoop is 64 MB, while the block size on disk is generally 4 KB. What does a 64 MB block size mean? Does it mean that the smallest unit read from disk is 64 MB?
If yes, what is the advantage of doing that? Is it to make sequential access to large files in HDFS easier?
Can we achieve the same thing by using the disk's original 4 KB block size?
Answered by bstempi
What does 64MB block size mean?
The block size is the smallest unit of data that a file system can store. If you store a file that's 1 KB or 60 MB, it'll take up one block. Once you cross the 64 MB boundary, you need a second block.
If yes, what is the advantage of doing that?
HDFS is meant to handle large files. Let's say you have a 1,000 MB file. With a 4 KB block size, you'd have to make 256,000 requests to get that file (one request per block). In HDFS, those requests go across a network and come with a lot of overhead. Each request has to be processed by the NameNode to figure out where that block can be found. That's a lot of traffic! If you use 64 MB blocks, the number of requests goes down to 16, greatly reducing the overhead and the load on the NameNode.
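As a rough illustration of the arithmetic in this answer, here is a small Python sketch; the 1,000 MB file size and the one-request-per-block assumption are simply the numbers used above, not measurements from a real cluster.

```python
import math

FILE_SIZE = 1000 * 1024 * 1024  # the 1,000 MB file from the example above

def block_requests(file_size_bytes, block_size_bytes):
    """Number of blocks in the file, assuming one block-location request per block."""
    return math.ceil(file_size_bytes / block_size_bytes)

print(block_requests(FILE_SIZE, 4 * 1024))          # 4 KB blocks  -> 256000 requests
print(block_requests(FILE_SIZE, 64 * 1024 * 1024))  # 64 MB blocks -> 16 requests
```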
Answered by cabad
HDFS's design was originally inspired by the design of the Google File System (GFS). Here are the reasons for large block sizes as stated in the original GFS paper (note 1 on GFS terminology vs HDFS terminology: chunk = block, chunkserver = datanode, master = namenode; note 2: bold formatting is mine):
A large chunk size offers several important advantages. First, it reduces clients' need to interact with the master because reads and writes on the same chunk require only one initial request to the master for chunk location information. The reduction is especially significant for our workloads because applications mostly read and write large files sequentially. [...] Second, since on a large chunk, a client is more likely to perform many operations on a given chunk, it can reduce network overhead by keeping a persistent TCP connection to the chunkserver over an extended period of time. Third, it reduces the size of the metadata stored on the master. This allows us to keep the metadata in memory, which in turn brings other advantages that we will discuss in Section 2.6.1.
Finally, I should point out that the current default size in Apache Hadoop is 128 MB.
Answered by kosii
In HDFS the block size controls the level of replication declustering. The lower the block size, the more evenly your blocks are distributed across the DataNodes. The higher the block size, the less evenly your data is potentially distributed across the cluster.
So what is the point of choosing a higher block size instead of some low value? While in theory an equal distribution of data is a good thing, having too low a block size has some significant drawbacks. The NameNode's capacity is limited, so having a 4 KB block size instead of 128 MB also means having 32,768 times more block information to store. MapReduce could also profit from equally distributed data by launching more map tasks on more NodeManagers and more CPU cores, but in practice the theoretical benefits are lost because sequential, buffered reads are no longer possible and because of the latency of each map task.
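A back-of-the-envelope sketch of that 32,768x figure; the 1 GB file below is an arbitrary example, not something taken from the answer.

```python
KB = 1024
MB = 1024 * KB

small_block = 4 * KB
large_block = 128 * MB

# Every block is one more entry the NameNode has to track, so the metadata
# volume scales with the number of blocks.
print(large_block // small_block)  # 32768x more entries with 4 KB blocks

# For a concrete file: block counts for a 1 GB file at each block size.
file_size = 1024 * MB
print(file_size // small_block)  # 262144 blocks at 4 KB
print(file_size // large_block)  # 8 blocks at 128 MB
```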
Answered by Shivakumar
In a normal OS the block size is 4 KB, while in Hadoop it is 64 MB. The larger size makes it easier to maintain the metadata in the NameNode.
Suppose we had only a 4 KB block size in Hadoop and we tried to load 100 MB of data: we would need a very large number of 4 KB blocks, and the NameNode would have to maintain metadata for every one of them.
If we use a 64 MB block size, the data is loaded into only two blocks (64 MB and 36 MB), so the amount of metadata is decreased.
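A quick sketch of the block counts in this 100 MB example:

```python
import math

MB = 1024 * 1024
file_size = 100 * MB

print(math.ceil(file_size / (4 * 1024)))  # 25600 blocks with a 4 KB block size
print(math.ceil(file_size / (64 * MB)))   # 2 blocks with a 64 MB block size (64 MB + 36 MB)
```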
Conclusion: to reduce the burden on the NameNode, HDFS prefers a block size of 64 MB or 128 MB. The default block size is 64 MB in Hadoop 1.0 and 128 MB in Hadoop 2.0.
Answered by dpaluy
- If the block size were set to less than 64 MB, there would be a huge number of blocks throughout the cluster, which would force the NameNode to manage an enormous amount of metadata.
- Since we need a Mapper for each block, there would be a lot of Mappers, each processing a tiny piece of data, which isn't efficient (see the sketch below).
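Here is a small sketch of that second point, assuming the common default of one map task per block; the 10 GB input size is just an illustrative number.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3
input_size = 10 * GB  # illustrative input size

# Assuming one map task per block, the block size determines how many mappers
# a job launches and how little data each one gets to process.
for block_size in (4 * 1024, 64 * MB, 128 * MB):
    mappers = math.ceil(input_size / block_size)
    print(f"{block_size:>10} B blocks -> {mappers:>8} mappers, {input_size / mappers / 1024:.0f} KB each")
```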
Answered by steven
The reason Hadoop chose 64 MB is that Google chose 64 MB. The reason Google chose 64 MB is a Goldilocks argument.
Having a much smaller block size would cause seek overhead to increase.
Having a moderately smaller block size makes map tasks run fast enough that the cost of scheduling them becomes comparable to the cost of running them.
Having a significantly larger block size begins to decrease the available read parallelism and may ultimately make it hard to schedule tasks local to the data.
See Google Research Publication: MapReduce http://research.google.com/archive/mapreduce.html
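The trade-off can be illustrated with a toy model; every constant below is a made-up illustrative value, not a measurement, and the sweet spot it produces depends entirely on those numbers.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3

SEEK_S = 0.010         # made-up per-block seek time
SCHEDULE_S = 2.0       # made-up per-task scheduling overhead
THROUGHPUT = 100 * MB  # made-up sequential read throughput, bytes/s
SLOTS = 100            # made-up number of cluster-wide parallel task slots

def job_time(input_size, block_size):
    tasks = math.ceil(input_size / block_size)
    per_task = SCHEDULE_S + SEEK_S + block_size / THROUGHPUT
    waves = math.ceil(tasks / SLOTS)  # tasks run in waves across the slots
    return waves * per_task

for bs in (1 * MB, 16 * MB, 64 * MB, 256 * MB, 1024 * MB, 4096 * MB):
    print(f"{bs // MB:>5} MB blocks -> ~{job_time(100 * GB, bs):7.1f} s")
```

In this toy model, tiny blocks pay the fixed scheduling and seek overhead over and over, while huge blocks leave most of the cluster idle; the "just right" point depends on the real overheads, which is why 64 MB or 128 MB are defaults rather than universal truths.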
Answered by Praveen Sripati
It has more to do with the disk seeks of HDDs (Hard Disk Drives). Over time, disk seek time has not been improving much compared to disk throughput. So, when the block size is small (which leads to too many blocks), there will be too many disk seeks, which is not very efficient. As we move from HDDs to SSDs, disk seek time matters less, since there are no moving parts in an SSD.
Also, if there are too many blocks, it will strain the NameNode. Note that the NameNode has to store the entire metadata (data about blocks) in memory. In Apache Hadoop the default block size is 64 MB, and in Cloudera Hadoop the default is 128 MB.
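A rough sketch of how block size translates into NameNode memory pressure; the figure of roughly 150 bytes of heap per block object is only a commonly cited ballpark (an assumption here, not from this answer), and the 100 TB dataset is an arbitrary example.

```python
MB = 1024 ** 2
TB = 1024 ** 4

BYTES_PER_BLOCK = 150  # assumed ballpark for NameNode heap per block object
dataset = 100 * TB     # arbitrary example dataset size

for block_size in (4 * 1024, 64 * MB, 128 * MB):
    blocks = dataset // block_size
    heap_gb = blocks * BYTES_PER_BLOCK / 1024 ** 3
    print(f"{block_size:>10} B blocks -> {blocks:>14,} blocks, ~{heap_gb:,.1f} GB of NameNode heap")
```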
Answered by deepSleep
Below is what the book "Hadoop: The Definitive Guide", 3rd edition, explains (p. 45).
Why Is a Block in HDFS So Large?
HDFS blocks are large compared to disk blocks, and the reason is to minimize the cost of seeks. By making a block large enough, the time to transfer the data from the disk can be significantly longer than the time to seek to the start of the block. Thus the time to transfer a large file made of multiple blocks operates at the disk transfer rate.
A quick calculation shows that if the seek time is around 10 ms and the transfer rate is 100 MB/s, to make the seek time 1% of the transfer time, we need to make the block size around 100 MB. The default is actually 64 MB, although many HDFS installations use 128 MB blocks. This figure will continue to be revised upward as transfer speeds grow with new generations of disk drives.
This argument shouldn't be taken too far, however. Map tasks in MapReduce normally operate on one block at a time, so if you have too few tasks (fewer than nodes in the cluster), your jobs will run slower than they could otherwise.
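The excerpt's back-of-the-envelope calculation, written out with its own figures (10 ms seek, 100 MB/s transfer, seek time held to 1% of transfer time):

```python
seek_time = 0.010            # 10 ms average seek, from the excerpt
transfer_rate = 100 * 10**6  # 100 MB/s, from the excerpt
target_fraction = 0.01       # seek time should be only 1% of transfer time

# seek_time / (block_size / transfer_rate) == target_fraction
block_size = seek_time * transfer_rate / target_fraction
print(block_size / 10**6)    # 100.0 -> a block size of roughly 100 MB
```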

