Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same license and attribute it to the original authors (not the translator): StackOverflow. Original: http://stackoverflow.com/questions/6161823/

Posted: 2020-08-04 00:58:42 · Source: igfitidea

dd: How to calculate optimal blocksize?

Tags: linux, dd

Asked by eckza

How do you calculate the optimal blocksize when running a dd? I've researched it a bit and I've not found anything suggesting how this would be accomplished.

I am under the impression that a larger blocksize would result in a quicker dd... is this true?

I'm about to dd two identical 500 GB Hitachi HDDs that run at 7200 rpm on a box running an Intel Core i3 with 4 GB of DDR3 1333 MHz RAM, so I'm trying to figure out what blocksize to use. (I'm going to be booting Ubuntu 10.10 x86 from a flash drive, and running it from that.)

Accepted answer by eckza

The optimal block size depends on various factors, including the operating system (and its version), and the various hardware buses and disks involved. Several Unix-like systems (including Linux and at least some flavors of BSD) define the st_blksize member in struct stat, which gives what the kernel thinks is the optimal block size:

#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    struct stat stats;

    /* st_blksize is the kernel's preferred I/O block size for this filesystem */
    if (!stat("/", &stats))
    {
        printf("%ld\n", (long) stats.st_blksize);
    }
    return 0;
}
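
On Linux with GNU coreutils, the same field is exposed by stat(1), so a quick check doesn't require compiling anything (a sketch; %o is GNU stat's "optimal I/O transfer size" format directive and won't work with BSD stat):

```shell
# Print the st_blksize hint for the filesystem holding / (GNU stat only)
stat -c '%o' /
```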

The best way may be to experiment: copy a gigabyte with various block sizes and time each one. (Remember to clear the kernel buffer caches before each run: echo 3 > /proc/sys/vm/drop_caches.)

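That experiment can be sketched in a few lines of shell (assumptions: GNU dd, a Linux /proc, and an arbitrary 64 MiB per pass to keep it quick; scale TOTAL up to a gigabyte for more stable numbers):

```shell
#!/bin/sh
# Time dd at a few block sizes; the sizes and file name are arbitrary choices
TOTAL=$((64 * 1024 * 1024))
OUT=dd_bs_testfile
for BS in 4096 65536 1048576; do
  COUNT=$((TOTAL / BS))
  # Drop the page cache between runs (only possible as root)
  [ -w /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches
  # GNU dd reports the transfer rate on the last line of its stderr output
  RATE=$(dd if=/dev/zero of="$OUT" bs="$BS" count="$COUNT" conv=fsync 2>&1 | tail -n 1)
  echo "bs=$BS: $RATE"
done
rm -f "$OUT"
```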

However, as a rule of thumb, I've found that a large enough block size lets dd do a good job, and the differences between, say, 64 KiB and 1 MiB are minor compared to 4 KiB versus 64 KiB. (Though, admittedly, it's been a while since I did that. I use a mebibyte by default now, or just let dd pick the size.)

Answered by ssapkota

This is totally system dependent. You should experiment to find the optimum solution. Try starting with bs=8388608. (Hitachi HDDs seem to have an 8 MB cache.)

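As an aside, GNU dd accepts size suffixes, so bs=8388608 can equally be written bs=8M (where M means 1024 × 1024). A quick sanity check of the equivalence, with throwaway file names:

```shell
# One 8 MiB block written with each spelling; the files should match exactly
dd if=/dev/zero of=a.bin bs=8388608 count=1 2>/dev/null
dd if=/dev/zero of=b.bin bs=8M count=1 2>/dev/null
cmp -s a.bin b.bin && SAME=yes || SAME=no
SIZE=$(wc -c < a.bin)
echo "same=$SAME size=$SIZE"
rm -f a.bin b.bin
```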

Answered by eadmaster

  • For better performance, use the biggest blocksize your RAM can accommodate (this sends fewer I/O calls to the OS)
  • For better accuracy and data recovery, set the blocksize to the native sector size of the input
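
On Linux, the native sector size mentioned in the second point can be read from sysfs without root (a sketch; it simply prints every block device it can see, and the list may be empty inside a container):

```shell
# Logical vs. physical sector size for each visible block device
for q in /sys/block/*/queue; do
  [ -r "$q/logical_block_size" ] || continue
  dev=$(basename "$(dirname "$q")")
  echo "$dev: logical=$(cat "$q/logical_block_size")" \
       "physical=$(cat "$q/physical_block_size")"
done
```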

When dd copies data with the conv=noerror,sync options, any read errors it encounters result in the remainder of that block being replaced with zero bytes. Larger block sizes copy more quickly, but each error then costs you the rest of a larger block.

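The padding half of that behavior is easy to demonstrate with conv=sync on its own: every short read is padded out to a full ibs-sized block with NUL bytes (a sketch using throwaway file names):

```shell
# A 3-byte file read with bs=8 produces one partial input block...
printf 'abc' > in.bin
dd if=in.bin of=out.bin bs=8 conv=sync 2>/dev/null
# ...which conv=sync pads to a full 8-byte output block with NULs
wc -c < out.bin   # -> 8
```

With noerror added, the same NUL padding is what replaces any block that fails to read.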

Answered by unfa

I've found my optimal blocksize to be 8 MB (equal to the disk cache?). I needed to wipe (some say: wash) the empty space on a disk before creating a compressed image of it. I used:

cd /media/DiskToWash/
dd if=/dev/zero of=zero bs=8M; rm zero

I experimented with values from 4K to 100M.

After letting dd run for a while, I killed it (Ctrl+C) and read the output:

36+0 records in
36+0 records out
301989888 bytes (302 MB) copied, 15.8341 s, 19.1 MB/s

As dd displays the transfer rate (19.1 MB/s in this case), it's easy to see whether the value you've picked is performing better or worse than the previous one.

My scores:

bs=   I/O rate
---------------
4K    13.5 MB/s
64K   18.3 MB/s
8M    19.1 MB/s <--- winner!
10M   19.0 MB/s
20M   18.6 MB/s
100M  18.6 MB/s   

Note: To check what your disk cache/buffer size is, you can use sudo hdparm -i /dev/sda

Answered by tdg5

As others have said, there is no universally correct block size; what is optimal for one situation or one piece of hardware may be terribly inefficient for another. Also, depending on the health of the disks it may be preferable to use a different block size than what is "optimal".

One thing that is pretty reliable on modern hardware is that the default block size of 512 bytes tends to be almost an order of magnitude slower than a more optimal alternative. When in doubt, I've found that 64K is a pretty solid modern default. Though 64K usually isn't THE optimal block size, in my experience it tends to be a lot more efficient than the default. 64K also has a pretty solid history of being reliably performant: You can find a message from the Eug-Lug mailing list, circa 2002, recommending a block size of 64K here: http://www.mail-archive.com/[email protected]/msg12073.html

For determining THE optimal output block size, I've written the following script that tests writing a 128M test file with dd at a range of different block sizes, from the default of 512 bytes to a maximum of 64M. Be warned, this script uses dd internally, so use with caution.

dd_obs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_obs_testfile}
TEST_FILE_EXISTS=0
if [ -e "$TEST_FILE" ]; then TEST_FILE_EXISTS=1; fi
TEST_FILE_SIZE=134217728

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Calculate number of segments required to copy
  COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))

  if [ $COUNT -le 0 ]; then
    echo "Block size of $BLOCK_SIZE estimated to require $COUNT blocks, aborting further tests."
    break
  fi

  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Create a test file with the specified block size
  DD_RESULT=$(dd if=/dev/zero of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync 2>&1 1>/dev/null)

  # Extract the transfer rate from dd's STDERR output
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  # Clean up the test file if we created it (don't delete a pre-existing file)
  if [ $TEST_FILE_EXISTS -eq 0 ]; then rm "$TEST_FILE"; fi

  # Output the result
  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

View on GitHub

I've only tested this script on a Debian (Ubuntu) system and on OS X Yosemite, so it will probably take some tweaking to work on other Unix flavors.

By default the command will create a test file named dd_obs_testfile in the current directory. Alternatively, you can provide a path to a custom test file by providing a path after the script name:

$ ./dd_obs_test.sh /path/to/disk/test_file

The output of the script is a list of the tested block sizes and their respective transfer rates like so:

$ ./dd_obs_test.sh
block size : transfer rate
       512 : 11.3 MB/s
      1024 : 22.1 MB/s
      2048 : 42.3 MB/s
      4096 : 75.2 MB/s
      8192 : 90.7 MB/s
     16384 : 101 MB/s
     32768 : 104 MB/s
     65536 : 108 MB/s
    131072 : 113 MB/s
    262144 : 112 MB/s
    524288 : 133 MB/s
   1048576 : 125 MB/s
   2097152 : 113 MB/s
   4194304 : 106 MB/s
   8388608 : 107 MB/s
  16777216 : 110 MB/s
  33554432 : 119 MB/s
  67108864 : 134 MB/s

(Note: The unit of the transfer rates will vary by OS)

To test optimal read block size, you could use more or less the same process, but instead of reading from /dev/zero and writing to the disk, you'd read from the disk and write to /dev/null. A script to do this might look like so:

dd_ibs_test.sh:

#!/bin/bash

# Since we're dealing with dd, abort if any errors occur
set -e

TEST_FILE=${1:-dd_ibs_testfile}
TEST_FILE_SIZE=134217728

# Exit if file exists
if [ -e $TEST_FILE ]; then
  echo "Test file $TEST_FILE exists, aborting."
  exit 1
fi
TEST_FILE_EXISTS=1

if [ $EUID -ne 0 ]; then
  echo "NOTE: Kernel cache will not be cleared between tests without sudo. This will likely cause inaccurate results." 1>&2
fi

# Create test file
echo 'Generating test file...'
BLOCK_SIZE=65536
COUNT=$(($TEST_FILE_SIZE / $BLOCK_SIZE))
dd if=/dev/urandom of=$TEST_FILE bs=$BLOCK_SIZE count=$COUNT conv=fsync > /dev/null 2>&1

# Header
PRINTF_FORMAT="%8s : %s\n"
printf "$PRINTF_FORMAT" 'block size' 'transfer rate'

# Block sizes of 512b 1K 2K 4K 8K 16K 32K 64K 128K 256K 512K 1M 2M 4M 8M 16M 32M 64M
for BLOCK_SIZE in 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2097152 4194304 8388608 16777216 33554432 67108864
do
  # Clear kernel cache to ensure more accurate test
  [ $EUID -eq 0 ] && [ -e /proc/sys/vm/drop_caches ] && echo 3 > /proc/sys/vm/drop_caches

  # Read test file out to /dev/null with specified block size
  DD_RESULT=$(dd if=$TEST_FILE of=/dev/null bs=$BLOCK_SIZE 2>&1 1>/dev/null)

  # Extract transfer rate
  TRANSFER_RATE=$(echo $DD_RESULT | \grep --only-matching -E '[0-9.]+ ([MGk]?B|bytes)/s(ec)?')

  printf "$PRINTF_FORMAT" "$BLOCK_SIZE" "$TRANSFER_RATE"
done

# Clean up the test file if we created one
if [ $TEST_FILE_EXISTS -ne 0 ]; then rm $TEST_FILE; fi

View on GitHub

An important difference in this case is that the test file is created by the script. Do not point this command at an existing file, or the existing file will be overwritten with random data!

For my particular hardware I found that 128K was the optimal input block size on an HDD and 32K was optimal on an SSD.

Though this answer covers most of my findings, I've run into this situation enough times that I wrote a blog post about it: http://blog.tdg5.com/tuning-dd-block-size/ You can find more specifics on the tests I performed there.
