Note: This is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/2025694/

Date: 2020-09-15 13:43:12  Source: igfitidea

Improve write speed for high speed file copy?

Tags: c++, windows, copy, raid

Asked by ring0

I've been trying to find out the fastest way to code a file copy routine to copy a large file onto RAID 5 hardware.

The average file size is around 2 GB.

There are 2 Windows boxes (both running Win2k3). The first box is the source, where the large file is located. The second box has RAID 5 storage.

http://blogs.technet.com/askperf/archive/2007/05/08/slow-large-file-copy-issues.aspx

The above link clearly explains why Windows copy, robocopy and other common copy utilities suffer in write performance. Hence, I've written a C/C++ program that uses the CreateFile, ReadFile & WriteFile APIs with the NO_BUFFERING & WRITE_THROUGH flags. The program simulates ESEUTIL.exe, in the sense that it uses 2 threads, one for reading and one for writing. The reader thread reads 256 KB from the source and fills a buffer. Once 16 such 256 KB blocks are filled, the writer thread writes the contents of the buffer to the destination file. As you can see, the writer thread writes 8 MB of data in one shot. The program allocates 32 such 8 MB blocks... hence, the writing and reading can happen in parallel. Details of ESEUTIL.exe can be found in the above link. Note: I am taking care of the data alignment issues when using NO_BUFFERING.

I used benchmarking utilities like ATTO and found out that our RAID 5 hardware has a write speed of 44 MB per second when writing 8 MB data chunks, which is around 2.57 GB per minute.

But my program is able to achieve only 1.4 GB per minute.

Can anyone please help me identify what the problem is? Are there faster APIs than CreateFile, ReadFile and WriteFile available?

Answered by John Knoeller

You should use async IO to get the best performance. That is, opening the file with FILE_FLAG_OVERLAPPED and using the LPOVERLAPPED argument of WriteFile. You may or may not get better performance with FILE_FLAG_NO_BUFFERING. You will have to test to see.

FILE_FLAG_NO_BUFFERING will generally give you more consistent speeds and better streaming behavior, and it avoids polluting your disk cache with data that you may not need again, but it isn't necessarily faster overall.

You should also test to see what the best size is for each block of IO. In my experience there is a huge performance difference between copying a file 4 KB at a time and copying it 1 MB at a time.

In my past testing of this (a few years ago) I found that block sizes below about 64 KB were dominated by overhead, and total throughput continued to improve with larger block sizes up to about 512 KB. I wouldn't be surprised if with today's drives you needed to use block sizes larger than 1 MB to get maximum throughput.

The numbers you are currently using appear to be reasonable, but may not be optimal. Also I'm fairly certain that FILE_FLAG_WRITE_THROUGH prevents the use of the on-disk cache and thus will cost you a fair bit of performance.

You also need to be aware that copying files using CreateFile/WriteFile will not copy metadata such as timestamps or alternate data streams on NTFS. You will have to deal with these things on your own.

Actually replacing CopyFile with your own code is quite a lot of work.

Addendum:

I should probably mention that I tried this with software RAID 0 on Windows NT 3.0 (about 10 years ago). The speed was VERY sensitive to the alignment in memory of the buffers. It turned out that at the time, the SCSI drivers had to use a special algorithm for doing DMA from a scatter/gather list when the DMA spanned more than 16 physical regions of memory (64 KB). Guaranteed optimal performance required physically contiguous allocations - which is something that only drivers can request. This was basically a workaround for a bug in the DMA controller of a popular chipset back then, and is unlikely to still be an issue.

BUT - I would still strongly suggest that you test ALL power-of-2 block sizes from 32 KB to 32 MB to see which is fastest. And you might consider testing to see if some buffers are consistently faster than others - it's not unheard of.

Answered by Len Holgate

A while back I wrote a blog posting about async file I/O and how it often tends to actually end up being synchronous unless you do everything just right (http://www.lenholgate.com/blog/2008/02/when-are-asynchronous-file-writes-not-asynchronous.html).

The key points are that even when you're using FILE_FLAG_OVERLAPPED and FILE_FLAG_NO_BUFFERING you still need to pre-extend the file so that your async writes don't need to extend the file as they go; for security reasons file extension is always synchronous. To pre-extend you need to do the following:

  • Enable the SE_MANAGE_VOLUME_NAME privilege.
  • Open the file.
  • Seek to the desired file length with SetFilePointerEx().
  • Set the end of file with SetEndOfFile().
  • Set the end of the valid data within the file with SetFileValidData().
  • Close the file.

Then...

  • Open the file to write.
  • Issue the writes.

Answered by ring0

I did some tests and have some results. The tests were performed on 100Mbps & 1Gbps NIC. The source machine is Win2K3 server (SATA) and the target machine is Win2k3 server (RAID 5).

I ran 3 tests:

1) Network Reader -> This program just reads files across the network. The purpose of the program is to find the maximum n/w read speed. I am performing NON BUFFERED reads using CreateFile & ReadFile.

2) Disk Writer-> This program benchmarks the RAID 5 speed by writing data. NON BUFFERED writes are performed using CreateFile & WriteFile.

3) Blitz Copy-> This program is the file copy engine. It copies files across the network. The logic of this program was discussed in the initial question. I am using synchronous I/O with NO_BUFFERING Reads & Writes. The APIs used are CreateFile, ReadFile & WriteFile.



Below are the results:

NETWORK READER:

100 Mbps NIC, reading 768 MB:

  Chunk size    8 KB : 148344 ms
  Chunk size   64 KB : 89359 ms
  Chunk size  128 KB : 82625 ms
  Chunk size  256 KB : 79594 ms
  Chunk size  512 KB : 78687 ms
  Chunk size 1024 KB : 79078 ms
  Chunk size 2048 KB : 78594 ms
  Chunk size 4096 KB : 78406 ms
  Chunk size 8192 KB : 78281 ms

1 Gbps NIC, reading 5120 MB (5 GB):

  Chunk size    8 KB : 206203 ms
  Chunk size   64 KB : 77860 ms
  Chunk size  128 KB : 74531 ms
  Chunk size  256 KB : 68656 ms
  Chunk size  512 KB : 64922 ms
  Chunk size 1024 KB : 66312 ms
  Chunk size 2048 KB : 68688 ms
  Chunk size 4096 KB : 64922 ms
  Chunk size 8192 KB : 66047 ms

DISK WRITER:

Writes performed on RAID 5 with NO_BUFFERING & WRITE_THROUGH, 2048 MB (2 GB) of data:

  Chunk size  4 MB : 68328 ms
  Chunk size  8 MB : 55985 ms
  Chunk size 16 MB : 49569 ms
  Chunk size 32 MB : 47281 ms

Writes performed on RAID 5 with NO_BUFFERING only, 2048 MB of data:

  Chunk size  4 MB : 57484 ms
  Chunk size  8 MB : 52594 ms
  Chunk size 16 MB : 49125 ms
  Chunk size 32 MB : 46360 ms

Write performance degrades as the chunk size shrinks, and the WRITE_THROUGH flag introduces a performance hit.

BLITZ COPY:

1 Gbps NIC, copying 60 GB of files with NO_BUFFERING:

  Time taken to complete the copy: 2236735 ms, i.e. 37.2 mins. The speed is ~97 GB per hour.

100 Mbps NIC, copying 60 GB of files with NO_BUFFERING:

  Time taken to complete the copy: 7337219 ms, i.e. 122 mins. The speed is ~30 GB per hour.

I did try the 10-FileCopy program by Jeffrey Richter, which uses async I/O with NO_BUFFERING. But the results were poor. I guess the reason could be that the chunk size is 256 KB... a 256 KB write on RAID 5 is terribly slow.

Comparing with robocopy:

100 Mbps NIC: Blitz Copy and robocopy both perform at ~30 GB per hour.

1 Gbps NIC: Blitz Copy goes at ~97 GB per hour while robocopy manages ~50 GB per hour.

Answered by RickNZ

How fast can you read the source file if you don't write the destination?

Is the source file fragmented? Fragmented reads can be an order of magnitude slower than contiguous reads. You can use the "contig" utility to make it contiguous:

http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx

How fast is the network connecting the two machines?

Have you tried just writing dummy data, without reading it first, like ATTO does?

Do you have more than one read or write request in flight at a time?

What's the stripe size of your RAID-5 array? Writing a full stripe at a time is the fastest way to write to RAID-5.

Answered by Thomas Matthews

Just remember that a hard disk buffers data coming from the platters and going to the platters. Most disk drives will try to optimize the read requests to keep the platters rotating and minimize head movement. The drives try to absorb as much data from the Host before writing to the platters so that the Host can be disconnected as soon as possible.

Your performance also depends on the I/O bus traffic on the PC as well as the traffic between the disk and the host. There are other factors to consider, such as system tasks and programs running "at the same time". You may not be able to achieve the exact performance your measuring tool reports. And remember that these timings have an error margin due to the above mentioned overheads.

If your platform has DMA controllers, try using these.

Answered by ring0

If write speed is that important, why not consider RAID 0 for your hardware configuration?

  • The customer wants RAID 5.
  • RAID 5 is preferred over RAID 0 because of its better fault tolerance.
  • The customer is satisfied with what RAID 5 can offer. The question here is: benchmarking the hardware using ATTO shows a write speed of 2.57 GB per minute (8 MB chunk writes), so why can't a copy tool get close to it? Something like 2 GB per minute is what we are looking at. We've been able to achieve only ~1.5 GB per minute so far.
Answered by Foredecker

The right way to do this is with un-buffered, fully asynchronous I/O. You will want to issue multiple I/Os to keep a queue going. This lets the file system, driver, and RAID-5 sub-system manage the I/Os more optimally.

You can also open multiple files and issue reads and writes to multiple files.

NOTE! The optimal number of outstanding I/Os and how you interleave reads and writes will depend greatly on the storage sub-system itself. Your program will need to be highly parameterized so you can tune it.

Note - I believe that Robocopy has been improved - have you tried it?