Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same license and attribute it to the original authors (not the translator). Original: http://stackoverflow.com/questions/1201261/

Date: 2020-08-27 19:09:40 · Source: igfitidea

What is the Fastest Method for High Performance Sequential File I/O in C++?

Tags: c++, performance, file-io

Asked by Adam Holmberg

Assuming the following for...
Output:
The file is opened...
Data is 'streamed' to disk. The data in memory is in a large contiguous buffer. It is written to disk in its raw form directly from that buffer. The size of the buffer is configurable, but fixed for the duration of the stream. Buffers are written to the file, one after another. No seek operations are conducted.
...the file is closed.
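The output pattern described above can be sketched roughly as follows, using portable C standard I/O (the function name and parameters are made up for the example):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Stream a large contiguous in-memory buffer to disk in fixed-size
// chunks, one after another, with no seeking.  The chunk size is
// configurable but fixed for the duration of the stream.
bool stream_to_disk(const std::string& path,
                    const std::vector<char>& data,
                    std::size_t chunk_size) {
    std::FILE* f = std::fopen(path.c_str(), "wb");
    if (!f) return false;
    std::size_t written = 0;
    while (written < data.size()) {
        // Last chunk may be shorter than chunk_size.
        std::size_t n = std::min(chunk_size, data.size() - written);
        if (std::fwrite(data.data() + written, 1, n, f) != n) {
            std::fclose(f);
            return false;
        }
        written += n;
    }
    return std::fclose(f) == 0;  // flush errors surface here
}
```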

Input:
A large file (sequentially written as above) is read from disk from beginning to end.

Are there generally accepted guidelines for achieving the fastest possible sequential file I/O in C++?

Some possible considerations:

  • Guidelines for choosing the optimal buffer size
  • Will a portable library like boost::asio be too abstracted to expose the intricacies of a specific platform, or can they be assumed to be optimal?
  • Is asynchronous I/O always preferable to synchronous? What if the application is not otherwise CPU-bound?

I realize that this will have platform-specific considerations. I welcome general guidelines as well as those for particular platforms.
(My most immediate interest is in Win x64, but I am interested in comments on Solaris and Linux as well.)

Accepted answer by quark

Are there generally accepted guidelines for achieving the fastest possible sequential file I/O in C++?

Rule 0: Measure. Use all available profiling tools and get to know them. It's almost a commandment in programming that if you didn't measure it, you don't know how fast it is, and for I/O this is even more true. Make sure to test under actual working conditions if you possibly can. A process that has no competition for the I/O system can be over-optimized, fine-tuned for conditions that don't exist under real loads.

  1. Use mapped memory instead of writing to files. This isn't always faster, but it allows the opportunity to optimize the I/O in an operating-system-specific but relatively portable way, by avoiding unnecessary copying and taking advantage of the OS's knowledge of how the disk is actually being used. ("Portable" if you use a wrapper, not an OS-specific API call.)

  2. Try to linearize your output as much as possible. Having to jump around memory to find the buffers to write can have noticeable effects under optimized conditions, because cache lines, paging and other memory subsystem issues will start to matter. If you have lots of buffers, look into support for scatter-gather I/O, which tries to do that linearizing for you.
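On POSIX systems, scatter-gather output is exposed as writev(): you hand the kernel several non-contiguous buffers in one system call and it linearizes them for you. A minimal sketch (the helper name and two-buffer layout are just for illustration):

```cpp
#include <fcntl.h>
#include <sys/uio.h>   // writev (POSIX scatter-gather I/O)
#include <unistd.h>
#include <cstring>
#include <string>

// Write two non-contiguous buffers to a file in a single syscall.
ssize_t write_gathered(const char* path,
                       const std::string& a,
                       const std::string& b) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return -1;
    iovec iov[2];
    iov[0].iov_base = const_cast<char*>(a.data());
    iov[0].iov_len  = a.size();
    iov[1].iov_base = const_cast<char*>(b.data());
    iov[1].iov_len  = b.size();
    ssize_t n = writev(fd, iov, 2);  // kernel linearizes the buffers
    close(fd);
    return n;
}
```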

Some possible considerations:

  • Guidelines for choosing the optimal buffer size

Page size for starters, but be ready to tune from there.
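On POSIX, the page size can be queried at runtime as that starting point (a sketch; the 4096-byte fallback for a failed query is an assumption):

```cpp
#include <unistd.h>
#include <cstddef>

// Start the I/O buffer size at the system page size, then tune.
std::size_t starting_buffer_size() {
    long page = sysconf(_SC_PAGESIZE);
    // Fall back to a common page size if the query fails.
    return page > 0 ? static_cast<std::size_t>(page) : 4096;
}
```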

  • Will a portable library like boost::asio be too abstracted to expose the intricacies of a specific platform, or can they be assumed to be optimal?

Don't assume it's optimal. It depends on how thoroughly the library gets exercised on your platform, and how much effort the developers put into making it fast. Having said that, a portable I/O library can be very fast, because fast abstractions exist on most systems, and it's usually possible to come up with a general API that covers a lot of the bases. Boost.Asio is, to the best of my limited knowledge, fairly finely tuned for the particular platform it is on: there's a whole family of OS and OS-variant specific APIs for fast async I/O (e.g. epoll, /dev/epoll, kqueue, Windows overlapped I/O), and Asio wraps them all.

  • Is asynchronous I/O always preferable to synchronous? What if the application is not otherwise CPU-bound?

Asynchronous I/O isn't faster in a raw sense than synchronous I/O. What asynchronous I/O does is ensure that your code is not wasting time waiting for the I/O to complete. It is faster in a general way than the other method of not wasting that time, namely using threads, because it will call back into your code when I/O is ready and not before. There are no false starts or concerns with idle threads needing to be terminated.

Answer by Marc Mutz - mmutz

A general piece of advice is to turn off buffering and read/write in large chunks (but not too large, or you will waste too much time waiting for the whole I/O to complete when you could otherwise already start munching away at the first megabyte). It's trivial to find the sweet spot with this algorithm: there's only one knob to turn, the chunk size.

Beyond that, for input, mmap()ing the file shared and read-only is (if not the fastest, then) the most efficient way. Call madvise() if your platform has it, to tell the kernel how you will traverse the file, so it can do readahead and throw out the pages again quickly afterwards.
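A minimal POSIX sketch of that input path: a shared read-only mapping plus a sequential-access hint (the byte checksum is just a stand-in for real processing):

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdint>

// Sum all bytes of a file through a shared read-only mapping,
// hinting sequential access so the kernel can read ahead and
// drop pages behind us.
std::uint64_t sum_file_mmap(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return 0;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return 0; }
    void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  // the mapping keeps the file contents accessible
    if (p == MAP_FAILED) return 0;
    madvise(p, st.st_size, MADV_SEQUENTIAL);  // traversal hint
    std::uint64_t sum = 0;
    const unsigned char* c = static_cast<const unsigned char*>(p);
    for (off_t i = 0; i < st.st_size; ++i) sum += c[i];
    munmap(p, st.st_size);
    return sum;
}
```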

For output, if you already have a buffer, consider underpinning it with a file (also with mmap()), so you don't have to copy the data in userspace.

If mmap() is not to your liking, then there's fadvise() and, for the really tough ones, async file I/O.

(All of the above is POSIX; Windows names may be different.)

Answer by Michael A. McCloskey

For Windows, you'll want to make sure you use the FILE_FLAG_SEQUENTIAL_SCAN flag in your CreateFile() call, if you opt to use the platform-specific Windows API. This will optimize caching for the I/O. As far as buffer sizes go, a buffer size that is a multiple of the disk sector size is typically advised. 8K is a nice starting point, with little to be gained from going larger.
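As a rough sketch, the flag is passed at open time. The helper name is made up; a POSIX stand-in using posix_fadvise() (the closest analogue of the sequential-scan hint) is included so the example also compiles off Windows:

```cpp
#ifdef _WIN32
#include <windows.h>

// Open a file for sequential reading with the cache-manager hint.
HANDLE open_sequential(const wchar_t* path) {
    return CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                       OPEN_EXISTING, FILE_FLAG_SEQUENTIAL_SCAN, nullptr);
}
#else
// POSIX stand-in: hint sequential access after opening.
#include <fcntl.h>
#include <unistd.h>

int open_sequential(const char* path) {
    int fd = open(path, O_RDONLY);
    if (fd >= 0) posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
#endif
```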

This article discusses the comparison between async and sync on Windows.

http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx

Answer by KPexEA

As you noted above it all depends on the machine / system / libraries that you are using. A fast solution on one system may be slow on another.

A general guideline, though, would be to write in chunks that are as large as possible.
Typically writing a byte at a time is the slowest.

The best way to know for sure is to code a few different ways and profile them.

Answer by Marsh Ray

You asked about C++, but it sounds like you're past that and ready to get a little platform-specific.

On Windows, FILE_FLAG_SEQUENTIAL_SCAN with a file mapping is probably the fastest way. In fact, your process can exit before the file actually makes it onto the disk. Without an explicitly blocking flush operation, it can take up to 5 minutes for Windows to begin writing those pages.

You need to be careful if the files are not on local devices but a network drive. Network errors will show up as SEH errors, which you will need to be prepared to handle.

On *nixes, you might get a bit higher performance writing sequentially to a raw disk device. This is possible on Windows too, but not as well supported by the APIs. This will avoid a little filesystem overhead, but it may not amount to enough to be useful.

Loosely speaking, RAM is 1000 or more times faster than disks, and CPU is faster still. There are probably not a lot of logical optimizations that will help, except avoiding movements of the disk heads (seek) whenever possible. A dedicated disk just for this file can help significantly here.

Answer by usr

You will get the absolute fastest performance by using CreateFile and ReadFile. Open the file with FILE_FLAG_SEQUENTIAL_SCAN.

Read with a buffer size that is a power of two. Only benchmarking can determine this number. I have seen it to be 8K once. Another time I found it to be 8M! This varies wildly.

It depends on the size of the CPU cache, on the efficiency of OS read-ahead and on the overhead associated with doing many small writes.
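A crude way to run that benchmark for candidate buffer sizes (a sketch; timings vary run to run, so repeat each size and take the best):

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Time one sequential pass over a file with a given buffer size.
// Call for several powers of two and compare the results.
double time_read_ms(const char* path, std::size_t buf_size) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return -1.0;
    std::vector<char> buf(buf_size);
    auto t0 = std::chrono::steady_clock::now();
    // Keep reading full buffers; the final short read ends the loop.
    while (std::fread(buf.data(), 1, buf.size(), f) == buf.size()) {}
    auto t1 = std::chrono::steady_clock::now();
    std::fclose(f);
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```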

Memory mapping is not the fastest way. It has more overhead because you can't control the block size and the OS needs to fault in all pages.

Answer by PSkocik

On Linux, buffered reads and writes speed things up a lot, increasingly so with increasing buffer sizes, but the returns are diminishing and you generally want to use BUFSIZ (defined by stdio.h), as larger buffer sizes won't help much.
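A sketch of a BUFSIZ-buffered copy using setvbuf() (the helper name is illustrative):

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Copy a file through stdio with explicit BUFSIZ-sized stream buffers.
bool copy_buffered(const char* src, const char* dst) {
    std::FILE* in = std::fopen(src, "rb");
    if (!in) return false;
    std::FILE* out = std::fopen(dst, "wb");
    if (!out) { std::fclose(in); return false; }
    std::vector<char> inbuf(BUFSIZ), outbuf(BUFSIZ), chunk(BUFSIZ);
    // setvbuf must be called before the first I/O on each stream.
    setvbuf(in, inbuf.data(), _IOFBF, inbuf.size());
    setvbuf(out, outbuf.data(), _IOFBF, outbuf.size());
    std::size_t n;
    while ((n = std::fread(chunk.data(), 1, chunk.size(), in)) > 0)
        if (std::fwrite(chunk.data(), 1, n, out) != n) break;
    bool ok = std::ferror(in) == 0 && std::ferror(out) == 0;
    std::fclose(in);
    return std::fclose(out) == 0 && ok;  // buffers outlive the streams
}
```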

mmap()ing provides the fastest access to files, but the mmap call itself is rather expensive. For small files (~16 KiB), read and write system calls win (see https://stackoverflow.com/a/39196499/1084774 for the numbers on reading through read and mmap).
