Buffered asynchronous file I/O on Linux

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5664105/

Time: 2020-08-05 03:40:08  Source: igfitidea

linux asynchronous io linux-kernel aio

Asked by Marenz

I am looking for the most efficient way to do asynchronous file I/O on linux.

The POSIX glibc implementation uses threads in userland.

The native AIO kernel API only works with unbuffered operations. Kernel patches that add support for buffered operations exist, but they are more than 3 years old and no one seems to care about integrating them into the mainline.

I found plenty of other ideas, concepts, and patches that would allow asynchronous I/O, though most of them are in articles that are also more than 3 years old. Which of these are really available in today's kernel? I've read about syslets, acalls, stuff with kernel threads, and more things I don't even remember right now.

What is the most efficient way to do buffered asynchronous file input/output in today's kernel?

Accepted answer by Damon

Unless you want to write your own IO thread pool, the glibc implementation is an acceptable solution. It actually works surprisingly well for something that runs entirely in userland.
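
As a rough illustration of the glibc route, here is a minimal sketch using the POSIX AIO calls `aio_read`, `aio_suspend`, `aio_error` and `aio_return`. The helper name `posix_aio_read_file` and its signature are made up for this example, and error handling is kept to the bare minimum.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from the answer): read up to len bytes from the
 * start of path with POSIX AIO. Returns the byte count, or -1 on error. */
ssize_t posix_aio_read_file(const char *path, void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_RDONLY);   /* ordinary buffered file descriptor */
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = len;
    cb.aio_offset = 0;
    cb.aio_sigevent.sigev_notify = SIGEV_NONE;   /* no signal, we poll/wait */

    /* Queue the request. In glibc this hands the read to a helper thread. */
    if (aio_read(&cb) != 0) {
        close(fd);
        return -1;
    }

    /* ... the caller could do unrelated work here ... */

    /* Wait for completion. aio_suspend may return early (for example with
     * EINTR), so the loop re-checks the request status each time. */
    const struct aiocb *const pending[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(pending, 1, NULL);

    int err = aio_error(&cb);        /* 0 on success, else an errno value */
    ssize_t n = aio_return(&cb);     /* must be called exactly once */
    close(fd);

    if (err != 0) {
        errno = err;
        return -1;
    }
    return n;
}
```

Calling `posix_aio_read_file("/etc/hostname", buf, sizeof buf)` should behave like a plain `read` from the caller's point of view. On older glibc versions the program may need to be linked with `-lrt`.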

The kernel implementation does not work with buffered IO at all in my experience (though I've seen other people say the opposite!). That is fine if you want to read huge amounts of data via DMA, but of course it sucks big time if you plan to take advantage of the buffer cache.
Also note that the kernel AIO calls may actually block. There is a limited-size command buffer, and large reads are broken up into several smaller ones. Once the queue is full, asynchronous commands run synchronously. Surprise. I ran into this problem a year or two ago and could not find an explanation. Asking around gave me the "yeah of course, that's how it works" answer.
From what I've understood, the "official" interest in supporting buffered AIO is not terribly great either, even though several working solutions seem to have been available for years. Some of the arguments that I've read were along the lines of "you don't want to use the buffers anyway" and "nobody needs that" and "most people don't even use epoll yet". So, well... meh.
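
For reference, this is roughly what a single read through the kernel AIO interface looks like when the raw system calls are used directly. It is only a sketch: the wrapper names (`sys_io_setup` and friends) and the helpers `open_direct` and `kernel_aio_pread` are invented for this example, and it assumes a Linux system where `<linux/aio_abi.h>` and the `SYS_io_*` syscall numbers are available.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Thin wrappers around the raw system calls (glibc does not export them). */
static long sys_io_setup(unsigned nr, aio_context_t *ctx)
{ return syscall(SYS_io_setup, nr, ctx); }
static long sys_io_submit(aio_context_t ctx, long n, struct iocb **list)
{ return syscall(SYS_io_submit, ctx, n, list); }
static long sys_io_getevents(aio_context_t ctx, long min_nr, long nr,
                             struct io_event *ev, struct timespec *timeout)
{ return syscall(SYS_io_getevents, ctx, min_nr, nr, ev, timeout); }
static long sys_io_destroy(aio_context_t ctx)
{ return syscall(SYS_io_destroy, ctx); }

/* Hypothetical helper: open a file for unbuffered reads. */
int open_direct(const char *path)
{
    return open(path, O_RDONLY | O_DIRECT);
}

/* Hypothetical helper: submit one read and wait for it to complete.
 * Returns the byte count, a negative errno value reported by the kernel
 * for the request, or -1 if setup/submit/wait failed. */
ssize_t kernel_aio_pread(int fd, void *buf, size_t len, off_t offset)
{
    assert(buf != NULL);

    aio_context_t ctx = 0;               /* must be zero before io_setup */
    if (sys_io_setup(1, &ctx) < 0)
        return -1;

    struct iocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    struct iocb *list[1] = { &cb };
    ssize_t result = -1;

    /* This is the call that may block, as described above. */
    if (sys_io_submit(ctx, 1, list) == 1) {
        struct io_event ev;
        if (sys_io_getevents(ctx, 1, 1, &ev, NULL) == 1)
            result = (ssize_t)ev.res;
    }

    sys_io_destroy(ctx);
    return result;
}
```

With a descriptor from `open_direct`, the buffer, length and offset generally have to be aligned to the device's block size (for example a buffer obtained from `posix_memalign`). Passing an ordinary buffered descriptor is accepted too, but then the work tends to happen inside the submit call itself, which is the blocking behaviour the answer warns about.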

Being able to get an epoll signalled by a completed async operation was another issue until recently, but in the meantime this works really well via eventfd.

Note that the glibc implementation will actually spawn threads on demand inside __aio_enqueue_request. It is probably no big deal, since spawning threads is not that terribly expensive any more, but one should be aware of it. If your understanding of starting an asynchronous operation is "returns immediately", then that assumption may not be true, because it may be spawning some threads first.

EDIT:
As a sidenote, under Windows there exists a very similar situation to the one in the glibc AIO implementation where the "returns immediately" assumption of queuing an asynchronous operation is not true.
If all the data that you wanted to read is in the buffer cache, Windows will decide to run the request synchronously instead, because it will finish immediately anyway. This is well-documented, and admittedly sounds great, too. Except that if there are a few megabytes to copy, or if another thread has page faults or does IO concurrently (thus competing for the lock), "immediately" can be a surprisingly long time -- I've seen "immediate" times of 2-5 milliseconds. That is no problem in most situations, but under the constraint of a 16.66ms frame time, for example, you probably don't want to risk blocking for 5ms at random times. Thus, the naive assumption of "can do async IO from my render thread no problem, because async doesn't block" is flawed.

Answer by Pete Wilson

The material seems old -- well, it is old -- because it's been around for a long time and, while by no means trivial, is well understood. A solution you can lift is published in W. Richard Stevens's superb and unparalleled book (read "bible"). The book is the rare treasure that is clear, concise, and complete: every page gives real and immediate value:

   Advanced Programming in the UNIX Environment

Two other such books, also by Stevens, are the first two volumes of his Unix Network Programming collection:

   Volume 1: The Sockets Networking API (with Fenner and Rudoff) and
   Volume 2: Interprocess Communications

I can't imagine being without these three fundamental books; I'm dumbstruck when I find someone who hasn't heard of them.

Still more of Stevens's books, just as precious:

   TCP/IP Illustrated, Vol. 1: The Protocols

Answer by cmccabe

I don't think the Linux kernel implementation of asynchronous file I/O is really usable unless you also use O_DIRECT, sorry.

There's more information about the current state of the world here: https://github.com/littledan/linux-aio. It was updated in 2012 by someone who used to work at Google.
