Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/10520182/

Time: 2020-08-06 06:15:21  Source: igfitidea

Linux: When to use scatter/gather IO (readv, writev) vs a large buffer with fread

Tags: linux, io

Asked by Jimm

With scatter/gather I/O (i.e. readv and writev), Linux reads into multiple buffers and writes from multiple buffers in a single call.

Say I have a vector of 3 buffers. I can use readv, or I can use a single buffer whose size is the combined size of the 3 buffers and call fread.

Hence, I am confused: For which cases should scatter/gather be used and when should a single large buffer be used?

Accepted answer by ArjunShankar

The main conveniences offered by readv and writev are:

  1. They allow working with non-contiguous blocks of data, i.e. the buffers need not be part of one array; they can be separately allocated.
  2. The I/O is 'atomic', i.e. if you do a writev, all the elements in the vector are written in one contiguous operation, and writes done by other processes will not occur in between them.
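
The first point applies to reads as well. A minimal sketch, assuming three separately allocated destination buffers; the helper name scatter_read and its parameter names are made up for illustration and are not part of any library:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill three separately allocated buffers from fd
 * with a single readv() call. Returns the total number of bytes read,
 * or -1 on error. Like read(), readv() may return fewer bytes than
 * requested; handling that is left to the caller.
 */
ssize_t scatter_read(int fd,
                     void *hdr, size_t hdr_len,
                     void *body, size_t body_len,
                     void *tail, size_t tail_len)
{
    struct iovec iov[3];

    iov[0].iov_base = hdr;
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len  = body_len;
    iov[2].iov_base = tail;
    iov[2].iov_len  = tail_len;

    /* Buffers are filled in array order: iov[0] completely, then iov[1], then iov[2]. */
    return readv(fd, iov, 3);
}
```

The three destinations can live anywhere (stack, heap, fields of different structs); the kernel fills them in order from one contiguous run of input.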

For example, say your data is naturally segmented and comes from different sources:

struct foo *my_foo;
struct bar *my_bar;
struct baz *my_baz;

my_foo = get_my_foo();
my_bar = get_my_bar();
my_baz = get_my_baz();

Now, the three 'buffers' are not one big contiguous block. But you want to write them contiguously into a file, for whatever reason (say, for example, they are fields in a file header for a file format).

If you use write, you have to choose between:

  1. Copying them into one block of memory using, say, memcpy (overhead), followed by a single write call. Then the write will be atomic.
  2. Making three separate calls to write (overhead). Also, write calls from other processes can be interleaved between these writes (not atomic).
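
To make the cost of option 1 concrete, here is a rough sketch of the copy-then-write approach. The helper name copy_then_write and its parameters are hypothetical, and the use of malloc for the temporary block is an assumption, not something the answer prescribes:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper for option 1: copy three buffers into one
 * temporary block, then issue a single write(). Returns the number of
 * bytes written, or -1 on error (including allocation failure).
 */
ssize_t copy_then_write(int fd,
                        const void *a, size_t a_len,
                        const void *b, size_t b_len,
                        const void *c, size_t c_len)
{
    size_t total = a_len + b_len + c_len;
    char *block;
    ssize_t n;

    if (total == 0)
        return 0;

    /* Extra allocation: this is part of the overhead mentioned above. */
    block = malloc(total);
    if (block == NULL)
        return -1;

    /* Extra copies: every byte is touched once more before the kernel sees it. */
    memcpy(block, a, a_len);
    memcpy(block + a_len, b, b_len);
    memcpy(block + a_len + b_len, c, c_len);

    n = write(fd, block, total);   /* one system call for all three pieces */
    free(block);
    return n;
}
```

Option 2 avoids the allocation and copies but replaces them with three system calls and loses the single-operation behaviour.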

If you use writev instead, it's all good:

  1. You make exactly one system call, with no memcpy to build a single buffer from the three.
  2. Also, the three buffers are written atomically, as one block write, i.e. if other processes also write, their writes will not land between the writes of the three buffers.

So you would do something like:

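#include <sys/uio.h>   /* struct iovec, writev */

struct iovec iov[3];
ssize_t bytes_written;

/* One iovec entry per source buffer, in the order they should land in the file. */
iov[0].iov_base = my_foo;
iov[0].iov_len = sizeof (struct foo);
iov[1].iov_base = my_bar;
iov[1].iov_len = sizeof (struct bar);
iov[2].iov_base = my_baz;
iov[2].iov_len = sizeof (struct baz);

/* One system call; returns the total number of bytes written, or -1 on error. */
bytes_written = writev (fd, iov, 3);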
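
Like write, writev can return fewer bytes than requested, so the return value is worth checking. A small sketch of how the call above might be wrapped; the helper name gather_write and the choice to treat a short write as a failure are assumptions made for illustration:

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write three separate buffers with one writev()
 * call. Returns 0 if every byte was written, -1 on error or short write.
 */
int gather_write(int fd,
                 const void *a, size_t a_len,
                 const void *b, size_t b_len,
                 const void *c, size_t c_len)
{
    struct iovec iov[3];
    ssize_t n;

    /* iov_base is a plain void *, so const is cast away; writev() only reads the data. */
    iov[0].iov_base = (void *)a;
    iov[0].iov_len  = a_len;
    iov[1].iov_base = (void *)b;
    iov[1].iov_len  = b_len;
    iov[2].iov_base = (void *)c;
    iov[2].iov_len  = c_len;

    n = writev(fd, iov, 3);
    if (n < 0)
        return -1;

    /* Succeed only if the whole vector went out in this one call. */
    return (size_t)n == a_len + b_len + c_len ? 0 : -1;
}
```

With the variables from the answer, a call would look like gather_write(fd, my_foo, sizeof (struct foo), my_bar, sizeof (struct bar), my_baz, sizeof (struct baz)).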

Sources:

  1. http://pubs.opengroup.org/onlinepubs/009604499/functions/writev.html
  2. http://linux.die.net/man/2/readv