Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/10449772/

Time: 2020-08-27 14:04:13  Source: igfitidea

Does C++ ofstream file writing use a buffer?

c++ file file-io

Asked by

Below are two programs that write 50,000,000 bytes to a file.

The first program, written in C, uses a buffer that, once filled to a chosen size, is written to disk; this repeats until all 50,000,000 bytes are written. I noticed that as I increased the size of the buffer, the program took less time to run. For instance, at BUFFER_SIZE = 1, the program took ~88.0463 seconds, whereas at BUFFER_SIZE = 1024, it took only ~1.7773 seconds. The best time I recorded was at BUFFER_SIZE = 131072. As BUFFER_SIZE increased beyond that, the program actually began to take a little longer.

The second program, written in C++, uses ofstream to write one byte at a time. To my surprise, the program took only ~1.87 seconds to run. I expected it to take a minute or so, like the C program with BUFFER_SIZE = 1. Obviously, the C++ ofstream handles file writing differently than I thought. According to my data, it performs about the same as the C program with BUFFER_SIZE = 512. Does it use some sort of behind-the-scenes buffer?

Here is the C program:

#include <fcntl.h>     // open, O_WRONLY, O_CREAT, O_TRUNC
#include <sys/stat.h>  // S_IRUSR, S_IWUSR
#include <unistd.h>    // write, fsync, close

const int NVALUES = 50000000; //#values written to the file
const char FILENAME[] = "/tmp/myfile";
const int BUFFER_SIZE = 8192; //# bytes to fill in buffer before writing

int main()
{
    int fd;  //File descriptor associated with output file
    int i;
    char writeval = '\0';
    char buffer[BUFFER_SIZE];

    //Open file for writing and associate it with the file descriptor
    //Create file if it does not exist; if it does exist truncate its size to 0
    fd = open(FILENAME, O_WRONLY|O_CREAT|O_TRUNC, S_IRUSR|S_IWUSR);

    for(i=0;i<NVALUES;i++)
    {
        //Package bytes into BUFFER_SIZE chunks
        //and write a chunk once it is filled
        buffer[i%BUFFER_SIZE] = writeval;
        if((i%BUFFER_SIZE == BUFFER_SIZE-1 || i == NVALUES-1))
            write(fd, buffer, i%BUFFER_SIZE+1);
    }

    fsync(fd);
    close(fd);

    return 0;
}

Here is the C++ program:

#include <fstream>

using namespace std;

int main()
{
    ofstream ofs("/tmp/iofile2");
    int i;

    for(i=0; i<50000000; i++)
        ofs << '\0';

    ofs.flush();
    ofs.close();

    return 0;
}

Thank you for your time.

Accepted answer by bames53

Yes, ostreams use a stream buffer, some subclass of an instantiation of the template basic_streambuf. The interface of basic_streambuf is designed so that an implementation can do buffering if there's an advantage in that.

However, this is a quality-of-implementation issue. Implementations are not required to buffer, but any competent implementation will.

You can read all about it in chapter 27 of the ISO C++ standard (the input/output library), though a more readable source may be The C++ Standard Library: A Tutorial and Reference.

Answer by Matthieu M.

Yes, all stream operations are buffered, though by default the standard input, output, and error streams are not, so that interaction with C I/O is less surprising.

As already alluded to, there is a base class streambuf that is used behind the scenes. It is provided with its own buffer, whose size is an implementation detail.

You can check (experimentally) how large this buffer is by using streambuf::in_avail, assuming that input and output file streams are set up with the same buffer size...

There are two other operations that you can do here that might be of interest:

  • you can change the streambuf object used by a stream, to switch to a custom version
  • you can change the buffer used by the streambuf object

Both should be done either right after creating the stream or after a flush, lest some data be lost...

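To illustrate the buffer change, check out streambuf::pubsetbuf:

#include <fstream>
#include <vector>

int main () {
  std::vector<char> vec(512);

  std::fstream fs;
  fs.rdbuf()->pubsetbuf(&vec.front(), vec.size());

  // open a file with fs.open(...) here, after pubsetbuf, before writing

  // operations with file stream here.
  fs << "Hello, World!\n";

  // the stream is automatically closed when the scope ends, so fs.close() is optional
  // the stream is automatically flushed when it is closed, so fs.flush() is optional

  return 0;
}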

Now you can repeat the experiments you did in C to find the sweet spot :)

Answer by Tony The Lion

Per this, ofstream has an internal filebuf pointer, which can be read through the rdbuf function and which points to a streambuf object, described as follows:

streambuf objects are usually associated with one specific character sequence, from which they read and write data through an internal memory buffer. The buffer is an array in memory which is expected to be synchronized when needed with the physical content of the associated character sequence.

I bolded the important bits; it seems that it does make use of a buffer, but I don't know or haven't found out what kind of buffer it is.