std::fstream buffering vs manual buffering (why 10x gain with manual buffering)?
Original source: http://stackoverflow.com/questions/12997131/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Vincent
I have tested two writing configurations:
Fstream buffering:

    // Initialization
    const unsigned int length = 8192;
    char buffer[length];
    std::ofstream stream;
    stream.rdbuf()->pubsetbuf(buffer, length);
    stream.open("test.dat", std::ios::binary | std::ios::trunc);

    // To write I use:
    stream.write(reinterpret_cast<char*>(&x), sizeof(x));

Manual buffering:

    // Initialization
    const unsigned int length = 8192;
    char buffer[length];
    std::ofstream stream("test.dat", std::ios::binary | std::ios::trunc);

    // Then I manually put the data in the buffer

    // To write I use:
    stream.write(buffer, length);
I expected the same result...
But my manual buffering improves performance by a factor of 10 when writing a 100MB file, while the fstream buffering changes nothing compared to the normal situation (without redefining a buffer).
Does anyone have an explanation for this?
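The two configurations above can be turned into a small, self-contained comparison. This is only a sketch under stated assumptions: the helper names (`write_per_element`, `write_manual_buffer`, `read_all`, `seconds_taken`) and the choice of 64-bit integers as the payload are hypothetical and not part of the original question, and timings will vary with the compiler, standard library and filesystem.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <string>

// Configuration 1 (hypothetical helper): give the stream an 8192-byte buffer
// via pubsetbuf() and write every value with its own stream.write() call.
inline void write_per_element(const std::string& path, std::size_t count) {
    constexpr std::size_t length = 8192;
    char buffer[length];  // declared before the stream so it outlives it
    std::ofstream stream;
    stream.rdbuf()->pubsetbuf(buffer, length);  // set before open(), as in the question
    stream.open(path, std::ios::binary | std::ios::trunc);
    for (std::uint64_t x = 0; x < count; ++x) {
        stream.write(reinterpret_cast<const char*>(&x), sizeof(x));
    }
}

// Configuration 2 (hypothetical helper): pack values into a local buffer and
// call stream.write() only when the buffer is full.
inline void write_manual_buffer(const std::string& path, std::size_t count) {
    constexpr std::size_t length = 8192;
    char buffer[length];
    std::size_t used = 0;
    std::ofstream stream(path, std::ios::binary | std::ios::trunc);
    for (std::uint64_t x = 0; x < count; ++x) {
        if (used + sizeof(x) > length) {
            stream.write(buffer, static_cast<std::streamsize>(used));
            used = 0;
        }
        std::memcpy(buffer + used, &x, sizeof(x));
        used += sizeof(x);
    }
    if (used > 0) {
        stream.write(buffer, static_cast<std::streamsize>(used));  // flush the tail
    }
}

// Reads a whole file back, to check that both variants produce the same bytes.
inline std::string read_all(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Runs a callable once and returns the elapsed wall-clock time in seconds.
template <typename F>
double seconds_taken(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    return std::chrono::duration<double>(
               std::chrono::steady_clock::now() - start).count();
}
```

A call such as `seconds_taken([] { write_per_element("test.dat", 12500000); })` writes roughly 100MB; comparing it with the same call on `write_manual_buffer` gives a rough picture of the gap on a given machine.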
EDIT:
Here is an update: a benchmark just run on a supercomputer (Linux 64-bit architecture, latest Intel Xeon 8-core, Lustre filesystem and ... hopefully well-configured compilers)
(and I can't explain the reason for the "resonance" at a 1 kB manual buffer...)
EDIT 2:
And the resonance at 1024 B (if anyone has an idea about that, I'm interested):
Accepted answer by Vaughn Cato
This is basically due to function call overhead and indirection. The ofstream::write() method is inherited from ostream. That function is not inlined in libstdc++, which is the first source of overhead. Then ostream::write() has to call rdbuf()->sputn() to do the actual writing, which is a virtual function call.
On top of that, libstdc++ redirects sputn() to another virtual function xsputn() which adds another virtual function call.
If you put the characters into the buffer yourself, you can avoid that overhead.
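One way to make this call chain visible is to plug a counting stream buffer into a plain std::ostream. The sketch below is illustrative only: `CountingBuf` and the two helper functions are hypothetical names, and it assumes (as is the case in libstdc++) that ostream::write() forwards each call to sputn(), which in turn calls the virtual hook xsputn().

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <vector>

// Hypothetical stream buffer that discards all data and counts how many
// times the virtual xsputn() hook is reached.
class CountingBuf : public std::streambuf {
public:
    std::size_t calls = 0;  // number of xsputn() invocations
    std::size_t bytes = 0;  // total bytes passed to xsputn()

protected:
    std::streamsize xsputn(const char*, std::streamsize n) override {
        ++calls;
        bytes += static_cast<std::size_t>(n);
        return n;  // pretend everything was written
    }
    int_type overflow(int_type ch) override {
        return traits_type::not_eof(ch);  // accept single characters too
    }
};

// Writes `count` ints one by one; returns the number of xsputn() calls.
inline std::size_t xsputn_calls_per_element(std::size_t count) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (std::size_t i = 0; i < count; ++i) {
        const int x = static_cast<int>(i);
        out.write(reinterpret_cast<const char*>(&x), sizeof(x));
    }
    return buf.calls;
}

// Collects ints into blocks of `block_elems` first, then writes whole blocks.
inline std::size_t xsputn_calls_blockwise(std::size_t count, std::size_t block_elems) {
    CountingBuf buf;
    std::ostream out(&buf);
    std::vector<int> block;
    block.reserve(block_elems);
    for (std::size_t i = 0; i < count; ++i) {
        block.push_back(static_cast<int>(i));
        if (block.size() == block_elems) {
            out.write(reinterpret_cast<const char*>(block.data()),
                      static_cast<std::streamsize>(block.size() * sizeof(int)));
            block.clear();
        }
    }
    if (!block.empty()) {
        out.write(reinterpret_cast<const char*>(block.data()),
                  static_cast<std::streamsize>(block.size() * sizeof(int)));
    }
    return buf.calls;
}
```

With element-wise writes the hook is reached once per value, while block-wise writes reach it once per block, which is the overhead that manual buffering avoids.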
Answer by nomad85
I would like to explain the cause of the peak in the second chart.
In fact, the virtual functions used by std::ofstream do degrade performance, as we see in the first picture, but that does not explain why the highest performance occurred when the manual buffer size was less than 1024 bytes.
The problem relates to the high cost of the writev() and write() system calls and to the internal implementation of std::filebuf, the internal class of std::ofstream.
To show how write() influences performance, I did a simple test using the dd tool on my Linux machine, copying a 10MB file with different buffer sizes (the bs option):
test@test$ time dd if=/dev/zero of=zero bs=256 count=40000
40000+0 records in
40000+0 records out
10240000 bytes (10 MB) copied, 2.36589 s, 4.3 MB/s
real 0m2.370s
user 0m0.000s
sys 0m0.952s
test@test$ time dd if=/dev/zero of=zero bs=512 count=20000
20000+0 records in
20000+0 records out
10240000 bytes (10 MB) copied, 1.31708 s, 7.8 MB/s
real 0m1.324s
user 0m0.000s
sys 0m0.476s
test@test$ time dd if=/dev/zero of=zero bs=1024 count=10000
10000+0 records in
10000+0 records out
10240000 bytes (10 MB) copied, 0.792634 s, 12.9 MB/s
real 0m0.798s
user 0m0.008s
sys 0m0.236s
test@test$ time dd if=/dev/zero of=zero bs=4096 count=2500
2500+0 records in
2500+0 records out
10240000 bytes (10 MB) copied, 0.274074 s, 37.4 MB/s
real 0m0.293s
user 0m0.000s
sys 0m0.064s
As you can see, the smaller the buffer, the lower the write speed and the more time dd spends in system space. So read/write speed decreases as the buffer size decreases.
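The same experiment can be approximated from C++ by calling the POSIX write() function directly with different block sizes. This is a rough sketch, not a reimplementation of dd: the function name `write_zeros` is hypothetical, and it assumes a POSIX system where open(), write() and close() are available.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cstddef>
#include <vector>

// Hypothetical helper: writes `total_bytes` zero bytes to `path` in blocks of
// `block_size` bytes, one write() system call per block.
// Returns the number of write() calls issued, or -1 on error.
inline long write_zeros(const char* path, std::size_t block_size, std::size_t total_bytes) {
    const int fd = ::open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    std::vector<char> block(block_size, 0);
    long calls = 0;
    std::size_t remaining = total_bytes;
    while (remaining > 0) {
        const std::size_t want = remaining < block_size ? remaining : block_size;
        const ssize_t n = ::write(fd, block.data(), want);
        if (n <= 0) {  // error, or nothing written (e.g. block_size == 0)
            ::close(fd);
            return -1;
        }
        ++calls;
        remaining -= static_cast<std::size_t>(n);
    }
    ::close(fd);
    return calls;
}
```

Timing `write_zeros("zero", 256, 10240000)` against `write_zeros("zero", 4096, 10240000)` should show the same trend as the dd runs: the smaller the block, the more system calls are needed for the same amount of data.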
But why was the speed highest when the manual buffer size was less than 1024 bytes in the topic creator's manual buffer tests? Why was it almost constant?
The explanation relates to the std::ofstream implementation, especially to std::basic_filebuf.
By default it uses a 1024-byte buffer (BUFSIZ variable). So, when you write your data in pieces smaller than 1024 bytes, the writev() (not write()) system call is called at least once for every two ofstream::write() operations (the pieces have a size of 1023 < 1024: the first is written to the buffer, and the second forces writing of both the first and the second). Based on this, we can conclude that ofstream::write() speed does not depend on the manual buffer size before the peak (write() is rarely called at least twice).
When you write a buffer of 1024 bytes or more at once with a single ofstream::write() call, the writev() system call is called for each ofstream::write(). So you see the speed increase when the manual buffer is larger than 1024 bytes (after the peak).
Moreover, if you set the std::ofstream buffer to something larger than 1024 bytes (for example, an 8192-byte buffer) using streambuf::pubsetbuf() and call ostream::write() to write data in pieces of 1024 bytes, you would be surprised that the write speed is the same as with a 1024-byte buffer. That is because the implementation of std::basic_filebuf, the internal class of std::ofstream, is hard-coded to force a writev() system call for each ofstream::write() call when the passed buffer is greater than or equal to 1024 bytes (see the basic_filebuf::xsputn() source code). There is also an open issue in the GCC bugzilla, which was reported on 2014-11-05.
So, this problem can be solved in one of the following ways:
- replace std::filebuf with your own class and redefine std::ofstream
- divide the buffer that has to be passed to ofstream::write() into pieces smaller than 1024 bytes and pass them to ofstream::write() one by one
- don't pass small pieces of data to ofstream::write(), to avoid the performance cost of the virtual functions of std::ofstream
Answer by wolf1oo
I'd like to add to the existing responses that this performance behavior (all the overhead from the virtual method calls/indirection) is typically not an issue when writing large blocks of data. What seems to have been omitted from the question and these prior answers (although probably implicitly understood) is that the original code was writing a small number of bytes each time. Just to clarify for others: if you are writing large blocks of data (~kB+), there is no reason to expect manual buffering to perform significantly differently from std::fstream's buffering.