用 C++ 快速写入二进制文件
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11563963/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me):
StackOverFlow
Writing a binary file in C++ very fast
提问by Dominic Hofer
I'm trying to write huge amounts of data onto my SSD (solid state drive). And by huge amounts I mean 80GB.
我正在尝试将大量数据写入我的 SSD(固态硬盘)。我说的大量是指 80GB。
I browsed the web for solutions, but the best I came up with was this:
我浏览了网络以寻找解决方案,但我想到的最好的方法是:
#include <fstream>

const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    std::fstream myfile;
    myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    //Here would be some error handling
    for(int i = 0; i < 32; ++i){
        //Some calculations to fill a[]
        myfile.write((char*)&a, size*sizeof(unsigned long long));
    }
    myfile.close();
}
Compiled with Visual Studio 2010 with full optimizations and run under Windows 7, this program maxes out around 20MB/s. What really bothers me is that Windows can copy files from another SSD to this SSD at somewhere between 150MB/s and 200MB/s. So at least 7 times faster. That's why I think I should be able to go faster.
使用 Visual Studio 2010 全面优化编译并在 Windows 7 下运行,该程序最大速度约为 20MB/s。真正困扰我的是 Windows 可以以 150MB/s 到 200MB/s 的速度将文件从另一个 SSD 复制到这个 SSD,至少快 7 倍。这就是为什么我认为我应该能够更快。
Any ideas how I can speed up my writing?
有什么办法可以加快我的写入速度吗?
采纳答案by Dominic Hofer
This did the job (in the year 2012):
这完成了工作(在 2012 年):
#include <stdio.h>

const unsigned long long size = 8ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    FILE* pFile;
    pFile = fopen("file.binary", "wb");
    for (unsigned long long j = 0; j < 1024; ++j){
        //Some calculations to fill a[]
        fwrite(a, 1, size*sizeof(unsigned long long), pFile);
    }
    fclose(pFile);
    return 0;
}
I just timed 8GB in 36 sec, which is about 220MB/s, and I think that maxes out my SSD. Also worth noting: the code in the question used one core at 100%, whereas this code only uses 2-5%.
我刚刚测得写入 8GB 用时 36 秒,大约是 220MB/s,我认为这已经达到了我的 SSD 的上限。另外值得注意的是,问题中的代码把一个核心用到了 100%,而这段代码只用 2-5%。
Thanks a lot to everyone.
非常感谢大家。
Update: 5 years have passed; it's 2017 now. Compilers, hardware, libraries and my requirements have changed. That's why I made some changes to the code and did some new measurements.
更新:5 年过去了,现在是 2017 年。编译器、硬件、库和我的要求都发生了变化。这就是为什么我对代码进行了一些更改并进行了一些新的测量。
First up the code:
先上代码:
#include <fstream>
#include <chrono>
#include <vector>
#include <cstdint>
#include <numeric>
#include <random>
#include <algorithm>
#include <iostream>
#include <cassert>

std::vector<uint64_t> GenerateData(std::size_t bytes)
{
    assert(bytes % sizeof(uint64_t) == 0);
    std::vector<uint64_t> data(bytes / sizeof(uint64_t));
    std::iota(data.begin(), data.end(), 0);
    std::shuffle(data.begin(), data.end(), std::mt19937{ std::random_device{}() });
    return data;
}

long long option_1(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_2(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    auto startTime = std::chrono::high_resolution_clock::now();
    FILE* file = fopen("file.binary", "wb");
    fwrite(&data[0], 1, bytes, file);
    fclose(file);
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

long long option_3(std::size_t bytes)
{
    std::vector<uint64_t> data = GenerateData(bytes);

    std::ios_base::sync_with_stdio(false);
    auto startTime = std::chrono::high_resolution_clock::now();
    auto myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    myfile.write((char*)&data[0], bytes);
    myfile.close();
    auto endTime = std::chrono::high_resolution_clock::now();

    return std::chrono::duration_cast<std::chrono::milliseconds>(endTime - startTime).count();
}

int main()
{
    const std::size_t kB = 1024;
    const std::size_t MB = 1024 * kB;
    const std::size_t GB = 1024 * MB;

    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option1, " << size / MB << "MB: " << option_1(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option2, " << size / MB << "MB: " << option_2(size) << "ms" << std::endl;
    for (std::size_t size = 1 * MB; size <= 4 * GB; size *= 2) std::cout << "option3, " << size / MB << "MB: " << option_3(size) << "ms" << std::endl;

    return 0;
}
This code compiles with Visual Studio 2017 and g++ 7.2.0 (a new requirement). I ran the code with two setups:
此代码可以用 Visual Studio 2017 和 g++ 7.2.0 编译(这是新的要求)。我在以下两种配置下运行了代码:
- Laptop, Core i7, SSD, Ubuntu 16.04, g++ Version 7.2.0 with -std=c++11 -march=native -O3
- Desktop, Core i7, SSD, Windows 10, Visual Studio 2017 Version 15.3.1 with /Ox /Ob2 /Oi /Ot /GT /GL /Gy
- 笔记本电脑、Core i7、SSD、Ubuntu 16.04、g++ 7.2.0 版,带有 -std=c++11 -march=native -O3
- 台式机、Core i7、SSD、Windows 10、Visual Studio 2017 版本 15.3.1 带有 /Ox /Ob2 /Oi /Ot /GT /GL /Gy
Which gave the following measurements (after ditching the values for 1MB, because they were obvious outliers):
Both times option1 and option3 max out my SSD. I didn't expect to see this, because option2 used to be the fastest code on my old machine back then.
这给出了以下测量值(丢弃了 1MB 的值,因为它们是明显的异常值):
两次测试中,选项 1 和选项 3 都能跑满我的 SSD。我没想到会这样,因为当时 option2 曾经是我旧机器上最快的代码。
TL;DR: My measurements indicate to use std::fstream over FILE.
TL;DR:我的测量结果表明应该使用 std::fstream 而不是 FILE。
回答by user541686
Try the following, in order:
请按顺序尝试以下操作:
- Smaller buffer size. Writing ~2 MiB at a time might be a good start. On my last laptop, ~512 KiB was the sweet spot, but I haven't tested on my SSD yet. Note: I've noticed that very large buffers tend to decrease performance. I've noticed speed losses with using 16-MiB buffers instead of 512-KiB buffers before.
- Use _open (or _topen if you want to be Windows-correct) to open the file, then use _write. This will probably avoid a lot of buffering, but it's not certain to.
- Use Windows-specific functions like CreateFile and WriteFile. That will avoid any buffering in the standard library. (A minimal sketch follows after this list.)
- 较小的缓冲区大小。一次写入约 2 MiB 可能是一个好的开始。在我上一台笔记本电脑上,约 512 KiB 是最佳值,但我还没有在我的 SSD 上测试过。注意:我注意到非常大的缓冲区往往会降低性能。我之前注意到用 16-MiB 缓冲区代替 512-KiB 缓冲区会导致速度下降。
- 使用 _open(或者 _topen,如果您想做到 Windows 正确的话)打开文件,然后使用 _write。这可能会避免大量缓冲,但并不一定。
- 使用 Windows 特定的函数,如 CreateFile 和 WriteFile。这将避免标准库中的任何缓冲。(见本列表之后的示例。)
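For the third point, here is a minimal sketch of writing through the Win32 API directly; it is not from the original answer, the buffer size and chunk count are placeholder values, and error handling is reduced to the bare minimum:
针对第三点,下面是一个直接通过 Win32 API 写入的最小示例;它并非出自原回答,缓冲区大小和块数都是占位值,错误处理也只保留了最低限度:
#include <windows.h>
#include <vector>

int main()
{
    // Hypothetical sizes: an 8 MiB buffer written 16 times (~128 MiB total).
    const DWORD chunkBytes = 8 * 1024 * 1024;
    std::vector<char> buffer(chunkBytes, 'x');

    HANDLE file = CreateFileA("file.binary", GENERIC_WRITE, 0, NULL,
                              CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
    if (file == INVALID_HANDLE_VALUE)
        return 1;

    for (int i = 0; i < 16; ++i) {
        // Fill buffer here, then hand it straight to the OS.
        DWORD written = 0;
        if (!WriteFile(file, buffer.data(), chunkBytes, &written, NULL) || written != chunkBytes)
            break; // a real program would report GetLastError()
    }

    CloseHandle(file);
    return 0;
}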
回答by Martin York
I see no difference between std::stream/FILE/device, or between buffering and non-buffering.
我看不出 std::stream/FILE/设备之间有什么区别,缓冲和非缓冲之间也没有区别。
Also note:
另请注意:
- SSD drives "tend" to slow down (lower transfer rates) as they fill up.
- SSD drives "tend" to slow down (lower transfer rates) as they get older (because of non working bits).
- SSD 驱动器在装满时“趋于”减慢(传输速率下降)。
- 随着老化(由于失效的存储位),SSD 驱动器也“趋于”减慢(传输速率下降)。
I am seeing the code run in 63 seconds.
Thus a transfer rate of 260M/s (my SSD looks slightly faster than yours).
我看到代码运行耗时 63 秒。
因此传输速率为 260M/s(我的 SSD 看起来比你的略快)。
64 * 1024 * 1024 * 8 /*sizeof(unsigned long long)*/ * 32 /*chunks*/ = 16G
16G / 63s ≈ 260M/s
I get no increase by moving from std::fstream to FILE*.
从 std::fstream 换到 FILE* 并没有带来任何提升。
#include <stdio.h>

const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];   // filled elsewhere, as in the question

int main()
{
    FILE* stream = fopen("binary", "wb");
    for(int loop = 0; loop < 32; ++loop)
    {
        fwrite(a, sizeof(unsigned long long), size, stream);
    }
    fclose(stream);
}
So the C++ streams are working as fast as the underlying library will allow.
因此,C++ 流的运行速度与底层库所允许的一样快。
But I think it is unfair comparing the OS to an application that is built on-top of the OS. The application can make no assumptions (it does not know the drives are SSD) and thus uses the file mechanisms of the OS for transfer.
但我认为将操作系统与构建在操作系统之上的应用程序进行比较是不公平的。应用程序不能做任何假设(它不知道驱动器是 SSD),因此使用操作系统的文件机制进行传输。
The OS, on the other hand, does not need to make any assumptions. It can tell the types of the drives involved and use the optimal technique for transferring the data, in this case a direct memory-to-memory transfer. Try writing a program that copies 80G from one location in memory to another and see how fast that is.
而操作系统不需要做任何假设。它可以分辨所涉及的驱动器类型,并使用最佳技术来传输数据,在这种情况下就是直接的内存到内存传输。可以试着写一个程序,把 80G 数据从内存中的一个位置复制到另一个位置,看看有多快。
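As a rough illustration of that suggestion, here is a small sketch (not from the original answer) that times an in-memory copy; the 1 GiB buffer and 8 iterations are arbitrary choices, and the result only approximates memory bandwidth:
作为对这个建议的粗略演示,下面是一个小示例(并非出自原回答),用来测量内存内复制的耗时;1 GiB 的缓冲区和 8 次迭代都是随意选的,结果只能近似反映内存带宽:
#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

int main()
{
    const std::size_t bytes = 1024ULL * 1024 * 1024; // 1 GiB per copy (placeholder size)
    std::vector<char> src(bytes, 'x'), dst(bytes);

    auto t0 = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < 8; ++i)            // copy 8 GiB in total
        std::memcpy(dst.data(), src.data(), bytes);
    auto t1 = std::chrono::high_resolution_clock::now();

    double seconds = std::chrono::duration<double>(t1 - t0).count();
    std::cout << (8.0 * bytes / (1024.0 * 1024.0)) / seconds << " MB/s\n";
    return 0;
}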
Edit
编辑
I changed my code to use the lower level calls:
i.e., no buffering.
我更改了代码以使用更低级别的调用:
即没有缓冲。
#include <fcntl.h>
#include <unistd.h>

const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    int data = open("test", O_WRONLY | O_CREAT, 0777);
    for(int loop = 0; loop < 32; ++loop)
    {
        write(data, a, size * sizeof(unsigned long long));
    }
    close(data);
}
This made no difference.
这没有任何区别。
NOTE: My drive is an SSD drive; if you have a normal drive you may see a difference between the two techniques above. But as I expected, non-buffering and buffering (when writing large chunks greater than the buffer size) make no difference.
注意:我的驱动器是 SSD;如果您使用的是普通机械硬盘,您可能会发现上述两种技术之间存在差异。但正如我所料,在写入大于缓冲区大小的大块数据时,非缓冲和缓冲没有区别。
Edit 2:
编辑2:
Have you tried the fastest method of copying files in C++?
你有没有试过 C++ 中复制文件的最快方法?
#include <fstream>

int main()
{
    std::ifstream input("input", std::ios::binary);
    std::ofstream output("output", std::ios::binary);
    output << input.rdbuf();
}
回答by HandMadeOX
The best solution is to implement async writing with double buffering.
最好的解决方案是使用双缓冲实现异步写入。
Look at the time line:
看时间线:
------------------------------------------------>
FF|WWWWWWWW|FF|WWWWWWWW|FF|WWWWWWWW|FF|WWWWWWWW|
The 'F' represents time for buffer filling, and 'W' represents time for writing the buffer to disk. So the problem is the time wasted between writing buffers to the file. However, by implementing writing on a separate thread, you can start filling the next buffer right away, like this:
“F”代表填充缓冲区的时间,“W”代表将缓冲区写入磁盘的时间。所以问题在于两次写入之间浪费的时间。但是,通过在单独的线程上实现写入,您可以立即开始填充下一个缓冲区,如下所示:
------------------------------------------------> (main thread, fills buffers)
FF|ff______|FF______|ff______|________|
------------------------------------------------> (writer thread)
|WWWWWWWW|wwwwwwww|WWWWWWWW|wwwwwwww|
F - filling 1st buffer
f - filling 2nd buffer
W - writing 1st buffer to file
w - writing 2nd buffer to file
_ - wait while operation is completed
F - 填充第一个缓冲区
f - 填充第二个缓冲区
W - 将第一个缓冲区写入文件
w - 将第二个缓冲区写入文件
_ - 等待操作完成
This approach with buffer swaps is very useful when filling a buffer requires more complex computation (hence, more time). I always implement a CSequentialStreamWriter class that hides asynchronous writing inside, so for the end-user the interface has just Write function(s).
当填充缓冲区需要更复杂的计算(因此需要更多时间)时,这种缓冲区交换方法非常有用。我总是实现一个隐藏异步写入的 CSequentialStreamWriter 类,所以对于最终用户来说,接口只有 Write 函数。
And the buffer size must be a multiple of disk cluster size. Otherwise, you'll end up with poor performance by writing a single buffer to 2 adjacent disk clusters.
并且缓冲区大小应当是磁盘簇大小的倍数。否则,一个缓冲区会跨越 2 个相邻的磁盘簇写入,最终导致性能不佳。
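If you would rather query the cluster size than guess it, a sketch like the following works on Windows (the root path "C:\\" is only an example); a multiple of sectorsPerCluster * bytesPerSector is then a reasonable buffer size:
如果你想查询磁盘簇大小而不是靠猜,在 Windows 上可以用类似下面的示例(根路径 "C:\\" 只是举例);之后把缓冲区大小取为 sectorsPerCluster * bytesPerSector 的倍数即可:
#include <windows.h>
#include <iostream>

int main()
{
    DWORD sectorsPerCluster = 0, bytesPerSector = 0, freeClusters = 0, totalClusters = 0;
    // "C:\\" is just an example root path; use the drive you are writing to.
    if (GetDiskFreeSpaceA("C:\\", &sectorsPerCluster, &bytesPerSector,
                          &freeClusters, &totalClusters)) {
        std::cout << "cluster size: "
                  << sectorsPerCluster * bytesPerSector << " bytes\n";
    }
    return 0;
}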
Writing the last buffer.
When you call the Write function for the last time, you have to make sure that the buffer currently being filled is written to disk as well. Thus CSequentialStreamWriter should have a separate method, let's say Finalize (final buffer flush), which should write the last portion of data to disk.
写入最后一个缓冲区。
最后一次调用 Write 函数时,必须确保当前正在填充的缓冲区也被写入磁盘。因此 CSequentialStreamWriter 应该有一个单独的方法,比如说 Finalize(最终缓冲区刷新),它把最后一部分数据写入磁盘。
Error handling.
While the code is filling the 2nd buffer and the 1st one is being written on a separate thread, the write may fail for some reason, and the main thread should be aware of that failure.
错误处理。
当代码正在填充第二个缓冲区、而第一个缓冲区正在单独的线程上写入时,写入可能因为某种原因失败,主线程应当知道这个失败。
------------------------------------------------> (main thread, fills buffers)
FF|fX|
------------------------------------------------> (writer thread)
__|X|
Let's assume the interface of CSequentialStreamWriter has a Write function that returns bool or throws an exception. If an error occurs on the separate thread, you have to remember that state, so the next time you call Write or Finalize on the main thread, the method will return false or throw an exception. It does not really matter at which point you stopped filling a buffer, even if you wrote some data ahead after the failure; most likely the file would be corrupted and useless.
假设 CSequentialStreamWriter 接口的 Write 函数返回 bool 或抛出异常。如果单独的线程上发生了错误,你必须记住这个状态,这样下次在主线程上调用 Write 或 Finalize 时,该方法就会返回 false 或抛出异常。至于你在哪一点停止填充缓冲区其实无关紧要,即使你在失败之后又继续写入了一些数据也一样,因为文件很可能已经损坏,没有用了。
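The answer describes the idea rather than concrete code, so here is a minimal sketch of the double-buffered approach using std::thread; the buffer size, the chunk count, and the simple mutex/condition-variable hand-off (with no error propagation) are all simplifications of the CSequentialStreamWriter idea described above:
这个回答描述的是思路而不是具体代码,所以下面给出一个用 std::thread 实现双缓冲的最小示例;缓冲区大小、块数以及简单的互斥量/条件变量交接方式(没有错误传递)都是对上面 CSequentialStreamWriter 思路的简化:
#include <algorithm>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>
#include <vector>

int main()
{
    const std::size_t bufBytes = 4 * 1024 * 1024;   // placeholder buffer size
    std::vector<char> fillBuf(bufBytes), writeBuf(bufBytes);

    std::mutex m;
    std::condition_variable cv;
    bool hasWork = false, done = false;

    FILE* out = std::fopen("file.binary", "wb");
    if (!out) return 1;

    // Writer thread: writes writeBuf whenever the main thread hands it over.
    std::thread writer([&] {
        for (;;) {
            std::unique_lock<std::mutex> lock(m);
            cv.wait(lock, [&] { return hasWork || done; });
            if (!hasWork && done) break;
            lock.unlock();                    // writeBuf belongs to this thread while hasWork is set
            std::fwrite(writeBuf.data(), 1, writeBuf.size(), out);
            lock.lock();
            hasWork = false;
            cv.notify_one();
        }
    });

    for (int chunk = 0; chunk < 32; ++chunk) {
        // Fill fillBuf (the expensive computation) while the writer is busy.
        std::fill(fillBuf.begin(), fillBuf.end(), static_cast<char>(chunk));

        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !hasWork; });   // wait for the previous write to finish
        fillBuf.swap(writeBuf);                    // hand the filled buffer to the writer
        hasWork = true;
        cv.notify_one();
    }

    {   // "Finalize": wait for the last buffer to be written, then shut down.
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [&] { return !hasWork; });
        done = true;
        cv.notify_one();
    }
    writer.join();
    std::fclose(out);
    return 0;
}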
回答by Ralph
I'd suggest trying file mapping. I used mmap in the past, in a UNIX environment, and I was impressed by the high performance I could achieve.
我建议尝试文件映射。我过去在 UNIX 环境中使用过 mmap,它能达到的高性能给我留下了深刻的印象。
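The answer does not include code, so here is a minimal sketch of writing a file through mmap on a POSIX system; the file name and size are placeholders and error handling is kept minimal:
这个回答没有附代码,所以下面给出一个在 POSIX 系统上通过 mmap 写文件的最小示例;文件名和大小都是占位值,错误处理也尽量从简:
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstring>

int main()
{
    const size_t bytes = 64ULL * 1024 * 1024;   // placeholder size: 64 MiB

    int fd = open("file.binary", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;
    if (ftruncate(fd, bytes) != 0) { close(fd); return 1; }   // grow the file first

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) { close(fd); return 1; }

    // "Write" by filling the mapped memory; the kernel flushes the pages to disk.
    std::memset(p, 'x', bytes);

    msync(p, bytes, MS_SYNC);   // optional: force the data out before unmapping
    munmap(p, bytes);
    close(fd);
    return 0;
}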
回答by cybertextron
Could you use FILE* instead, and then measure the performance you've gained? A couple of options are to use fwrite/write instead of fstream:
你能改用 FILE*,然后测量一下性能提升吗?有两个选择:使用 fwrite/write 而不是 fstream:
#include <stdio.h>

int main()
{
    FILE* pFile;
    char buffer[] = { 'x', 'y', 'z' };
    pFile = fopen("myfile.bin", "w+b");
    fwrite(buffer, 1, sizeof(buffer), pFile);
    fclose(pFile);
    return 0;
}
If you decide to use write, try something similar:
如果您决定使用 write,请尝试类似的操作:
#include <unistd.h>
#include <fcntl.h>

int main(void)
{
    int filedesc = open("testfile.txt", O_WRONLY | O_APPEND);
    if (filedesc < 0) {
        return -1;
    }
    if (write(filedesc, "This will be output to testfile.txt\n", 36) != 36) {
        write(2, "There was an error writing to testfile.txt\n", 43);
        return -1;
    }
    return 0;
}
I would also advise you to look into memory map. That may be your answer. Once I had to process a 20GB file in order to store it in the database, and the file was not even opening. So the solution was to utilize memory map. I did that in Python, though.
我也建议你研究一下 memory map(内存映射),这可能就是你要的答案。有一次我需要处理一个 20GB 的文件并把它存入数据库,而那个文件甚至都打不开,所以解决方案就是利用内存映射。不过我是用 Python 做的。
回答by Viktor Latypov
Try using open()/write()/close() API calls and experiment with the output buffer size. I mean do not pass the whole "many-many-bytes" buffer at once; do a couple of writes (i.e., TotalNumBytes / OutBufferSize writes). OutBufferSize can be from 4096 bytes to a megabyte.
尝试使用 open()/write()/close() API 调用,并试验不同的输出缓冲区大小。我的意思是不要一次传递整个“很多很多字节”的缓冲区,而是分成若干次写入(即 TotalNumBytes / OutBufferSize 次)。OutBufferSize 可以从 4096 字节到 1 兆字节不等。
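A small sketch of that chunked approach, assuming POSIX open()/write()/close(); the 256 KiB OutBufferSize and the total size are placeholder values you would tune:
下面是这种分块写法的小示例,假定使用 POSIX 的 open()/write()/close();256 KiB 的 OutBufferSize 和总大小都是需要自行调整的占位值:
#include <fcntl.h>
#include <unistd.h>
#include <vector>

int main()
{
    const size_t totalBytes = 256ULL * 1024 * 1024;   // placeholder: 256 MiB total
    const size_t outBufferSize = 256 * 1024;          // placeholder: 256 KiB per write

    std::vector<char> buffer(outBufferSize, 'x');

    int fd = open("file.binary", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return 1;

    for (size_t written = 0; written < totalBytes; written += outBufferSize) {
        // Refill buffer here if the data changes between chunks.
        if (write(fd, buffer.data(), outBufferSize) != (ssize_t)outBufferSize) {
            close(fd);
            return 1;
        }
    }

    close(fd);
    return 0;
}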
Another try - use WinAPI OpenFile/CreateFile and use this MSDN article to turn off buffering (FILE_FLAG_NO_BUFFERING). And this MSDN article on WriteFile() shows how to get the block size for the drive to know the optimal buffer size.
另一个尝试:使用 WinAPI 的 OpenFile/CreateFile,并参考这篇 MSDN 文章关闭缓冲(FILE_FLAG_NO_BUFFERING)。另外,这篇关于 WriteFile() 的 MSDN 文章展示了如何获取驱动器的块大小,以确定最佳的缓冲区大小。
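As a sketch of the unbuffered variant: with FILE_FLAG_NO_BUFFERING both the buffer address and the write size must be sector-aligned, so the example below uses _aligned_malloc with an assumed 4096-byte alignment and a 64 KiB chunk; in practice you would query the drive's sector size first, as the linked article describes:
下面是无缓冲写法的示例:使用 FILE_FLAG_NO_BUFFERING 时,缓冲区地址和写入大小都必须按扇区对齐,因此示例用 _aligned_malloc 假定 4096 字节对齐并使用 64 KiB 的块;实际使用时应当像文章里说的那样先查询驱动器的扇区大小:
#include <windows.h>
#include <malloc.h>
#include <string.h>

int main()
{
    const DWORD chunkBytes = 64 * 1024;                       // multiple of the sector size (assumed)
    char* buffer = (char*)_aligned_malloc(chunkBytes, 4096);  // sector-aligned buffer (4096 assumed)
    if (!buffer) return 1;
    memset(buffer, 'x', chunkBytes);

    HANDLE file = CreateFileA("file.binary", GENERIC_WRITE, 0, NULL, CREATE_ALWAYS,
                              FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH, NULL);
    if (file == INVALID_HANDLE_VALUE) { _aligned_free(buffer); return 1; }

    for (int i = 0; i < 1024; ++i) {                          // 64 MiB total in this sketch
        DWORD written = 0;
        if (!WriteFile(file, buffer, chunkBytes, &written, NULL))
            break;
    }

    CloseHandle(file);
    _aligned_free(buffer);
    return 0;
}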
Anyway, std::ofstream is a wrapper and there might be blocking on I/O operations. Keep in mind that traversing the entire N-gigabyte array also takes some time. While you are writing a small buffer, it gets to the cache and works faster.
无论如何,std::ofstream 只是一个包装器,I/O 操作上可能会有阻塞。请记住,遍历整个 N GB 的数组也需要一些时间。当您写入一个小缓冲区时,数据会进入缓存,速度会更快。
回答by rustyx
fstreams are not slower than C streams, per se, but they use more CPU (especially if buffering is not properly configured). When a CPU saturates, it limits the I/O rate.
fstream 本身并不比 C 流慢,但它们使用更多的 CPU(特别是在缓冲没有正确配置的情况下)。当 CPU 饱和时,它会限制 I/O 速率。
At least the MSVC 2015 implementation copies 1 char at a time to the output buffer when a stream buffer is not set (see streambuf::xsputn). So make sure to set a stream buffer (>0).
至少 MSVC 2015 的实现,在未设置流缓冲区时,会一次 1 个字符地复制到输出缓冲区(参见 streambuf::xsputn)。所以一定要设置一个流缓冲区(>0)。
I can get a write speed of 1500MB/s (the full speed of my M.2 SSD) with fstream using this code:
使用下面的代码,我可以用 fstream 获得 1500MB/s 的写入速度(我的 M.2 SSD 的全速):
#include <iostream>
#include <fstream>
#include <chrono>
#include <memory>
#include <cstring>
#include <stdio.h>
#ifdef __linux__
#include <unistd.h>
#endif

using namespace std;
using namespace std::chrono;

const size_t sz = 512 * 1024 * 1024;
const int numiter = 20;
const size_t bufsize = 1024 * 1024;

int main(int argc, char** argv)
{
    unique_ptr<char[]> data(new char[sz]);
    unique_ptr<char[]> buf(new char[bufsize]);
    for (size_t p = 0; p < sz; p += 16) {
        memcpy(&data[p], "BINARY.DATA.....", 16);
    }
    unlink("file.binary");
    int64_t total = 0;
    if (argc < 2 || strcmp(argv[1], "fopen") != 0) {
        cout << "fstream mode\n";
        ofstream myfile("file.binary", ios::out | ios::binary);
        if (!myfile) {
            cerr << "open failed\n"; return 1;
        }
        myfile.rdbuf()->pubsetbuf(buf.get(), bufsize); // IMPORTANT
        for (int i = 0; i < numiter; ++i) {
            auto tm1 = high_resolution_clock::now();
            myfile.write(data.get(), sz);
            if (!myfile)
                cerr << "write failed\n";
            auto tm = (duration_cast<milliseconds>(high_resolution_clock::now() - tm1).count());
            cout << tm << " ms\n";
            total += tm;
        }
        myfile.close();
    }
    else {
        cout << "fopen mode\n";
        FILE* pFile = fopen("file.binary", "wb");
        if (!pFile) {
            cerr << "open failed\n"; return 1;
        }
        setvbuf(pFile, buf.get(), _IOFBF, bufsize); // NOT important
        for (int i = 0; i < numiter; ++i) {
            auto tm1 = high_resolution_clock::now();
            if (fwrite(data.get(), sz, 1, pFile) != 1)
                cerr << "write failed\n";
            auto tm = (duration_cast<milliseconds>(high_resolution_clock::now() - tm1).count());
            cout << tm << " ms\n";
            total += tm;
        }
        fclose(pFile);
    }
    cout << "Total: " << total << " ms, " << (sz*numiter * 1000 / (1024.0 * 1024 * total)) << " MB/s\n";
}
I tried this code on other platforms (Ubuntu, FreeBSD) and noticed no I/O rate differences, but a CPU usage difference of about 8:1 (fstream used 8 times more CPU). So one can imagine, had I a faster disk, the fstream write would slow down sooner than the stdio version.
我在其他平台(Ubuntu、FreeBSD)上试过这段代码,没有发现 I/O 速率差异,但 CPU 使用率差异约为 8:1(fstream 使用了 8 倍的 CPU)。所以可以想象,如果我的磁盘更快,fstream 写入会比 stdio 版本更早遇到瓶颈。
回答by qehgt
Try to use memory-mapped files.
尝试使用内存映射文件。
回答by dualed
If you copy something from disk A to disk B in Explorer, Windows employs DMA. That means for most of the copy process the CPU will basically do nothing other than tell the disk controller where to put the data and where to get it from. That eliminates a whole step in the chain, and one that is not at all optimized for moving large amounts of data - and I mean hardware.
如果您在资源管理器中把内容从磁盘 A 复制到磁盘 B,Windows 会使用 DMA。这意味着在大部分复制过程中,CPU 基本上什么都不用做,只需告诉磁盘控制器把数据放到哪里、从哪里取数据。这就省去了传输链中的一整个环节,而那个环节完全没有针对搬运大量数据进行优化 - 我说的是硬件层面。
What you do involves the CPU a lot. I want to point you to the "Some calculations to fill a[]" part, which I think is essential. You generate a[], then you copy from a[] to an output buffer (that's what fstream::write does), then you generate again, etc.
而你所做的事情大量涉及 CPU。我想让你注意“Some calculations to fill a[]”这一部分,我认为这是关键。你生成 a[],然后把 a[] 复制到输出缓冲区(这就是 fstream::write 做的事),然后再次生成,依此类推。
What to do? Multithreading! (I hope you have a multi-core processor)
该怎么办?多线程!(希望你有一个多核处理器)
- fork.
- Use one thread to generate a[] data
- Use the other to write data from a[] to disk
- You will need two arrays a1[] and a2[] and switch between them
- You will need some sort of synchronization between your threads (semaphores, message queue, etc.)
- Use lower level, unbuffered functions, like the WriteFile function mentioned by Mehrdad
- fork。
- 使用一个线程来生成 a[] 数据
- 使用另一个线程把 a[] 中的数据写入磁盘
- 您将需要两个数组 a1[] 和 a2[],并在它们之间切换
- 您将需要线程之间的某种同步机制(信号量、消息队列等)
- 使用较低级别的无缓冲函数,例如 Mehrdad 提到的 WriteFile 函数