How to read a huge file in C++
Original URL: http://stackoverflow.com/questions/34751873/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by ZigZagZebra
Suppose I have a huge file (e.g. 1 TB, or any size that does not fit into RAM; the file is stored on disk). It is space-delimited, and my machine has only 8 GB of RAM. Can I read that file with an ifstream? If not, how can I read one block of the file (e.g. 4 GB) at a time?
Answer by zneak
There are a couple of things that you can do.
First, there's no problem opening a file that is larger than the amount of RAM that you have. What you won't be able to do is copy the whole file into memory at once. The best approach is to read just a few chunks at a time and process them. You can use ifstream for that purpose (with ifstream::read, for instance). Allocate, say, one megabyte of memory, read the first megabyte of the file into it, process it, then rinse and repeat:
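#include <fstream>
#include <memory>

std::ifstream bigFile("mybigfile.dat", std::ios::binary);
constexpr std::size_t bufferSize = 1024 * 1024;
std::unique_ptr<char[]> buffer(new char[bufferSize]);
while (bigFile)
{
    bigFile.read(buffer.get(), bufferSize);
    // gcount() is the number of bytes actually read; the last chunk is usually shorter
    std::streamsize bytesRead = bigFile.gcount();
    // process bytesRead bytes of data in buffer
}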
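Because the file in the question is space-delimited, a token can be split across two chunks. The sketch below shows one way to handle that with the same ifstream::read loop: the unfinished tail of each chunk is carried over in a string until a delimiter ends it. It only counts tokens; the function name count_tokens_chunked and its bufferSize parameter are hypothetical, chosen for illustration, and a real program would process each token where the comment indicates.

```cpp
#include <cctype>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Counts whitespace-delimited tokens by reading the file in fixed-size chunks.
// A token that straddles two chunks is accumulated in `partial`, so it is
// counted exactly once. Returns -1 if the file cannot be opened.
long long count_tokens_chunked(const std::string& path,
                               std::size_t bufferSize = 1024 * 1024)
{
    std::ifstream in(path, std::ios::binary);
    if (!in || bufferSize == 0)
        return -1;

    std::vector<char> buffer(bufferSize);
    std::string partial;   // token carried over from the previous chunk
    long long count = 0;

    while (in) {
        in.read(buffer.data(), static_cast<std::streamsize>(buffer.size()));
        std::streamsize n = in.gcount();   // bytes actually read
        for (std::streamsize i = 0; i < n; ++i) {
            char c = buffer[static_cast<std::size_t>(i)];
            if (std::isspace(static_cast<unsigned char>(c))) {
                if (!partial.empty()) {   // a token just ended
                    ++count;              // process `partial` here
                    partial.clear();
                }
            } else {
                partial.push_back(c);
            }
        }
    }
    if (!partial.empty())   // the file did not end with a delimiter
        ++count;
    return count;
}
```

Passing a tiny bufferSize (for example 3) is a convenient way to check that tokens split across chunk boundaries are still counted once.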
Another solution is to map the file to memory. Most operating systems will allow you to map a file to memory even if it is larger than the physical amount of memory that you have. This works because the operating system knows that each memory page associated with the file can be mapped and unmapped on-demand: when your program needs a specific page, the OS will read it from the file into your process's memory and swap out a page that hasn't been used in a while.
However, mapping the whole file only works if the file is smaller than the maximum amount of memory that your process can theoretically address. This isn't an issue for a 1 TB file in a 64-bit process, but it wouldn't work in a 32-bit process.
Also be aware of the spirits that you're summoning. Memory-mapping a file is not the same thing as reading from it. If the file is suddenly truncated by another program, your program is likely to crash. If you modify the data, you may run out of memory if the changes can't be written back to disk. Also, your operating system's algorithm for paging memory in and out may not behave in a way that benefits you much. Because of these uncertainties, I would consider mapping the file only if reading it in chunks using the first solution cannot work.
On Linux/OS X, you would use mmap for it. On Windows, you would open the file and then use CreateFileMapping, then MapViewOfFile.
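For the Linux/OS X route, a minimal POSIX sketch of mapping a whole file read-only with mmap might look like the following. It assumes a 64-bit build on a POSIX system (it will not compile on Windows), and the function name count_newlines_mmap is hypothetical; counting '\n' bytes simply stands in for whatever processing you need. The OS pages the file in on demand as the loop touches it.

```cpp
#include <cstddef>
#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap
#include <sys/stat.h>   // fstat
#include <unistd.h>     // close

// Maps a whole file read-only with POSIX mmap and counts its '\n' bytes.
// The file must fit in the process's address space (64-bit build for huge files).
// Returns -1 on error.
long long count_newlines_mmap(const char* path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    std::size_t length = static_cast<std::size_t>(st.st_size);
    if (length == 0) {   // mmap rejects zero-length mappings
        close(fd);
        return 0;
    }

    void* addr = mmap(nullptr, length, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);   // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED)
        return -1;

    const char* data = static_cast<const char*>(addr);
    long long count = 0;
    for (std::size_t i = 0; i < length; ++i)   // pages are faulted in on demand
        if (data[i] == '\n')
            ++count;

    munmap(addr, length);
    return count;
}
```

As the answer warns, if another program truncates the file while it is mapped, touching the missing pages can crash the process (typically with SIGBUS), which is one reason chunked reads are the safer default.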
Answer by Oleg Andriyanov
I am sure you don't need to keep the whole file in memory. Typically one reads and processes a file in chunks. If you want to use ifstream, you can do something like this:
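#include <fstream>

std::ifstream is("/path/to/file", std::ios::binary);
char buf[4096];
do {
    is.read(buf, sizeof(buf));
    // gcount() is how many bytes were actually read (the last chunk may be short);
    // process_chunk is your own handler for one chunk
    process_chunk(buf, is.gcount());
} while (is);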
Answer by marcinj
A more advanced approach: instead of reading the whole file or its chunks into memory, you can map it into memory using platform-specific APIs:
Under Windows: CreateFileMapping(), MapViewOfFile()
Under Linux: open(2) / creat(2), shm_open, mmap
You will need to build a 64-bit application to make it work.
For more details, see: CreateFileMapping, MapViewOfFile, how to avoid holding up the system memory
Answer by marian0
You can use fread:
char buffer[4096];                                  // array size must be a compile-time constant
size_t n = fread(buffer, 1, sizeof(buffer), fp);    // n = number of bytes actually read
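To turn the single fread call into a full pass over a file of any size, one option is a loop that keeps calling fread until it returns 0. This is a sketch, not the answerer's code: the function name total_bytes_fread and the 64 KiB chunk size are assumptions, and summing the byte counts stands in for real processing. Calling fread with an element size of 1 makes its return value the number of bytes read, which is what you need for the short final chunk.

```cpp
#include <cstddef>
#include <cstdio>

// Reads a file in fixed-size chunks with C stdio and returns the total number
// of bytes read, or -1 if the file cannot be opened or a read error occurs.
// Only one chunk is held in memory at a time, regardless of the file size.
long long total_bytes_fread(const char* path)
{
    std::FILE* fp = std::fopen(path, "rb");   // binary mode: no newline translation
    if (!fp)
        return -1;

    char buffer[1 << 16];   // 64 KiB chunk
    long long total = 0;
    std::size_t n;
    // With element size 1, fread returns the byte count (0 at end of file or on error)
    while ((n = std::fread(buffer, 1, sizeof(buffer), fp)) > 0) {
        // process the first n bytes of buffer here
        total += static_cast<long long>(n);
    }
    bool failed = std::ferror(fp) != 0;   // distinguish a read error from EOF
    std::fclose(fp);
    return failed ? -1 : total;
}
```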
Or, if you want to use C++ fstreams, you can use read, as buratino said.
Also keep in mind that you can open a file regardless of its size; the idea is to open it and read it in chunks that fit in your RAM.