C: Mmap() an entire large file

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/7222164/

Tags: c, mmap

Asked by Emer

I am trying to "mmap" a binary file (~8 GB) using the following code (test.c).

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

#define handle_error(msg) \
  do { perror(msg); exit(EXIT_FAILURE); } while (0)

int main(int argc, char *argv[])
{
   const char *memblock;
   int fd;
   struct stat sb;

   fd = open(argv[1], O_RDONLY);
   fstat(fd, &sb);
   printf("Size: %lu\n", (uint64_t)sb.st_size);

   memblock = mmap(NULL, sb.st_size, PROT_WRITE, MAP_PRIVATE, fd, 0);
   if (memblock == MAP_FAILED) handle_error("mmap");

   for(uint64_t i = 0; i < 10; i++)
   {
     printf("[%lu]=%X ", i, memblock[i]);
   }
   printf("\n");
   return 0;
}

test.c is compiled using gcc -std=c99 test.c -o test, and running file on test returns: test: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.15, not stripped

Although this works fine for small files, I get a segmentation fault when I try to load a big one. The program actually returns:

Size: 8274324021 
mmap: Cannot allocate memory

I managed to map the whole file using boost::iostreams::mapped_file but I want to do it using C and system calls. What is wrong with my code?

Accepted answer by bdonlan

MAP_PRIVATE mappings require a memory reservation, as writing to these pages may result in copy-on-write allocations. This means that you can't map something too much larger than your physical RAM + swap. Try using a MAP_SHARED mapping instead. This means that writes to the mapping will be reflected on disk; as such, the kernel knows it can always free up memory by doing writeback, so it won't limit you.

I also note that you're mapping with PROT_WRITE, but you then go on and read from the memory mapping. You also opened the file with O_RDONLY, which may itself be another problem for you; you must specify O_RDWR if you want to use PROT_WRITE with MAP_SHARED.

As for PROT_WRITE only, this happens to work on x86, because x86 doesn't support write-only mappings, but it may cause segfaults on other platforms. Request PROT_READ|PROT_WRITE or, if you only need to read, PROT_READ.

On my system (a VPS with 676 MB RAM and 256 MB swap), I reproduced your problem; changing to MAP_SHARED results in a permission error (EACCES on Linux), since I'm not allowed to write to the backing file opened with O_RDONLY. Changing to PROT_READ and MAP_SHARED allows the mapping to succeed.
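
As a minimal sketch of the read-only variant described above, the helper below maps a whole file with PROT_READ and MAP_SHARED. The name map_readonly and its signature are hypothetical, chosen only for illustration; error handling is reduced to returning MAP_FAILED.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file read-only.
 * Returns the mapping, or MAP_FAILED on any error, and stores the
 * file size in *len_out on success. */
static const unsigned char *map_readonly(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return MAP_FAILED;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {  /* mmap rejects length 0 */
        close(fd);
        return MAP_FAILED;
    }

    /* PROT_READ + MAP_SHARED: the pages are backed by the file itself,
     * so no RAM/swap reservation for copy-on-write is needed. */
    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return MAP_FAILED;

    *len_out = (size_t)sb.st_size;
    return p;
}
```

A caller would check the result against MAP_FAILED, read bytes through the returned pointer, and release the mapping with munmap when done. Returning unsigned char also avoids the sign extension that printing a plain char with %X can produce.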

If you need to modify bytes in the file, one option would be to make private just the ranges of the file you're going to write to. That is, munmap and remap with MAP_PRIVATE the areas you intend to write to. Of course, if you intend to write to the entire file then you need 8GB of memory to do so.

Alternately, you can write 1 to /proc/sys/vm/overcommit_memory. This will allow the mapping request to succeed; however, keep in mind that if you actually try to use the full 8GB of COW memory, your program (or some other program!) will be killed by the OOM killer.
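
Before changing the setting, it can help to see which mode is active. The sketch below only reads the file; the helper name read_overcommit_mode is hypothetical. On Linux the value is 0 (heuristic overcommit, the default), 1 (always overcommit) or 2 (strict accounting). Writing a new value needs root privileges, for example through sysctl.

```c
#include <stdio.h>

/* Hypothetical helper: return the current vm.overcommit_memory mode
 * (0, 1 or 2), or -1 if the file cannot be read or parsed, for
 * example on a non-Linux system. */
static int read_overcommit_mode(void)
{
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}
```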

Answer by dcoles

Linux (and apparently a few other UNIX systems) has the MAP_NORESERVE flag for mmap(2), which can be used to explicitly enable swap space overcommitting. This can be useful when you wish to map a file larger than the amount of free memory available on your system.

This is particularly handy when you use MAP_PRIVATE and only intend to write to a small portion of the memory-mapped range, since this would otherwise trigger swap space reservation for the entire file (or cause the system to return ENOMEM, if system-wide overcommitting hasn't been enabled and you exceed the free memory of the system).
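
A minimal sketch of that combination is shown below, assuming Linux with glibc. The helper name map_private_noreserve is hypothetical. The feature-test macro is there because glibc hides MAP_NORESERVE when compiling with -std=c99, as the question does.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE  /* exposes MAP_NORESERVE under -std=c99 with glibc */
#endif
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file as a private, writable,
 * copy-on-write view without reserving swap space up front.
 * Returns the mapping or MAP_FAILED; stores the size in *len_out. */
static void *map_private_noreserve(const char *path, size_t *len_out)
{
    /* O_RDONLY is enough: private writes never reach the file. */
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return MAP_FAILED;

    struct stat sb;
    if (fstat(fd, &sb) == -1 || sb.st_size == 0) {
        close(fd);
        return MAP_FAILED;
    }

    void *p = mmap(NULL, (size_t)sb.st_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_NORESERVE, fd, 0);
    close(fd);
    if (p != MAP_FAILED)
        *len_out = (size_t)sb.st_size;
    return p;
}
```

Pages are copied only when they are first written, so memory use grows with the number of pages actually modified rather than with the file size.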

The issue to watch out for is that if you do write to a large portion of this memory, the lazy swap space reservation may cause your application to consume all the free RAM and swap on the system, eventually triggering the OOM killer (Linux) or causing your app to receive a SIGSEGV.

Answer by Mat

You don't have enough virtual memory to handle that mapping.

As an example, I have a machine here with 8G RAM, and ~8G swap (so 16G total virtual memory available).

If I run your code on a VirtualBox snapshot that is ~8G, it works fine:

$ ls -lh /media/vms/.../snap.vdi
-rw------- 1 me users 9.2G Aug  6 16:02 /media/vms/.../snap.vdi
$ ./a.out /media/vms/.../snap.vdi
Size: 9820000256 
[0]=3C [1]=3C [2]=3C [3]=20 [4]=4F [5]=72 [6]=61 [7]=63 [8]=6C [9]=65 

Now, if I drop the swap, I'm left with 8G total memory. (Don't run this on an active server.) And the result is:

$ sudo swapoff -a
$ ./a.out /media/vms/.../snap.vdi
Size: 9820000256 
mmap: Cannot allocate memory

So make sure you have enough virtual memory (RAM plus swap) to hold that mapping, even if you only touch a few pages in that file.
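
One rough way to check this in advance on Linux is to compare the file size with total RAM plus swap as reported by sysinfo(2). This is only a sketch of the comparison described above: the helper name fits_in_ram_plus_swap is hypothetical, and the kernel's actual overcommit decision is a heuristic that depends on more than these two totals.

```c
#include <stdint.h>
#include <sys/sysinfo.h>

/* Hypothetical helper: return 1 if `size` bytes fit in total RAM plus
 * total swap, 0 if they do not, and -1 if sysinfo() fails.
 * Linux-specific, and only an approximation of what the kernel checks. */
static int fits_in_ram_plus_swap(uint64_t size)
{
    struct sysinfo si;
    if (sysinfo(&si) == -1)
        return -1;

    /* totalram and totalswap are counted in units of mem_unit bytes. */
    uint64_t total = ((uint64_t)si.totalram + (uint64_t)si.totalswap)
                     * (uint64_t)si.mem_unit;
    return size <= total ? 1 : 0;
}
```

A caller could pass sb.st_size from fstat before attempting a writable MAP_PRIVATE mapping, and fall back to a read-only MAP_SHARED mapping when the result is 0.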