Original question: http://stackoverflow.com/questions/12790820/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Reading a memory mapped block of data into a structure
Asked by foboi1122
I've been playing around with memory mapping today on VC++ 2008 and I still haven't completely understood how to use it or if it's correct for my purposes. My goal here is to quickly read a very large binary file.
I have a struct:
typedef struct _data
{
    int number;
    char character[512];
    float *entries;
} Data;
which is written many times into a file. The "entries" member points to an array of floats. After writing this file (10000 Data structs, with each "entries" array holding 90000 floats), I tried to memory map the file with the following function so that I could read the data faster. Here's what I have so far:
void readDataMmap(char *fname,   // name of file containing my data
                  int arraySize, // number of Data structs in the file
                  int entrySize) // number of values in each "entries" array
{
    // Open the file and memory map it
    HANDLE hFile = CreateFileA(fname,
                               GENERIC_READ,          // open for reading
                               0,                     // do not share
                               NULL,                  // default security
                               OPEN_EXISTING,         // existing file only
                               FILE_ATTRIBUTE_NORMAL, // normal file
                               NULL);                 // no template
    if (hFile == INVALID_HANDLE_VALUE)
    {
        printf("CreateFile failed\n");
        return;
    }
    HANDLE hMapFile = CreateFileMapping(hFile,
                                        NULL,          // default security
                                        PAGE_READONLY, // must match GENERIC_READ above
                                        0,             // max. object size (high)
                                        0,             // max. object size (low); 0,0 = whole file
                                        NULL);         // unnamed mapping object
    if (hMapFile == NULL) // CreateFileMapping returns NULL on failure
    {
        printf("File mapping failed\n");
        CloseHandle(hFile);
        return;
    }
    char *pBuf = (char *)MapViewOfFile(hMapFile,      // handle to map object
                                       FILE_MAP_READ, // read-only view
                                       0,             // file offset (high)
                                       0,             // file offset (low)
                                       0);            // 0 maps the whole file
    if (pBuf == NULL)
    {
        printf("Could not map view of file\n");
        CloseHandle(hMapFile);
        CloseHandle(hFile);
        return;
    }
    // Allocate data structure
    Data *inData = new Data[arraySize];
    for (int i = 0; i < arraySize; i++)
        inData[i].entries = new float[entrySize];
    int pos = 0;
    for (int i = 0; i < arraySize; i++)
    {
        // This is where I'm not sure what to do with the memory block
    }
}
At the end of the function, after the file is mapped and I have a pointer "pBuf" to the beginning of the memory block, I don't know how to read this memory block back into my data structure. Eventually I would like to transfer this block of memory back into my array of 10000 Data structs. Of course, I could be doing this completely wrong...
Answered by devshorts
Dealing with a memory mapped file is really no different than dealing with any other kind of pointer to memory. The memory mapped file is just a block of data that you can read and write to from any process using the same name.
I'm assuming you want to load the file into a memory map, read and update it there at will, and dump it to a file at some regular or known interval, right? If that's the case, just read from the file and copy the data to the memory map pointer, and that's it. Later you can read data from the map, copy it into your memory-aligned structure, and use your structure at will.
If I were you, I'd probably create a few helper methods like
data ReadData(void *ptr)
and
void WriteData(data *ptrToData, void *ptr)
where ptr is the memory map address and ptrToData is a pointer to the data structure you want to write to memory. At this point it really doesn't matter whether the memory is mapped or not; if you wanted to read from a file loaded into local memory, you could do that too.
You can read from and write to it exactly as you would with any other block of data: use memcpy to copy data from the source to the target, and use pointer arithmetic to advance the location in the data. Don't worry about the "memory map"; it's just a pointer to memory and you can treat it as such.
Also, since you are dealing with direct memory pointers, you don't need to write each element into the mapped file one by one; you can write them all in one batch, like
memcpy(mapPointer, data->entries, sizeof(float)*number)
which copies sizeof(float)*number bytes from data->entries to the map pointer start address. Obviously you can copy it however and wherever you want; this is just an example. See http://www.devx.com/tips/Tip/13291.
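To make the batch copy and the pointer arithmetic concrete, here is a minimal sketch. It assumes a plain byte buffer standing in for the mapped view, and the helper names `appendFloats` and `readFloats` are made up for illustration; they are not part of any API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies `count` floats into the buffer at `cursor`
// with one memcpy and returns the position just past the copied bytes.
char *appendFloats(char *cursor, const float *values, std::size_t count) {
    assert(cursor != nullptr);
    if (count != 0) {
        std::memcpy(cursor, values, sizeof(float) * count);
    }
    return cursor + sizeof(float) * count;
}

// Hypothetical helper: reads `count` floats starting at `cursor` into a
// vector owned by the caller.
std::vector<float> readFloats(const char *cursor, std::size_t count) {
    assert(cursor != nullptr);
    std::vector<float> out(count);
    if (count != 0) {
        std::memcpy(out.data(), cursor, sizeof(float) * count);
    }
    return out;
}
```

The returned cursor from `appendFloats` is where the next piece of data would go, so several arrays can be written back to back by chaining calls.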
To read the data back in, you do something similar, but you want to explicitly copy the memory that the pointers refer to into a known location, so imagine flattening your structure out. Instead of
data:
int
char * -> points to some address
float * -> points to some address
where your pointers point to other memory elsewhere, copy the memory like this
data:
int
char * -> copy of original ptr
float * -> copy of original ptr
512 values of char array
number of values of float array
This way you can "re-serialize" the data from the memory map into your local structure. Remember, a dynamically allocated array is just a pointer to memory. That memory doesn't have to be contiguous with the object, since it could have been allocated at another time. You need to make sure you copy the actual data the pointers point to into your memory map. A common way of doing this is to write the object straight into the memory map, then follow the object with all the flattened arrays. When reading it back in, you first read the object, then increment the pointer by sizeof(object) and read in the next array, then increment the pointer again by the size of that array, and so on.
Here is an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct data {
    int size;
    char items[512];
    float *dataPoints;
};

// Write the struct, followed by its flattened dataPoints array, into buffer
void writeToBuffer(data *input, char *buffer) {
    size_t sizeOfData = sizeof(data);
    size_t dataPointsSize = sizeof(float) * input->size;
    printf("size of data %u\n", (unsigned)sizeOfData);
    memcpy(buffer, input, sizeOfData);
    printf("pointer to dataPoints of original %p\n", (void *)input->dataPoints);
    memcpy(buffer + sizeOfData, input->dataPoints, dataPointsSize);
}

// Read the struct back, then give it its own copy of the array
void readFromBuffer(data *target, char *buffer) {
    memcpy(target, buffer, sizeof(data));
    printf("pointer to dataPoints of copy %p, same as original\n", (void *)target->dataPoints);
    // give ourselves a new array
    target->dataPoints = (float *)malloc(target->size * sizeof(float));
    // do a deep copy, since we just copied the same pointer from
    // the previous data into our local
    memcpy(target->dataPoints, buffer + sizeof(data), target->size * sizeof(float));
    printf("pointer to dataPoints of copy %p, now its own copy\n", (void *)target->dataPoints);
}

int main(int argc, char *argv[])
{
    data test;
    for (int i = 0; i < 512; i++) {
        test.items[i] = (char)i;
    }
    test.size = 10;
    // create an array and populate the data
    test.dataPoints = new float[test.size];
    for (int i = 0; i < test.size; i++) {
        test.dataPoints[i] = (float)i * 1000.0f;
    }
    // print it out for demonstration
    for (int i = 0; i < test.size; i++) {
        printf("data point value %d: %f\n", i, test.dataPoints[i]);
    }
    // create a memory buffer. this is no different than the shared memory
    char *memBuffer = (char *)malloc(sizeof(data) + sizeof(float) * test.size);
    // create a target we'll load values into
    data test2;
    // write the original out to the memory buffer
    writeToBuffer(&test, memBuffer);
    // read from the memory buffer into the target
    readFromBuffer(&test2, memBuffer);
    // print for demonstration
    printf("copy number %d\n", test2.size);
    for (int i = 0; i < test2.size; i++) {
        printf("\tcopy value %d: %f\n", i, test2.dataPoints[i]);
    }
    // memory cleanup: free() what malloc() allocated, delete[] what new[] allocated
    free(memBuffer);
    free(test2.dataPoints);
    delete[] test.dataPoints;
    return 0;
}
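Applied to the question's case of many records in one buffer, the same cursor idea might look like the sketch below. It assumes a flat layout per record of [int number][512 chars][entrySize floats] with no padding, which is an assumption about how the file was written and not something the question confirms. `Record`, `recordBytes`, `writeRecords` and `readRecords` are illustrative names, and a std::vector stands in for the raw float pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

const std::size_t kNameBytes = 512;

// Illustrative in-memory form of the question's Data struct; a vector
// replaces the raw float* so each record owns its entries.
struct Record {
    int number;
    char character[kNameBytes];
    std::vector<float> entries;
};

// Bytes one record occupies in the ASSUMED flat layout.
std::size_t recordBytes(std::size_t entrySize) {
    return sizeof(int) + kNameBytes + sizeof(float) * entrySize;
}

// Writes records back to back in the assumed layout (here only to
// produce a buffer that readRecords can consume).
void writeRecords(const std::vector<Record> &records, std::size_t entrySize,
                  char *buffer) {
    char *cursor = buffer;
    for (const Record &r : records) {
        assert(r.entries.size() == entrySize);
        std::memcpy(cursor, &r.number, sizeof(int));
        cursor += sizeof(int);
        std::memcpy(cursor, r.character, kNameBytes);
        cursor += kNameBytes;
        if (entrySize != 0) {
            std::memcpy(cursor, r.entries.data(), sizeof(float) * entrySize);
        }
        cursor += sizeof(float) * entrySize;
    }
}

// Reads arraySize records starting at pBuf, advancing a cursor field by field.
std::vector<Record> readRecords(const char *pBuf, std::size_t bufBytes,
                                std::size_t arraySize, std::size_t entrySize) {
    assert(bufBytes >= arraySize * recordBytes(entrySize));
    std::vector<Record> records(arraySize);
    const char *cursor = pBuf;
    for (Record &r : records) {
        std::memcpy(&r.number, cursor, sizeof(int));
        cursor += sizeof(int);
        std::memcpy(r.character, cursor, kNameBytes);
        cursor += kNameBytes;
        r.entries.resize(entrySize);
        if (entrySize != 0) {
            std::memcpy(r.entries.data(), cursor, sizeof(float) * entrySize);
        }
        cursor += sizeof(float) * entrySize;
    }
    return records;
}
```

With a real mapping, `pBuf` would be the pointer returned by MapViewOfFile. Copying field by field, instead of copying the whole struct, keeps stale pointer values and compiler padding out of the file.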
You'll probably also want to read up on data alignment when writing data from a struct to memory. Useful topics to look up are working with packed structures, C++ struct alignment, and data structure alignment.
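As a small illustration of why alignment matters here, the sketch below compares a struct's in-memory size with the sum of its field sizes, and reads an int from an arbitrary byte offset with memcpy. `Sample`, `kPackedSize` and `loadInt` are hypothetical names, and the exact amount of padding depends on the compiler and platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Illustrative struct: most compilers insert padding after `tag`
// so that `value` starts at an int-aligned offset.
struct Sample {
    char tag;
    int value;
};

// Bytes the two fields need when written one after another with no padding.
const std::size_t kPackedSize = sizeof(char) + sizeof(int);

// Reads an int from any byte offset. memcpy avoids dereferencing a
// possibly misaligned int pointer.
int loadInt(const char *bytes) {
    assert(bytes != nullptr);
    int v;
    std::memcpy(&v, bytes, sizeof v);
    return v;
}
```

If sizeof(Sample) is larger than kPackedSize on your compiler, writing the whole struct with one memcpy also writes the padding bytes, so the reader has to use the same struct layout to get the same offsets.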
If you don't know the size of the data ahead of time when reading, you should write the size of the data at a known position at the beginning of the memory map for later use.
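One way to do that, sketched under the assumption of a 32-bit element count stored at offset 0 and followed directly by the floats (the layout and the names `packWithCount` and `unpackWithCount` are invented for this example):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical layout: a 32-bit element count at offset 0, then the floats.
const std::size_t kHeaderBytes = sizeof(std::uint32_t);

// Builds a buffer holding the count header followed by the values.
std::vector<char> packWithCount(const std::vector<float> &values) {
    std::vector<char> buffer(kHeaderBytes + sizeof(float) * values.size());
    const std::uint32_t count = static_cast<std::uint32_t>(values.size());
    std::memcpy(buffer.data(), &count, kHeaderBytes);
    if (count != 0) {
        std::memcpy(buffer.data() + kHeaderBytes, values.data(),
                    sizeof(float) * count);
    }
    return buffer;
}

// Reads the count first, then exactly that many floats.
std::vector<float> unpackWithCount(const char *buffer, std::size_t bufferBytes) {
    assert(bufferBytes >= kHeaderBytes);
    std::uint32_t count = 0;
    std::memcpy(&count, buffer, kHeaderBytes);
    assert(bufferBytes >= kHeaderBytes + sizeof(float) * count);
    std::vector<float> values(count);
    if (count != 0) {
        std::memcpy(values.data(), buffer + kHeaderBytes, sizeof(float) * count);
    }
    return values;
}
```

The reader never needs to be told the array length separately; it recovers it from the first few bytes and can check it against the size of the mapped region before copying.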
Anyway, as to whether memory mapping is the right tool here, I think it is. From Wikipedia:
The primary benefit of memory mapping a file is increasing I/O performance, especially when used on large files. ... The memory mapping process is handled by the virtual memory manager, which is the same subsystem responsible for dealing with the page file. Memory mapped files are loaded into memory one entire page at a time. The page size is selected by the operating system for maximum performance. Since page file management is one of the most critical elements of a virtual memory system, loading page sized sections of a file into physical memory is typically a very highly optimized system function.
You map the whole file into virtual memory, and the OS then pages parts of the file in and out of physical memory for you as you need them, which gives you a "lazy loading" mechanism.
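It is worth checking how much address space a single view of the whole file would need. The sketch below is a rough estimate that assumes each record stores an int, the 512-byte char array, and the floats, with no padding; `payloadBytes` and `fitsInAddressSpace` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Approximate payload size, ASSUMING each record stores an int, the
// 512-byte char array, and `entriesPerRecord` floats with no padding.
std::uint64_t payloadBytes(std::uint64_t records, std::uint64_t entriesPerRecord) {
    const std::uint64_t perRecord =
        sizeof(int) + 512 + sizeof(float) * entriesPerRecord;
    assert(records <= UINT64_MAX / perRecord); // guard against overflow
    return records * perRecord;
}

// True if a single view of `bytes` could fit below `addressSpaceBytes`.
bool fitsInAddressSpace(std::uint64_t bytes, std::uint64_t addressSpaceBytes) {
    return bytes < addressSpaceBytes;
}
```

With 4-byte int and float, 10000 records of 90000 floats come to about 3.6 GB. A 32-bit Windows process usually has about 2 GB of user address space by default, so in that setting the file would probably have to be mapped in smaller windows using the offset and length arguments of MapViewOfFile, while a 64-bit build could map it in one view.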
All that said, memory maps can be shared, so if one is used across process boundaries you'll want to synchronize access to it with a named mutex so that processes don't overwrite each other's data.

