Warning: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, link to the original, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/22153377/
Java NIO - Memory mapped files
Asked by Bober02
I recently came across this article which provided a nice intro to memory-mapped files and how they can be shared between two processes. Here is the code for a process that reads in the file:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MemoryMapReader {

    /**
     * @param args
     * @throws IOException
     * @throws FileNotFoundException
     * @throws InterruptedException
     */
    public static void main(String[] args) throws FileNotFoundException, IOException, InterruptedException {
        FileChannel fc = new RandomAccessFile(new File("c:/tmp/mapped.txt"), "rw").getChannel();
        long bufferSize = 8 * 1000;
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, bufferSize);
        long oldSize = fc.size();
        long currentPos = 0;
        long xx = currentPos;
        long startTime = System.currentTimeMillis();
        long lastValue = -1;
        for (;;) {
            while (mem.hasRemaining()) {
                lastValue = mem.getLong();
                currentPos += 8;
            }
            if (currentPos < oldSize) {
                xx = xx + mem.position();
                mem = fc.map(FileChannel.MapMode.READ_ONLY, xx, bufferSize);
                continue;
            } else {
                long end = System.currentTimeMillis();
                long tot = end - startTime;
                System.out.println(String.format("Last Value Read %s , Time(ms) %s ", lastValue, tot));
                System.out.println("Waiting for message");
                while (true) {
                    long newSize = fc.size();
                    if (newSize > oldSize) {
                        oldSize = newSize;
                        xx = xx + mem.position();
                        mem = fc.map(FileChannel.MapMode.READ_ONLY, xx, oldSize - xx);
                        System.out.println("Got some data");
                        break;
                    }
                }
            }
        }
    }
}
I have, however, a few comments/questions regarding that approach:
If we execute the reader alone on an empty file, i.e. run
long bufferSize=8*1000;
MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, bufferSize);
long oldSize=fc.size();
This will allocate 8000 bytes which will now extend the file. The buffer that this returns has a limit of 8000 and a position of 0, therefore, the reader can proceed and read empty data. After this happens, the reader will stop, as currentPos == oldSize.
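The file-extension behaviour described above can be observed directly. The sketch below is a minimal, hypothetical demo (the temp-file name and sizes are mine, not from the question); it maps 8000 bytes of an empty file through a channel opened "rw", exactly as the reader does, and shows that the file grows and the mapped region is zero-filled:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MapExtendsDemo {
    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("mapped-demo", ".dat"); // empty file
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel fc = raf.getChannel()) {
            System.out.println("size before map: " + fc.size());   // 0
            // Mapping past the end of the file grows it, because the
            // underlying channel is writable ("rw") - even though the
            // mapping itself is READ_ONLY.
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, 8000);
            System.out.println("size after map:  " + fc.size());   // 8000
            // The freshly extended region is zero-filled, so the reader
            // happily consumes "empty" longs.
            System.out.println("first long: " + mem.getLong());    // 0
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```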
Suppose now the writer comes in (code is omitted, as most of it is straightforward and can be referenced from the website) - it uses the same buffer size, so it will write the first 8000 bytes, then allocate another 8000, extending the file. Now, if we suppose this process pauses at this point, and we go back to the reader, the reader sees the new size of the file, maps the remainder (from position 8000 to 16000) and starts reading again, reading in more garbage...
I am a bit confused about whether there is a way to synchronize those two operations. As far as I can see, any call to map might extend the file with a genuinely empty buffer (filled with zeros), or the writer might have just extended the file but not yet written anything into it...
Accepted answer by Holger
There are several ways.
1. Let the writer acquire an exclusive Lock on the region that has not been written yet, and release the lock when everything has been written. This is compatible with every other application running on that system, but it requires the reader to be smart enough to retry failed reads, unless you combine it with one of the other methods.
2. Use another communication channel, e.g. a pipe, a socket, or a file's metadata channel, to let the writer tell the reader about the finished write.
3. Write a special marker (being part of the protocol) at a position in the file, telling about the written data, e.g.

MappedByteBuffer bb;
…                                            // write your data
bb.force();                                  // ensure completion of all writes
bb.put(specialPosition, specialMarkerValue);
bb.force();                                  // ensure visibility of the marker
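The third option (data first, force(), then the marker, then force() again) can be sketched end-to-end. The layout below is hypothetical - marker byte at offset 0, payload at offset 8, and both views in one process for demonstration; in real use, writer and reader would be separate processes mapping the same file:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MarkerProtocolSketch {
    // Hypothetical layout: byte 0 is the commit marker, payload starts at 8.
    static final int MARKER_POS = 0;
    static final byte COMMITTED = 1;
    static final int PAYLOAD_POS = 8;

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("marker-demo", ".dat");
        try (FileChannel fc = new RandomAccessFile(file.toFile(), "rw").getChannel()) {
            MappedByteBuffer writer = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            MappedByteBuffer reader = fc.map(FileChannel.MapMode.READ_ONLY, 0, 4096);

            // Writer side: payload first, then the marker, with force() between.
            writer.putLong(PAYLOAD_POS, 42L);
            writer.force();                      // complete all data writes
            writer.put(MARKER_POS, COMMITTED);
            writer.force();                      // make the marker visible

            // Reader side: poll the marker; read the payload only once it is set.
            while (reader.get(MARKER_POS) != COMMITTED) Thread.onSpinWait();
            System.out.println("payload: " + reader.getLong(PAYLOAD_POS)); // 42
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The ordering matters: because the marker is written after the data, a reader that sees the marker can safely assume the data before it is complete.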
Answered by Tim Cooper
I do a lot of work with memory-mapped files for interprocess communication. I would not recommend Holger's #1 or #2, but his #3 is what I do. A key point, perhaps, is that I only ever work with a single writer - things get more complicated if you have multiple writers.
The start of the file is a header section with whatever header variables you need, most importantly a pointer to the end of the written data. The writer should always update this header variable after writing a piece of data, and the reader should never read beyond this variable. A feature called "cache coherency", which all mainstream CPUs have, guarantees that the reader will see memory writes in the same sequence they were written, so the reader will never read uninitialised memory if you follow these rules. (An exception is where the reader and writers are on different servers - cache coherency doesn't work there. Don't try to implement shared memory across different servers!)
There is no limit to how frequently you can update the end-of-file pointer - it's all in memory and there won't be any I/O involved, so you can update it for each record or each message you write.
ByteBuffer has versions of the 'getInt()' and 'putInt()' methods which take an absolute byte offset, so that's what I use for reading and writing the end-of-file marker... I never use the relative versions when working with memory-mapped files.
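The header-pointer protocol described above can be sketched as follows. All offsets and the 8-byte-record format here are illustrative assumptions, not part of the answer; the essential rule is only that the pointer is advanced after the record is written, and the reader never reads past it:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeaderPointerSketch {
    static final int END_PTR = 0;     // header: long at offset 0 = end of written data
    static final int DATA_START = 8;  // records begin right after the header

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("header-demo", ".dat");
        try (FileChannel fc = new RandomAccessFile(file.toFile(), "rw").getChannel()) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(END_PTR, DATA_START);        // nothing written yet

            // Writer: append a record using absolute puts, then advance the pointer.
            long end = buf.getLong(END_PTR);
            buf.putLong((int) end, 123L);            // the record itself
            buf.putLong(END_PTR, end + 8);           // publish it afterwards

            // Reader: never read beyond the end-of-data pointer.
            long readable = buf.getLong(END_PTR);
            for (long pos = DATA_START; pos < readable; pos += 8) {
                System.out.println("record: " + buf.getLong((int) pos)); // 123
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```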
In no way should you use the file size or yet another interprocess method to communicate the end-of-file marker - there is no need for it, and no benefit, when you already have shared memory.
Answered by MikaelJ
Check out my library Mappedbus (http://github.com/caplogic/mappedbus) which enables multiple Java processes (JVMs) to write records in order to the same memory mapped file.
Here's how Mappedbus solves the synchronization problem between multiple writers:
The first eight bytes of the file make up a field called the limit. This field specifies how much data has actually been written to the file. The readers will poll the limit field (using volatile) to see whether there's a new record to be read.
When a writer wants to add a record to the file it will use the fetch-and-add instruction to atomically update the limit field.
When the limit field has increased, a reader will know there's new data to be read, but the writer which updated the limit field might not yet have written any data in the record. To avoid this problem, each record contains an initial byte which makes up the commit field.
When a writer has finished writing a record it will set the commit field (using volatile) and the reader will only start reading a record once it has seen that the commit field has been set.
(BTW, the solution has only been verified to work on Linux x86 with Oracle's JVM. It most likely won't work on all platforms).
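The limit/commit scheme can be sketched in plain Java. This is my own simplified, single-writer illustration, not Mappedbus code: the real library uses an atomic fetch-and-add (via sun.misc.Unsafe) so several writers can claim space concurrently, and volatile memory semantics for the fields, whereas the plain read-modify-write below is only safe with one writer:

```java
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class MappedbusStyleSketch {
    static final int LIMIT_POS = 0;      // first 8 bytes: how far data has been written
    static final int DATA_START = 8;
    static final byte COMMIT = 1;        // one commit byte precedes each record

    public static void main(String[] args) throws Exception {
        Path file = Files.createTempFile("bus-demo", ".dat");
        try (FileChannel fc = new RandomAccessFile(file.toFile(), "rw").getChannel()) {
            MappedByteBuffer buf = fc.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            buf.putLong(LIMIT_POS, DATA_START);

            // Writer: claim space by bumping the limit (Mappedbus does this
            // atomically with fetch-and-add), then fill in the record.
            int recordSize = 1 + 8;                       // commit byte + long payload
            long claimed = buf.getLong(LIMIT_POS);
            buf.putLong(LIMIT_POS, claimed + recordSize);
            buf.putLong((int) claimed + 1, 777L);         // payload first
            buf.put((int) claimed, COMMIT);               // then mark it committed

            // Reader: poll the limit, then wait for each record's commit byte
            // before reading its payload.
            long limit = buf.getLong(LIMIT_POS);
            long pos = DATA_START;
            while (pos < limit) {
                while (buf.get((int) pos) != COMMIT) Thread.onSpinWait();
                System.out.println("record: " + buf.getLong((int) pos + 1)); // 777
                pos += recordSize;
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```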

