C++ istream vs memory-mapped file?

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/10839747/

Time: 2020-08-27 14:31:23  Source: igfitidea

istream vs memory mapping a file?

c++

Asked by user997112

I am trying to map a file to memory and then parse it line by line. Is istream what I should be using?

Is istream the same as mapping a file to memory on Windows? I have had difficulty finding a complete example of mapping a file into memory.

I have seen people link memory mapping articles from MSDN, but if anybody could recommend a small (~15 line?) example I would be most thankful.

I must be searching for the wrong thing, but when I searched Google for "C++ memory mapping example", I could not find an example that included iterating through the mapped data.

These were the closest results (just so people realize I have looked):

Answered by ildjarn

std::istream has no data source of its own: it reads from whatever std::streambuf it is given, so you cannot use it directly on a block of memory. You should derive from it with a custom array-backed streambuf:

#include <cstddef>
#include <string>
#include <streambuf>
#include <istream>

template<typename CharT, typename TraitsT = std::char_traits<CharT>>
struct basic_membuf : std::basic_streambuf<CharT, TraitsT> {
    basic_membuf(CharT const* const buf, std::size_t const size) {
        CharT* const p = const_cast<CharT*>(buf);
        this->setg(p, p, p + size);
    }

    //...
};

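template<typename CharT, typename TraitsT = std::char_traits<CharT>>
struct basic_imemstream
: virtual basic_membuf<CharT, TraitsT>, std::basic_istream<CharT, TraitsT> {
    // The virtual base is constructed first, so the buffer is ready before
    // basic_istream receives a pointer to it. The base names are spelled with
    // their template arguments, which standard C++ requires here.
    basic_imemstream(CharT const* const buf, std::size_t const size)
    : basic_membuf<CharT, TraitsT>(buf, size),
      std::basic_istream<CharT, TraitsT>(static_cast<std::basic_streambuf<CharT, TraitsT>*>(this))
    { }

    //...
};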

using imemstream = basic_imemstream<char>;

char const* const mmaped_data = /*...*/;
std::size_t const mmap_size = /*...*/;
imemstream s(mmaped_data, mmap_size);
// s now uses the memory mapped data as its underlying buffer.
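
To tie this back to the line-by-line parsing in the question, here is a minimal sketch of reading such a stream with std::getline. It restates the classes above in a non-template form for char only; the helper name read_lines is hypothetical and not part of the original answer, and a string literal stands in for the mapped data.

```cpp
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <vector>

// Read-only stream buffer over an existing character array (no copy is made).
struct membuf : std::streambuf {
    membuf(char const* buf, std::size_t size) {
        // setg takes non-const pointers; the data is only ever read.
        char* p = const_cast<char*>(buf);
        setg(p, p, p + size);
    }
};

// istream over the array. The membuf base is virtual so that it is
// constructed before std::istream receives a pointer to it.
struct imemstream : virtual membuf, std::istream {
    imemstream(char const* buf, std::size_t size)
        : membuf(buf, size),
          std::istream(static_cast<std::streambuf*>(this)) {}
};

// Hypothetical helper: collect every line of the buffer via std::getline.
std::vector<std::string> read_lines(char const* data, std::size_t size) {
    imemstream in(data, size);
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

Calling read_lines(mmaped_data, mmap_size) would then yield one std::string per line. Each line is still copied into a std::string; only the underlying buffer is shared with the mapping.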

As for the memory mapping itself, I recommend using Boost.Interprocess for this purpose:

#include <cstddef>
#include <string>
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

namespace bip = boost::interprocess;

//...

std::string filename = /*...*/;
bip::file_mapping mapping(filename.c_str(), bip::read_only);
bip::mapped_region mapped_rgn(mapping, bip::read_only);
char const* const mmaped_data = static_cast<char*>(mapped_rgn.get_address());
std::size_t const mmap_size = mapped_rgn.get_size();


Code for imemstream taken from this answer by Dietmar Kühl.

Answered by Emilio Garavaglia

Is istream the same as mapping a file to memory on Windows?

Not exactly. They differ in the same sense that a "stream" is not a "file".

Think of a file as a stored sequence, and of a stream as the interface to the "channel" (a stream buffer) through which that sequence flows on its way from storage to the receiving variables.

Think of a memory-mapped file as a "file" that, instead of being stored outside the processing unit, is kept in sync in memory. Its advantage is that it is visible as a raw memory buffer while still being a file. If you want to read it as a stream, the simplest way is probably to use an istringstream that reads from that raw buffer.
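
A minimal sketch of the istringstream approach, assuming the mapped bytes are available as a pointer and a size. The helper name count_nonempty_lines is hypothetical. Note that std::istringstream works on its own copy of the characters, so this route duplicates the data rather than reading the mapping in place.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical helper: count the non-empty lines in a raw buffer by
// reading it through std::istringstream.
std::size_t count_nonempty_lines(char const* data, std::size_t size) {
    std::string copy(data, size);   // the buffer is copied here
    std::istringstream in(copy);    // and the stream keeps its own copy too
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) {
            ++count;
        }
    }
    return count;
}
```

For small files the extra copy hardly matters; for large mappings a custom stream buffer over the original memory avoids it.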

Answered by Wolfgang Brehm

Abstractly speaking, reading a file sequentially will not be sped up by memory-mapping it or by first reading it into memory. Memory-mapped files make sense if reading the file sequentially is not feasible. Pre-caching the file as in the other answer, or simply copying it into a large string to process by other means, likewise only makes sense if reading the file once in sequence is not feasible and you have the RAM for it. This is because the slowest part of the operation is getting the data off the disk, and that has to happen regardless: whether you copy the file to RAM, let the operating system map the data before you access it, or let std::iostream read it line by line and cache just enough of the file to make this work smoothly.

In practice, the mapped or cached versions let you eliminate some RAM-to-RAM copying by making shallow copies of ranges of the buffer. Still, this will not change much, because RAM->RAM copying is negligible in comparison to disk->RAM.
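
As a rough illustration of such shallow copies, the sketch below splits a buffer into lines without copying any characters. It assumes C++17 for std::string_view; the helper name split_lines is hypothetical. The views are only valid while the underlying buffer (for example, the mapping) stays alive.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: return one view per line. Each view points into
// the original buffer, so no character data is copied.
std::vector<std::string_view> split_lines(char const* data, std::size_t size) {
    std::string_view rest(data, size);
    std::vector<std::string_view> lines;
    while (!rest.empty()) {
        std::size_t const nl = rest.find('\n');
        if (nl == std::string_view::npos) {
            lines.push_back(rest);          // last line without a trailing newline
            break;
        }
        lines.push_back(rest.substr(0, nl));
        rest.remove_prefix(nl + 1);         // skip past the newline
    }
    return lines;
}
```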

The best advice in a situation like yours is therefore not to worry too much and just use std::iostream (in practice, a std::ifstream).
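
For completeness, a minimal sketch of the plain stream approach, assuming the file is ordinary text. The helper name count_lines is hypothetical; real parsing would go inside the loop.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper: stream the file line by line. The stream buffers
// just enough of the file internally; returns 0 if the file cannot be opened.
std::size_t count_lines(std::string const& path) {
    std::ifstream in(path);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++count;   // parse `line` here
    }
    return count;
}
```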

[This answer is for archival purposes, because the correct answer is buried in the comments]