How do I read an entire file into a std::string in C++?

Question

How do I read a file into a std::string, i.e., read the whole file at once?

Text or binary mode should be specified by the caller. The solution should be standard-compliant, portable and efficient. It should not needlessly copy the string's data, and it should avoid reallocations of memory while reading the string.

One way to do this would be to stat the file size, resize the std::string and fread() into the std::string's const_cast<char*>()'ed data(). This requires the std::string's data to be contiguous, which is not required by the standard, but it appears to be the case for all known implementations. What is worse, if the file is read in text mode, the std::string's size may not equal the file's size.
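
For illustration, the fread() idea described above might be sketched roughly as follows. The name slurp_fread and its signature are made up for this example, and sizing the file with fseek()/ftell() is an assumption that holds for regular files on common platforms rather than something the C standard promises. Note that C++11 and later do guarantee contiguous std::string storage, so writing through &data[0] is fine there.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper illustrating the fread() approach from the question.
std::string slurp_fread(const char* path, bool is_binary) {
    std::FILE* fp = std::fopen(path, is_binary ? "rb" : "r");
    if (fp == nullptr) {
        throw std::runtime_error("cannot open file");
    }

    // Seek to the end to learn the size (assumption: a regular, seekable file).
    long size = -1;
    if (std::fseek(fp, 0, SEEK_END) == 0) {
        size = std::ftell(fp);
    }
    if (size < 0 || std::fseek(fp, 0, SEEK_SET) != 0) {
        std::fclose(fp);
        throw std::runtime_error("cannot determine file size");
    }

    // Allocate once, then read straight into the string's storage.
    std::string data(static_cast<std::size_t>(size), '\0');
    const std::size_t got =
        data.empty() ? 0 : std::fread(&data[0], 1, data.size(), fp);
    std::fclose(fp);

    // In text mode fewer characters may arrive than the file holds bytes,
    // so shrink the string to what was actually read.
    data.resize(got);
    return data;
}
```

The final resize() is what handles the text-mode mismatch mentioned above: the string is allocated for the on-disk size and then trimmed to the number of characters fread() delivered.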

A fully correct, standard-compliant and portable solution could be constructed using std::ifstream's rdbuf() into a std::ostringstream and from there into a std::string. However, this could copy the string data and/or needlessly reallocate memory.

Are all relevant standard library implementations smart enough to avoid all unnecessary overhead?
Is there another way to do it?
Did I miss some hidden Boost function that already provides the desired functionality?

void slurp(std::string& data, bool is_binary)

Answer 1

Answered by Konrad Rudolph

One way is to flush the stream buffer into a separate memory stream, and then convert that to std::string:

std::string slurp(std::ifstream& in) {
    std::ostringstream sstr;
    sstr << in.rdbuf();
    return sstr.str();
}

This is nicely concise. However, as noted in the question this performs a redundant copy and unfortunately there is fundamentally no way of eliding this copy.

The only real solution that avoids redundant copies is, unfortunately, to do the reading manually in a loop. Since C++ now guarantees contiguous strings, one could write the following (≥C++17, because of std::string_view):

auto read_file(std::string_view path) -> std::string {
    constexpr auto read_size = std::size_t{4096};
    // Copy the path into a std::string: a string_view need not be null-terminated.
    auto stream = std::ifstream{std::string{path}};
    stream.exceptions(std::ios_base::badbit);

    auto out = std::string{};
    auto buf = std::string(read_size, '\0');
    // Read fixed-size chunks; the final, shorter chunk is appended after the loop.
    while (stream.read(& buf[0], read_size)) {
        out.append(buf, 0, stream.gcount());
    }
    out.append(buf, 0, stream.gcount());
    return out;
}

Answer 2

Answered by paxos1977

See this answer on a similar question.

For your convenience, I'm reposting CTT's solution:

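string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(bytes.data(), fileSize);

    return string(bytes.data(), fileSize);
}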

This solution resulted in about 20% faster execution times than the other answers presented here, when taking the average of 100 runs against the text of Moby Dick (1.3M). Not bad for a portable C++ solution. I would like to see the results of mmap'ing the file ;)
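
A comparison like the one described above could be reproduced with a small timing harness. This is only a sketch: Timing and time_reader are hypothetical names, and the reader is any callable that takes a path and returns a std::string.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

// Hypothetical result type: average time per run plus total bytes returned.
struct Timing {
    double seconds_per_run;
    std::size_t bytes_read;
};

// Hypothetical harness: calls reader(path) `runs` times and averages the
// wall-clock time. `reader` is any callable returning a std::string.
template <typename Reader>
Timing time_reader(Reader reader, const std::string& path, int runs) {
    using clock = std::chrono::steady_clock;
    Timing result{0.0, 0};
    if (runs <= 0) {
        return result;
    }

    const auto start = clock::now();
    for (int i = 0; i < runs; ++i) {
        // Summing the sizes keeps the reads from being optimised away.
        result.bytes_read += reader(path).size();
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    result.seconds_per_run = elapsed.count() / runs;
    return result;
}
```

It could be called as time_reader([](const std::string& p) { return readFile2(p); }, "moby_dick.txt", 100), where the file name is a placeholder. Results will vary with the compiler, the optimisation level and the state of the OS file cache.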

Answer 3

Answered by Konrad Rudolph

The shortest variant:

std::string str(std::istreambuf_iterator<char>{ifs}, {});

It requires the header <iterator>.

There were some reports that this method is slower than preallocating the string and using std::istream::read. However, on a modern compiler with optimisations enabled this no longer seems to be the case, though the relative performance of various methods seems to be highly compiler dependent.
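
For comparison, the alternative mentioned above (preallocating the string and using std::istream::read) might look roughly like this. The function name read_preallocated is made up for the example, and binary mode is assumed so that the size reported by tellg() matches the number of characters read.

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: size the string once, then fill it with a single read().
std::string read_preallocated(const std::string& path) {
    // Open in binary mode, positioned at the end so tellg() reports the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    const std::streamoff size = in.tellg();
    if (size < 0) {
        throw std::runtime_error("cannot determine size of " + path);
    }
    in.seekg(0, std::ios::beg);

    std::string out(static_cast<std::size_t>(size), '\0');
    if (!out.empty()) {
        in.read(&out[0], static_cast<std::streamsize>(out.size()));
    }
    // Keep only what was actually read, in case the file shrank meanwhile.
    out.resize(static_cast<std::size_t>(in.gcount()));
    return out;
}
```

Which of the two variants is faster is best measured on the target compiler and standard library rather than assumed.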

Answer 4

Answered by Ben Collins

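Use

#include <iostream>
#include <sstream>
#include <fstream>

int main()
{
  std::ifstream input("file.txt");
  std::stringstream sstr;

  while(input >> sstr.rdbuf());

  std::cout << sstr.str() << std::endl;
}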

or something very close. I don't have a stdlib reference open to double-check myself.

Yes, I understand I didn't write the slurp function as asked.

Answer 5

Answered by Rick Ramstetter

I do not have enough reputation to comment directly on responses using tellg().

Please be aware that tellg() can return -1 on error. If you're passing the result of tellg() as an allocation parameter, you should sanity check the result first.

An example of the problem:

...
std::streamsize size = file.tellg();
std::vector<char> buffer(size);
...

In the above example, if tellg() encounters an error it will return -1. Implicit conversion between signed (i.e. the result of tellg()) and unsigned (i.e. the argument to the vector<char> constructor) will result in your vector erroneously allocating a very large number of bytes. (Probably 4294967295 bytes, or 4GB, on a platform with a 32-bit size_t.)
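
The conversion described above can be seen in isolation with a couple of small helpers. Both names are made up for this illustration.

```cpp
#include <cstddef>
#include <ios>
#include <limits>

// Hypothetical helper: the value a container constructor effectively sees
// when a raw stream offset is converted to an unsigned size.
std::size_t as_allocation_size(std::streamoff offset) {
    return static_cast<std::size_t>(offset);
}

// Hypothetical guard: only non-negative offsets are usable as sizes.
bool is_usable_size(std::streamoff offset) {
    return offset >= 0;
}
```

Here as_allocation_size(-1) yields std::numeric_limits<std::size_t>::max(), which is 4294967295 where size_t is 32 bits wide and far larger where it is 64 bits wide. Checking with something like is_usable_size() before allocating avoids the problem.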

Modifying paxos1977's answer to account for the above:

string readFile2(const string &fileName)
{
    ifstream ifs(fileName.c_str(), ios::in | ios::binary | ios::ate);

    ifstream::pos_type fileSize = ifs.tellg();
    if (fileSize < 0)                             // <--- ADDED
        return std::string();                     // <--- ADDED

    ifs.seekg(0, ios::beg);

    vector<char> bytes(fileSize);
    ifs.read(&bytes[0], fileSize);

    return string(&bytes[0], fileSize);
}

Answer 6

Answered by Gabriel Majeri

If you have C++17 (std::filesystem), there is also this way (which gets the file's size through std::filesystem::file_size instead of seekg and tellg):

#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

std::string readFile(fs::path path)
{
    // Open the stream to 'lock' the file.
    std::ifstream f(path, std::ios::in | std::ios::binary);

    // Obtain the size of the file.
    const auto sz = fs::file_size(path);

    // Create a buffer.
    std::string result(sz, '\0');

    // Read the whole file into the buffer.
    f.read(result.data(), sz);

    return result;
}

Note: you may need to use <experimental/filesystem> and std::experimental::filesystem if your standard library doesn't yet fully support C++17. You might also need to replace result.data() with &result[0] if it doesn't support the non-const std::basic_string::data().
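
If exceptions are not wanted, the same idea can be sketched with the std::error_code overload of std::filesystem::file_size. The function name try_read_file and the choice of std::optional as the return type are assumptions made for this example, not part of the answer above.

```cpp
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical variant of readFile that reports failure via std::nullopt.
std::optional<std::string> try_read_file(const fs::path& path) {
    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) {
        return std::nullopt;  // could not open the file
    }

    // Non-throwing overload: errors are reported through `ec`.
    std::error_code ec;
    const auto sz = fs::file_size(path, ec);
    if (ec) {
        return std::nullopt;
    }

    std::string result(static_cast<std::size_t>(sz), '\0');
    f.read(result.data(), static_cast<std::streamsize>(sz));
    // Keep only what was actually read, in case the file changed size.
    result.resize(static_cast<std::size_t>(f.gcount()));
    return result;
}
```

A caller can then write something like if (auto text = try_read_file("config.txt")) { ... }, where the file name is only a placeholder.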

Answer 7

Answered by tgnottingham

This solution adds error checking to the rdbuf()-based method.

std::string file_to_string(const std::string& file_name)
{
    std::ifstream file_stream{file_name};

    if (file_stream.fail())
    {
        // Error opening file.
    }

    std::ostringstream str_stream{};
    file_stream >> str_stream.rdbuf();  // NOT str_stream << file_stream.rdbuf()

    if (file_stream.fail() && !file_stream.eof())
    {
        // Error reading file.
    }

    return str_stream.str();
}

I'm adding this answer because adding error-checking to the original method is not as trivial as you'd expect. The original method uses stringstream's insertion operator (str_stream << file_stream.rdbuf()). The problem is that this sets the stringstream's failbit when no characters are inserted. That can be due to an error or it can be due to the file being empty. If you check for failures by inspecting the failbit, you'll encounter a false positive when you read an empty file. How do you disambiguate legitimate failure to insert any characters and "failure" to insert any characters because the file is empty?

You might think to explicitly check for an empty file, but that's more code and associated error checking.

Checking for the failure condition str_stream.fail() && !str_stream.eof() doesn't work, because the insertion operation doesn't set the eofbit (on either the ostringstream or the ifstream).

So, the solution is to change the operation. Instead of using ostringstream's insertion operator (<<), use ifstream's extraction operator (>>), which does set the eofbit. Then check for the failure condition file_stream.fail() && !file_stream.eof().

Importantly, when file_stream >> str_stream.rdbuf() encounters a legitimate failure, it shouldn't ever set eofbit (according to my understanding of the specification). That means the above check is sufficient to detect legitimate failures.
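
The difference between the two operators can be observed with an empty input. The sketch below uses std::istringstream as a stand-in for an empty file, which is an assumption made to keep the example self-contained; CopyFlags and the two function names are made up for the illustration.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical result type: the state flags observed after the copy.
struct CopyFlags {
    bool fail;
    bool eof;
};

// Insertion (<<): the flags of interest end up on the destination stream.
CopyFlags copy_by_insertion(std::istream& in) {
    std::ostringstream out;
    out << in.rdbuf();
    return {out.fail(), out.eof()};
}

// Extraction (>>): the flags of interest end up on the source stream.
CopyFlags copy_by_extraction(std::istream& in) {
    std::ostringstream out;
    in >> out.rdbuf();
    return {in.fail(), in.eof()};
}
```

With an empty source, the insertion variant reports fail without eof, which is indistinguishable from a real error. The extraction variant reports both fail and eof, so the check fail() && !eof() stays false for an empty input.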

Answer 8

Answered by Matt Price

Something like this shouldn't be too bad:

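void slurp(std::string& data, const std::string& filename, bool is_binary)
{
    std::ios_base::openmode openmode = std::ios::ate | std::ios::in;
    if (is_binary)
        openmode |= std::ios::binary;
    std::ifstream file(filename.c_str(), openmode);
    data.clear();
    data.reserve(file.tellg());
    file.seekg(0, std::ios::beg);
    data.append(std::istreambuf_iterator<char>(file.rdbuf()),
                std::istreambuf_iterator<char>());
}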

The advantage here is that we do the reserve first so we won't have to grow the string as we read things in. The disadvantage is that we do it char by char. A smarter version could grab the whole read buf and then call underflow.
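
One possible shape for the "smarter version" mentioned above is to size the string up front and pull the data out of the stream buffer in one block with sgetn(). This is a sketch under assumptions: slurp_block is a made-up name, and it returns false on failure instead of leaving the error unreported.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical block-reading variant of slurp(); returns false on failure.
bool slurp_block(std::string& data, const std::string& filename, bool is_binary) {
    std::ios_base::openmode mode = std::ios::in | std::ios::ate;
    if (is_binary) {
        mode |= std::ios::binary;
    }

    std::ifstream file(filename, mode);
    data.clear();
    if (!file) {
        return false;
    }

    // tellg() reports -1 on error, so check before using it as a size.
    const std::streamoff size = file.tellg();
    if (size < 0) {
        return false;
    }
    file.seekg(0, std::ios::beg);

    data.resize(static_cast<std::size_t>(size));
    // sgetn() copies a whole block out of the stream buffer at once.
    const std::streamsize got =
        data.empty() ? 0
                     : file.rdbuf()->sgetn(&data[0],
                                           static_cast<std::streamsize>(data.size()));
    // In text mode fewer characters may arrive than the file holds bytes.
    data.resize(static_cast<std::size_t>(got));
    return true;
}
```

The trade-off is that resize() value-initialises the buffer before it is overwritten, whereas reserve() plus append() does not. Whether that matters in practice would need measuring.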

Answer 9

Answered by David G

Here's a version using the new filesystem library with reasonably robust error checking:

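#include <cstdint>
#include <exception>
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

namespace fs = std::filesystem;

std::string loadFile(const char *const name);
std::string loadFile(const std::string &name);

std::string loadFile(const char *const name) {
  fs::path filepath(fs::absolute(fs::path(name)));

  std::uintmax_t fsize;

  if (fs::exists(filepath)) {
    fsize = fs::file_size(filepath);
  } else {
    throw(std::invalid_argument("File not found: " + filepath.string()));
  }

  std::ifstream infile;
  infile.exceptions(std::ifstream::failbit | std::ifstream::badbit);
  try {
    infile.open(filepath.c_str(), std::ios::in | std::ifstream::binary);
  } catch (...) {
    std::throw_with_nested(std::runtime_error("Can't open input file " + filepath.string()));
  }

  std::string fileStr;

  try {
    fileStr.resize(fsize);
  } catch (...) {
    std::stringstream err;
    err << "Can't resize to " << fsize << " bytes";
    std::throw_with_nested(std::runtime_error(err.str()));
  }

  infile.read(fileStr.data(), fsize);
  infile.close();

  return fileStr;
}

std::string loadFile(const std::string &name) { return loadFile(name.c_str()); }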

Answer 10

Answered by Martin Cote

You can use the 'std::getline' function, and specify 'eof' as the delimiter. The resulting code is a little bit obscure though:

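std::string data;
std::ifstream in( "test.txt" );
std::getline( in, data, std::string::traits_type::to_char_type( 
                  std::string::traits_type::eof() ) );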
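
One thing worth knowing about the getline idiom: on typical implementations, where EOF is -1, converting eof() with to_char_type() yields the character with value 0xFF, so reading stops early if the input contains that byte. The wrapper below is a made-up helper (read_until_eof_char) that takes any std::istream so the behaviour can be tried on in-memory data.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical wrapper around the getline-with-eof-delimiter idiom.
std::string read_until_eof_char(std::istream& in) {
    std::string data;
    // The delimiter is eof() converted to a char, not a true end-of-file marker.
    std::getline(in, data,
                 std::string::traits_type::to_char_type(
                     std::string::traits_type::eof()));
    return data;
}
```

For plain text this reads everything, newlines included. For input such as "ab\xFF" followed by "cd", it would return only "ab" under the assumption stated above, so the idiom is safer for text files than for arbitrary binary data.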
