C++: Reading from ifstream won't read whitespace

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6774825/

Time: 2020-08-28 20:39:45  Source: igfitidea

Reading from ifstream won't read whitespace

Tags: c++, c++11

Asked by Puppy

I'm implementing a custom lexer in C++ and when attempting to read in whitespace, the ifstream won't read it out. I'm reading character by character using >>, and all the whitespace is gone. Is there any way to make the ifstream keep all the whitespace and read it out to me? I know that when reading whole strings, the read will stop at whitespace, but I was hoping that by reading character by character, I would avoid this behaviour.

Attempted: .get(), recommended by many answers, but it has the same effect as std::noskipws, that is, I get all the spaces now, but not the new-line character that I need to lex some constructs.

Here's the offending code (extended comments truncated)

while(input >> current) {
    always_next_struct val = always_next_struct(next);
    if (current == L' ' || current == L'\n' || current == L'\t' || current == L'\r') {
        continue;
    }
    if (current == L'/') {
        input >> current;
        if (current == L'/') {
            // explicitly empty while loop
            while(input.get(current) && current != L'\n');
            continue;
        }

I'm breaking on the while line and looking at every value of current as it comes in, and \r or \n are definitely not among them; the input just skips to the next line in the input file.

Accepted answer by Puppy

I ended up just cracking open the Windows API and using it to read the whole file into a buffer first, and then reading that buffer character by character. Thanks guys.

Answer by R. Martinho Fernandes

There is a manipulator to disable the whitespace skipping behavior:

stream >> std::noskipws;
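
To illustrate the manipulator, here is a minimal sketch. The helper name read_all_with_extractor is made up for this example, and it uses a narrow char stream rather than the asker's wide stream. It only shows that operator>> on a char stops discarding whitespace once std::noskipws is applied; it does not try to explain why the asker still lost newlines.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Hypothetical helper (not from the original answer): read every character
// from `in` with operator>>, after turning off whitespace skipping.
std::string read_all_with_extractor(std::istream& in) {
    std::string result;
    char c;
    in >> std::noskipws;   // spaces, tabs and newlines are now extracted too
    while (in >> c) {
        result += c;
    }
    return result;
}
```

Feeding it a std::istringstream holding "a b\n\tc" should give back the same five characters, whitespace included.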

Answer by René Richter

The operator>> eats whitespace (space, tab, newline). Use yourstream.get() to read each character.

Edit:

Beware: Platforms (Windows, Un*x, Mac) differ in coding of newline. It can be '\n', '\r' or both. It also depends on how you open the file stream (text or binary).

Edit (analyzing code):

After

  while(input.get(current) && current != L'\n');
  continue;

there will be a \n in current, unless end of file was reached. After that you continue with the outermost while loop. There, the first character on the next line is read into current. Is that not what you wanted?

I tried to reproduce your problem (using char and cin instead of wchar_t and wifstream):

//: get.cpp : compile, then run: get < get.cpp

#include <iostream>

int main()
{
  char c;

  while (std::cin.get(c))
  {
    if (c == '/') 
    { 
      char last = c; 
      if (std::cin.get(c) && c == '/')
      {
        // std::cout << "Read to EOL\n";
        while(std::cin.get(c) && c != '\n'); // this comment will be skipped
        // std::cout << "go to next line\n";
        std::cin.putback(c);
        continue;
      }
     else { std::cin.putback(c); c = last; }
    }
    std::cout << c;
  }
  return 0;
}

This program, applied to itself, eliminates all C++ line comments in its output. The inner while loop doesn't eat up all text to the end of file. Please note the putback(c) statement. Without that the newline would not appear.

If it doesn't work the same for wifstream, it would be very strange except for one reason: when the opened text file is not saved as 16-bit chars and the \n char ends up in the wrong byte...

Answer by jalf

Wrap the stream (or its buffer, specifically) in a std::istreambuf_iterator? That should ignore all formatting, and also give you a nice iterator interface.
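
A rough sketch of what walking the buffer with an iterator could look like. The function name count_newlines is invented for this example, and it uses char rather than wchar_t. The iterator reads straight from the stream buffer, so the skipws flag plays no part.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying this without a file

// Hypothetical helper (not from the original answer): count newline
// characters by iterating over the stream buffer directly, bypassing all
// formatted-input behaviour.
std::size_t count_newlines(std::istream& in) {
    std::size_t newlines = 0;
    std::istreambuf_iterator<char> it(in), end;   // default-constructed = end of stream
    for (; it != end; ++it) {
        if (*it == '\n') {
            ++newlines;
        }
    }
    return newlines;
}
```

A lexer would presumably dispatch on *it inside the loop instead of just counting, but the traversal is the same.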

Alternatively, a much more efficient, and fool-proof, approach might be to just use the Win32 API (or Boost) to memory-map the file. Then you can traverse it using plain pointers, and you're guaranteed that nothing will be skipped or converted by the runtime.

Answer by Pete

You could open the stream in binary mode:

std::wifstream stream(filename, std::ios::binary);

You'll lose any formatting operations provided by the stream if you do this.

The other option is to read the entire stream into a string and then process the string:

std::wostringstream ss;
ss << filestream.rdbuf();

Of course, getting the string from the ostringstream requires an additional copy of the string, so you could consider changing this at some point to use a custom stream if you feel adventurous. EDIT: someone else mentioned istreambuf_iterator, which is probably a better way of doing it than reading the whole stream into a string.
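
For completeness, a small sketch of the read-everything-into-a-string route, including the str() call that produces the extra copy. The function name slurp is an assumption made for this example; any std::wistream should do as input.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original answer): copy everything left
// in a wide input stream into a std::wstring.
std::wstring slurp(std::wistream& in) {
    std::wostringstream ss;
    ss << in.rdbuf();   // unformatted copy, so whitespace and newlines survive
    return ss.str();    // str() hands back a copy of the buffered text
}
```

The returned string can then be lexed with ordinary indexing or iterators, so stream formatting flags no longer matter.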

Answer by Bo Persson

The stream extractors behave the same and skip whitespace.

If you want to read every byte, you can use the unformatted input functions, like stream.get(c).
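
To make the difference concrete, here is a sketch that reads the same text twice, once with operator>> and once with get(c), and reports how many characters each loop saw. The Counts struct and compare_reads function are made up for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical result type for this example.
struct Counts {
    std::size_t formatted;     // characters seen by operator>>
    std::size_t unformatted;   // characters seen by get(c)
};

// Hypothetical helper (not from the original answer).
Counts compare_reads(const std::string& text) {
    Counts counts{0, 0};
    char c;

    std::istringstream a(text);
    while (a >> c) {           // formatted input: skips whitespace by default
        ++counts.formatted;
    }

    std::istringstream b(text);
    while (b.get(c)) {         // unformatted input: returns every character
        ++counts.unformatted;
    }
    return counts;
}
```

For the text "a b\nc" the formatted loop should see 3 characters and the unformatted loop all 5.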

Answer by Matthieu M.

Why not simply use getline?

You will get all the whitespace, and while you won't get the end-of-line characters, you will still know where they lie :)
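
One possible reading of this suggestion, as a sketch: collect the lines with std::getline and treat each line boundary as an implicit newline. The function name split_lines is invented here. One caveat under this approach is that a final line without a trailing newline looks the same as one with it, unless you also check eof() after the read.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>
#include <vector>

// Hypothetical helper (not from the original answer): split a stream into
// lines. Spaces and tabs inside each line are preserved; the '\n' that ended
// the line is consumed but not stored.
std::vector<std::string> split_lines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}
```

A lexer could then process each element and emit its own end-of-line token after every line.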

Answer by HaSeeB MiR

You could just wrap the stream in a std::istreambuf_iterator to get the data with all whitespace and newlines, like this:

/* Needs <fstream>, <iterator>, <string> and <vector>. */

/* Open the stream in default (text) mode. */
std::ifstream myfile("myfile.txt");

if (myfile.good()) {
    /* Read all data using stream buffer iterators. */
    std::vector<char> buf((std::istreambuf_iterator<char>(myfile)),
                          (std::istreambuf_iterator<char>()));

    /* str_buf holds all the data, including whitespace and newlines. */
    std::string str_buf(buf.begin(), buf.end());

    myfile.close();
}

Answer by shawon

Just use getline.

while (getline(input,current))
{
      cout<<current<<"\n";

}

Answer by Ben B

By default, the skipws flag is already set on the ifstream object, so we must disable it. The ifstream object has these default flags because of std::basic_ios::init, which is called on every new ios_base object. Any of the following would work:

in_stream.unsetf(std::ios_base::skipws);
in_stream >> std::noskipws; // Using the extraction operator, same as below
std::noskipws(in_stream); // Explicitly calling noskipws instead of using operator>>
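
If you want to confirm that the flag really changed, a small check along these lines might help. The function name skips_whitespace is made up for this example; it only inspects the format flags through the standard flags() accessor.

```cpp
#include <ios>
#include <sstream>   // std::istringstream, handy for trying this without a file

// Hypothetical helper (not from the original answer): report whether the
// skipws format flag is currently set on a stream.
bool skips_whitespace(const std::ios_base& stream) {
    return (stream.flags() & std::ios_base::skipws) == std::ios_base::skipws;
}
```

A freshly constructed stream should report true, and false after any of the three calls above.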

Other flags are listed on cppreference.
