Note: this page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/10376199/

Date: 2020-08-27 13:59:27  Source: igfitidea

How can I use non-default delimiters when reading a text file with std::fstream?

c++ fstream ifstream

Asked by FrozenLand

In my C++ code, I want to read from a text file (*.txt) and tokenize every entry. More specifically, I want to be able to read individual words from a file, such as "format", "stack", "Jason", "europe", etc.

I chose to use fstream to perform this task, but I do not know how to set its delimiters to the ones I want to use (space, \n, as well as hyphens and even apostrophes, as in "McDonald's"). I figure space and \n are the default delimiters and hyphens are not, but I want to treat hyphens as delimiters too, so that when parsing the file I get the words in "blah blah xxx animal--cat" as simply "blah", "blah", "xxx", "animal", "cat".

That is, I want to be able to get two strings from "stack-overflow", "you're", etc., and still keep \n and space as delimiters at the same time.

Answered by Jerry Coffin

An istream treats "white space" as delimiters. It uses a locale to tell it what characters are white space. A locale, in turn, includes a ctype facet that classifies character types. Such a facet could look something like this:

#include <locale>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <vector>
#include <sstream>
#include <string>
#include <cstddef>

// A ctype facet that additionally classifies '-' and '\'' as white space.
class my_ctype : public std::ctype<char>
{
    mask my_table[table_size];
public:
    my_ctype(std::size_t refs = 0)
        : std::ctype<char>(&my_table[0], false, refs)
    {
        // Start from the classic "C" table, then mark the extra delimiters.
        std::copy_n(classic_table(), table_size, my_table);
        my_table['-'] = (mask)space;
        my_table['\''] = (mask)space;
    }
};

And a little test program to show it works:

int main() {
    std::istringstream input("This is some input from McDonald's and Burger-King.");
    std::locale x(std::locale::classic(), new my_ctype);
    input.imbue(x);

    std::copy(std::istream_iterator<std::string>(input),
        std::istream_iterator<std::string>(),
        std::ostream_iterator<std::string>(std::cout, "\n"));

    return 0;
}

Result:

This
is
some
input
from
McDonald
s
and
Burger
King.

istream_iterator<string> uses >> to read the individual strings from the stream, so if you use >> directly, you should get the same results. The parts you need to include are creating the locale and using imbue to make the stream use that locale.

Answered by QuantumRipple

You can use

istream::getline(char* buffer, streamsize maxchars, char delim)

although this only supports a single delimiter. To further split the lines on your different delimiters, you could use

char* strtok(char* inString, const char* delims)  

which takes multiple delimiters. When you use strtok, you only need to pass it the address of your buffer the first time; after that, pass in a null pointer and it will give you the next token after the last one it returned, returning a null pointer when there are no more.

EDIT: A specific implementation would be something like

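#include <cstring>   // std::strtok
#include <iostream>

// The wrapper function and its name are only illustrative;
// myIstream is any stream that is already open.
void printTokens(std::istream& myIstream)
{
    char buffer[120]; // this size depends on the longest line you expect in the file
    // getline returns the stream, which tests false at end of file or on error
    // (including a line that does not fit in the buffer)
    while (myIstream.getline(buffer, sizeof buffer)) // using default delimiter of \n
    {
        char* tokBuffer = std::strtok(buffer, "'- ");
        while (tokBuffer != nullptr) {
            std::cout << "token is: " << tokBuffer << "\n";
            // no need to pass the buffer again: strtok remembers it from the first call
            tokBuffer = std::strtok(nullptr, "'- ");
        }
    }
}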