C++: Split a string into words by multiple delimiters

Question

Question by Sergei G

I have some text (meaningful text or arithmetical expression) and I want to split it into words.
If I had a single delimiter, I'd use:

std::stringstream stringStream(inputString);
std::string word;
while(std::getline(stringStream, word, delimiter)) 
{
    wordVector.push_back(word);
}

How can I break the string into tokens with several delimiters?

Answer 1

Answer by SoapBox

Assuming one of the delimiters is newline, the following reads each line and then splits it further on the other delimiters. For this example I've chosen space, apostrophe, and semicolon as the delimiters.

std::stringstream stringStream(inputString);
std::string line;
while(std::getline(stringStream, line)) 
{
    std::size_t prev = 0, pos;
    while ((pos = line.find_first_of(" ';", prev)) != std::string::npos)
    {
        if (pos > prev)
            wordVector.push_back(line.substr(prev, pos-prev));
        prev = pos+1;
    }
    if (prev < line.length())
        wordVector.push_back(line.substr(prev, std::string::npos));
}
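To try the loop above in isolation, here it is wrapped in a function. This is a sketch: the name `splitAny` and the `delims` parameter are hypothetical. As in the answer, empty tokens between adjacent delimiters are skipped, and newline always acts as a delimiter because of `std::getline`.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits each line of `input` on any character in `delims`.
// Empty tokens (adjacent delimiters, leading/trailing delimiters) are skipped.
std::vector<std::string> splitAny(const std::string& input, const std::string& delims)
{
    std::vector<std::string> wordVector;
    std::stringstream stringStream(input);
    std::string line;
    while (std::getline(stringStream, line)) {
        std::size_t prev = 0, pos;
        while ((pos = line.find_first_of(delims, prev)) != std::string::npos) {
            if (pos > prev)  // non-empty token between prev and pos
                wordVector.push_back(line.substr(prev, pos - prev));
            prev = pos + 1;
        }
        if (prev < line.length())  // trailing token after the last delimiter
            wordVector.push_back(line.substr(prev));
    }
    return wordVector;
}
```

For example, `splitAny("it's one;two  three\nfour;", " ';")` yields `it`, `s`, `one`, `two`, `three`, `four`.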

Answer 2

Answer by Matthew Smith

If you have Boost, you could use:

#include <boost/algorithm/string.hpp>
#include <string>
#include <vector>

std::string inputString("One|Two,Three:Four");
std::string delimiters("|,:");
std::vector<std::string> parts;
boost::split(parts, inputString, boost::is_any_of(delimiters));
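Unlike the loop in Answer 1, `boost::split` in its default `token_compress_off` mode keeps empty tokens: splitting `"a,,b"` on `","` gives `"a"`, `""`, `"b"`. Passing `boost::token_compress_on` as a fourth argument merges adjacent delimiters instead. If Boost is not available, here is a minimal standard-library sketch intended to mirror the keep-empty behavior (the name `splitKeepEmpty` is hypothetical, and its handling of edge cases such as empty input is an assumption, not verified against Boost):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits on any character in `delims` and keeps empty
// tokens, so "a,,b," split on "," gives "a", "", "b", "".
std::vector<std::string> splitKeepEmpty(const std::string& s, const std::string& delims)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    for (;;) {
        std::size_t pos = s.find_first_of(delims, start);
        // When pos == npos, substr clamps the length and takes the rest of s.
        parts.push_back(s.substr(start, pos - start));
        if (pos == std::string::npos)
            break;
        start = pos + 1;
    }
    return parts;
}
```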

Answer 3

Answer by forumulator

I don't know why nobody pointed out the manual way, but here it is:

#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

const std::string delims(";,:. \n\t");
inline bool isDelim(char c) {
    for (std::size_t i = 0; i < delims.size(); ++i)
        if (delims[i] == c)
            return true;
    return false;
}

and in the function:

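std::stringstream stringStream(inputString);
std::string word;
int c;  // int, not char: get() returns EOF as an int, which a char cannot hold reliably

while (stringStream) {
    word.clear();

    // Read word: stop at a delimiter or at end of input
    while ((c = stringStream.get()) != EOF && !isDelim(static_cast<char>(c)))
        word.push_back(static_cast<char>(c));
    if (c != EOF)
        stringStream.unget();

    if (!word.empty())  // leading delimiters or empty input give no word
        wordVector.push_back(word);

    // Read delims
    while ((c = stringStream.get()) != EOF && isDelim(static_cast<char>(c)))
        ;
    if (c != EOF)
        stringStream.unget();
}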

This way you can do something useful with the delims if you want.
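For example, when the input is an arithmetic expression (as in the question), the operators are both separators and tokens you want to keep. A minimal sketch under these assumptions: whitespace only separates, every operator is a single character, and the function name `tokenizeExpr` and its default operator set are hypothetical.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical tokenizer: whitespace separates tokens and is dropped;
// each character in `ops` ends the current word and is kept as its own token.
std::vector<std::string> tokenizeExpr(const std::string& s, const std::string& ops = "+-*/()")
{
    std::vector<std::string> tokens;
    std::string word;
    auto flush = [&] {  // push the pending word, if any
        if (!word.empty()) {
            tokens.push_back(word);
            word.clear();
        }
    };
    for (char c : s) {
        if (std::isspace(static_cast<unsigned char>(c))) {
            flush();
        } else if (ops.find(c) != std::string::npos) {
            flush();
            tokens.push_back(std::string(1, c));  // keep the delimiter
        } else {
            word.push_back(c);
        }
    }
    flush();
    return tokens;
}
```

`tokenizeExpr("3*(x + 42)-y")` yields `3`, `*`, `(`, `x`, `+`, `42`, `)`, `-`, `y`. Multi-character operators such as `<=` would need lookahead, which this sketch does not do.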

Answer 4

Answer by darune

Using `std::regex`

A `std::regex` can split a string in a few lines:

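#include <regex>
#include <string>
#include <vector>

std::regex re("[|,:]");  // inside [...] '|' is literal; "\|" is not a standard C++ escape sequence
// -1 selects what was NOT matched, i.e. the text between delimiters
std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;
std::vector<std::string> tokens{first, last};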
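The same idea as a self-contained function (the name `regexSplit` is hypothetical). The `-1` submatch index makes `std::sregex_token_iterator` yield the text between matches rather than the matches themselves:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` wherever `re` matches.
std::vector<std::string> regexSplit(const std::string& input, const std::regex& re)
{
    // -1: iterate over the unmatched parts (the tokens), not the delimiters
    std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;
    return std::vector<std::string>(first, last);
}
```

With `[|,:]`, two adjacent delimiters produce an empty token (`"a||b"` gives `"a"`, `""`, `"b"`); adding `+`, as in `[|,:]+`, treats a run of delimiters as a single separator.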

Answer 5

Answer by Porsche9II

Using Eric Niebler's range-v3 library:

https://godbolt.org/z/ZnxfSa

#include <string>
#include <iostream>
#include "range/v3/all.hpp"

int main()
{
    std::string s = "user1:192.168.0.1|user2:192.168.0.2|user3:192.168.0.3";
    auto words = s  
        | ranges::view::split('|')
        | ranges::view::transform([](auto w){
            return w | ranges::view::split(':');
        });
    ranges::for_each(words, [](auto i){ std::cout << i << "\n"; });
}
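If range-v3 is not available, the same two-level split (records on `'|'`, then fields on `':'`) can be sketched by nesting the `std::getline` loop from the question. This is an alternative, not the range-v3 approach; the name `splitNested` is hypothetical, and it assumes each level uses a single delimiter character:

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split `s` into records on `outer`,
// then split each record into fields on `inner`.
std::vector<std::vector<std::string>> splitNested(const std::string& s, char outer, char inner)
{
    std::vector<std::vector<std::string>> result;
    std::stringstream records(s);
    std::string record;
    while (std::getline(records, record, outer)) {
        std::vector<std::string> fields;
        std::stringstream fieldStream(record);
        std::string field;
        while (std::getline(fieldStream, field, inner))
            fields.push_back(field);
        result.push_back(std::move(fields));
    }
    return result;
}
```

For `"user1:192.168.0.1|user2:192.168.0.2"` with `'|'` and `':'` this gives `{"user1", "192.168.0.1"}` and `{"user2", "192.168.0.2"}`.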

Answer 6

Answer by Kohn1001

If you are interested in how to do it yourself without Boost:

Assume the delimiter string may be long, say of length M. Checking whether a character is a delimiter by scanning the delimiter string costs O(M), so doing that for every character of an input string of length N costs O(M*N).

I would use a lookup table instead (conceptually a map from "delimiter" to "boolean", but here a simple boolean array indexed by character code, with true at the index of each delimiter).

Checking whether a character is a delimiter is then O(1), so iterating over the string is O(N) overall, plus O(M) once to build the table.

Here is my sample code:

#include <iostream>
#include <string>
#include <vector>
using namespace std;

const int dictSize = 256;

vector<string> tokenizeMyString(const string &s, const string &del)
{
    // Not static: a static table would keep delimiters from earlier calls.
    bool dict[dictSize] = {false};

    vector<string> res;
    for (unsigned char d : del) {   // unsigned char: never a negative index
        dict[d] = true;
    }

    string token;
    for (char c : s) {
        if (dict[static_cast<unsigned char>(c)]) {
            if (!token.empty()) {
                res.push_back(token);
                token.clear();
            }
        }
        else {
            token += c;
        }
    }
    if (!token.empty()) {
        res.push_back(token);
    }
    return res;
}


int main()
{
    string delString = "MyDog:Odie, MyCat:Garfield  MyNumber:1001001";
    // the delimiters are " " (space) and "," (comma)
    vector<string> res = tokenizeMyString(delString, " ,");

    for (auto &i : res) {
        cout << "token: " << i << endl;
    }
    return 0;
}

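Note: tokenizeMyString returns the vector by value. Because `res` is a named local, the compiler can apply named return value optimization (NRVO) and construct it directly in the caller; even when it does not, the vector is moved rather than copied (C++11 and later).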
