Parse (split) a string in C++ using a string delimiter (standard C++)

Question

Asked by TheCrazyProgrammer

I am parsing a string in C++ using the following:

using namespace std;

string parsed,input="text to be parsed";
stringstream input_stringstream(input);

if (getline(input_stringstream,parsed,' '))
{
     // do some processing.
}

Parsing with a single char delimiter works fine. But what if I want to use a string as the delimiter?

Example: I want to split:

scott>=tiger

with >= as the delimiter, so that I get scott and tiger.

Answer 1

Answered by Vincenzo Pii

You can use the std::string::find() function to find the position of your string delimiter, then use std::string::substr() to get a token.

Example:

std::string s = "scott>=tiger";
std::string delimiter = ">=";
std::string token = s.substr(0, s.find(delimiter)); // token is "scott"

The find(const string& str, size_t pos = 0) function returns the position of the first occurrence of str in the string, or npos if the string is not found.
The substr(size_t pos = 0, size_t n = npos) function returns a substring of the object, starting at position pos and of length n (or up to the end of the string if n is npos).
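
As a minimal sketch of how these two calls combine, the helper below splits a string at the first delimiter only and checks for npos before using the position. The name split_once and its pair return type are hypothetical, chosen for illustration; they are not part of the answer.

```cpp
#include <string>
#include <utility>

// Hypothetical helper (not from the answer): splits s at the FIRST
// occurrence of delimiter and returns {text before, text after}.
std::pair<std::string, std::string> split_once(const std::string& s,
                                               const std::string& delimiter) {
    const std::size_t pos = s.find(delimiter);
    if (pos == std::string::npos) {
        // Delimiter absent: the whole input is the first part.
        return {s, std::string()};
    }
    // substr(0, pos) copies the pos characters before the delimiter;
    // substr(pos + len) copies from just past the delimiter to the end.
    return {s.substr(0, pos), s.substr(pos + delimiter.length())};
}
```

Without the npos check, s.substr(0, s.find(delimiter)) simply returns the whole string when the delimiter is missing, which may or may not be what the caller wants.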

If the string contains several delimiters, remove each token (delimiter included) once you have extracted it, then repeat on what remains. This consumes s, so run it on a copy if you need to keep the original string:

s.erase(0, s.find(delimiter) + delimiter.length());

This way you can easily loop to get each token.

Complete Example

std::string s = "scott>=tiger>=mushroom";
std::string delimiter = ">=";

size_t pos = 0;
std::string token;
while ((pos = s.find(delimiter)) != std::string::npos) {
    token = s.substr(0, pos);
    std::cout << token << std::endl;
    s.erase(0, pos + delimiter.length());
}
std::cout << s << std::endl;

Output:

scott
tiger
mushroom

Answer 2

Answered by moswald

This method uses std::string::find without mutating the original string, by remembering where the previous token ended and searching from there.

#include <iostream>
#include <string>

int main()
{
    std::string s = "scott>=tiger";
    std::string delim = ">=";

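    std::size_t start = 0;  // size_t rather than unsigned, so the index cannot be truncated
    auto end = s.find(delim);
    while (end != std::string::npos)
    {
        std::cout << s.substr(start, end - start) << std::endl;
        start = end + delim.length();
        end = s.find(delim, start);
    }

    // Print the final token: everything after the last delimiter.
    std::cout << s.substr(start) << std::endl;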
}

Answer 3

Answered by Sviatoslav

You can use the following function to split a string:

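#include <string>
#include <vector>
using namespace std;

// Splits str on delim; empty tokens are skipped.
vector<string> split(const string& str, const string& delim)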
{
    vector<string> tokens;
    size_t prev = 0, pos = 0;
    do
    {
        pos = str.find(delim, prev);
        if (pos == string::npos) pos = str.length();
        string token = str.substr(prev, pos-prev);
        if (!token.empty()) tokens.push_back(token);
        prev = pos + delim.length();
    }
    while (pos < str.length() && prev < str.length());
    return tokens;
}

Answer 4

Answered by Arafat Hasan

For string delimiter

Split a string on a string delimiter. For example, splitting "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih" on the delimiter "-+" yields {"adsf", "qwret", "nvfkbdsj", "orthdfjgh", "dfjrleih"}.

#include <iostream>
#include <string>
#include <vector>

using namespace std;

// for string delimiter
vector<string> split (string s, string delimiter) {
    size_t pos_start = 0, pos_end, delim_len = delimiter.length();
    string token;
    vector<string> res;

    while ((pos_end = s.find (delimiter, pos_start)) != string::npos) {
        token = s.substr (pos_start, pos_end - pos_start);
        pos_start = pos_end + delim_len;
        res.push_back (token);
    }

    res.push_back (s.substr (pos_start));
    return res;
}

int main() {
    string str = "adsf-+qwret-+nvfkbdsj-+orthdfjgh-+dfjrleih";
    string delimiter = "-+";
    vector<string> v = split (str, delimiter);

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwret
nvfkbdsj
orthdfjgh
dfjrleih

For single character delimiter

Split a string on a character delimiter. For example, splitting "adsf+qwer+poui+fdgh" on the delimiter '+' yields {"adsf", "qwer", "poui", "fdgh"}.

#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> split (const string &s, char delim) {
    vector<string> result;
    stringstream ss (s);
    string item;

    while (getline (ss, item, delim)) {
        result.push_back (item);
    }

    return result;
}

int main() {
    string str = "adsf+qwer+poui+fdgh";
    vector<string> v = split (str, '+');

    for (auto i : v) cout << i << endl;

    return 0;
}

Output

adsf
qwer
poui
fdgh

Answer 5

Answered by William Cuervo

This code splits text into lines and adds each one to a vector.

vector<string> split(const char *phrase, const string& delimiter){
    vector<string> list;
    string s = string(phrase);
    size_t pos = 0;
    string token;
    while ((pos = s.find(delimiter)) != string::npos) {
        token = s.substr(0, pos);
        list.push_back(token);
        s.erase(0, pos + delimiter.length());
    }
    list.push_back(s);
    return list;
}

Call it like this:

vector<string> listFilesMax = split(buffer, "\n");

Answer 6

Answered by ryanbwork

strtok allows you to pass in multiple chars as delimiters. I bet if you passed in ">=" your example string would be split correctly (even though > and = are counted as individual delimiters, so a lone > or = would also split the string).
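
A rough sketch of the strtok approach, to show the per-character behaviour in practice. The wrapper name strtok_split is hypothetical. The sketch copies the input into a mutable buffer, because strtok overwrites delimiter characters with '\0' and so cannot be given the const pointer that c_str() returns.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper (not from the answer) around std::strtok.
// Every character in delims acts as a separate delimiter.
std::vector<std::string> strtok_split(const std::string& input, const char* delims) {
    // strtok modifies its argument, so tokenize a private, NUL-terminated copy.
    std::vector<char> buffer(input.begin(), input.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

With delims set to ">=", the input "scott>=tiger" gives scott and tiger, but "a>b=c>=d" gives a, b, c, d, since each lone > or = also ends a token. strtok also keeps its position in hidden static state, so it is not safe to interleave two tokenizations.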

EDIT: if you don't want to use c_str() to convert from string to char*, you can use substr and find_first_of to tokenize.

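string mystring("scott>=tiger");
const string delims(">=");  // each character is treated as a separate delimiter
size_t start = mystring.find_first_not_of(delims);
while (start != string::npos) {
  size_t end = mystring.find_first_of(delims, start);  // npos on the last token
  string token = mystring.substr(start, end - start);
  printf("%s ", token.c_str());
  start = mystring.find_first_not_of(delims, end);     // skip the delimiter run
}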

Answer 7

Answered by Beder Acosta Borges

Here's my take on this. It handles the edge cases and takes an optional parameter to remove empty entries from the results.

bool endsWith(const std::string& s, const std::string& suffix)
{
    return s.size() >= suffix.size() &&
           s.substr(s.size() - suffix.size()) == suffix;
}

std::vector<std::string> split(const std::string& s, const std::string& delimiter, const bool& removeEmptyEntries = false)
{
    std::vector<std::string> tokens;

    for (size_t start = 0, end; start < s.length(); start = end + delimiter.length())
    {
         size_t position = s.find(delimiter, start);
         end = position != std::string::npos ? position : s.length();

         std::string token = s.substr(start, end - start);
         if (!removeEmptyEntries || !token.empty())
         {
             tokens.push_back(token);
         }
    }

    if (!removeEmptyEntries &&
        (s.empty() || endsWith(s, delimiter)))
    {
        tokens.push_back("");
    }

    return tokens;
}

Examples

split("a-b-c", "-"); // [3]("a","b","c")

split("a--c", "-"); // [3]("a","","c")

split("-b-", "-"); // [3]("","b","")

split("--c--", "-"); // [5]("","","c","","")

split("--c--", "-", true); // [1]("c")

split("a", "-"); // [1]("a")

split("", "-"); // [1]("")

split("", "-", true); // [0]()

Answer 8

Answered by hmofrad

This should work for string (or single-character) delimiters, provided the first character of the delimiter never appears inside a token, because the split itself is done on that character alone. Don't forget to #include <sstream>.

std::string input = "Alfa=,+Bravo=,+Charlie=,+Delta";
std::string delimiter = "=,+"; 
std::istringstream ss(input);
std::string token;
std::string::iterator it;

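while(std::getline(ss, token, *(it = delimiter.begin()))) {
    // Skip the remaining delimiter characters. Compare against end()
    // instead of dereferencing the past-the-end iterator.
    while(++it != delimiter.end()) ss.get();
    std::cout << token << " " << '\n';
}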

The first while loop extracts a token using the first character of the string delimiter. The second while loop skips the rest of the delimiter and stops at the beginning of the next token.

Answer 9

Answered by Shubham Agrawal

An answer already exists, but the selected answer uses erase, which is costly because every call shifts the rest of the string; think of a very big string (megabytes). So I use the function below instead.

vector<string> split(const string& i_str, const string& i_delim)
{
    vector<string> result;

    size_t found = i_str.find(i_delim);
    size_t startIndex = 0;

    while(found != string::npos)
    {
        string temp(i_str.begin()+startIndex, i_str.begin()+found);
        result.push_back(temp);
        startIndex = found + i_delim.size();
        found = i_str.find(i_delim, startIndex);
    }
    if(startIndex != i_str.size())
        result.push_back(string(i_str.begin()+startIndex, i_str.end()));
    return result;      
}
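
If copying each token is also a concern for very large inputs, one possible variation is to return views into the original string instead of new strings. The sketch below assumes C++17 (std::string_view); the name split_views is hypothetical. Unlike the function above, it keeps a trailing empty token.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Hypothetical variation (not from the answer); requires C++17.
// The returned views point into `text`, so the underlying string
// must outlive the result.
std::vector<std::string_view> split_views(std::string_view text,
                                          std::string_view delim) {
    std::vector<std::string_view> result;
    if (delim.empty()) {  // an empty delimiter would never advance
        result.push_back(text);
        return result;
    }
    std::size_t start = 0;
    for (std::size_t found = text.find(delim);
         found != std::string_view::npos;
         found = text.find(delim, start)) {
        result.push_back(text.substr(start, found - start));
        start = found + delim.size();
    }
    result.push_back(text.substr(start));  // remainder, possibly empty
    return result;
}
```

The trade-off is lifetime: calling this on a temporary std::string leaves the views dangling, so it only suits cases where the source string is kept alive by the caller.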

Answer 10

Answered by Benjamin Lindley

I would use boost::tokenizer. Here's documentation explaining how to make an appropriate tokenizer function: http://www.boost.org/doc/libs/1_52_0/libs/tokenizer/tokenizerfunction.htm

Here's one that works for your case.

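#include <algorithm>  // std::search
#include <iostream>
#include <iterator>   // std::advance
#include <string>
#include <boost/tokenizer.hpp>

struct my_tokenizer_func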
{
    template<typename It>
    bool operator()(It& next, It end, std::string & tok)
    {
        if (next == end)
            return false;
        char const * del = ">=";
        auto pos = std::search(next, end, del, del + 2);
        tok.assign(next, pos);
        next = pos;
        if (next != end)
            std::advance(next, 2);
        return true;
    }

    void reset() {}
};

int main()
{
    std::string to_be_parsed = "1) one>=2) two>=3) three>=4) four";
    for (auto i : boost::tokenizer<my_tokenizer_func>(to_be_parsed))
        std::cout << i << '\n';
}
