Splitting a string using C++11

Question

Asked by Mark

What would be the easiest way to split a string using C++11?

I've seen the method used by this post, but I feel that there ought to be a less verbose way of doing it using the new standard.

Edit: I would like to get a vector<string> as the result and be able to split on a single-character delimiter.

Answer 1

Answered by JohannesD

std::regex_token_iterator performs generic tokenization based on a regex. It may or may not be overkill for simple splitting on a single character, but it works and is not too verbose:

#include <regex>
#include <string>
#include <vector>

std::vector<std::string> split(const std::string& input, const std::string& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::regex re(regex);
    std::sregex_token_iterator
        first{input.begin(), input.end(), re, -1},
        last;
    return {first, last};
}
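
Since the question asks about splitting on a single character, note that a delimiter such as '.' or '|' has a special meaning in a regex and would need escaping before it is used as a pattern. Here is a minimal sketch of a wrapper that does this. The name split_on_char and the list of escaped characters are my own assumptions, not part of the answer above, and the list targets the default ECMAScript grammar only.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical wrapper: split on one literal character using the regex approach.
std::vector<std::string> split_on_char(const std::string& input, char delim) {
    // Characters with special meaning in the default ECMAScript grammar.
    static const std::string special = "^$\\.*+?()[]{}|";

    std::string pattern;
    if (special.find(delim) != std::string::npos) {
        pattern += '\\';  // escape so the delimiter is matched literally
    }
    pattern += delim;

    const std::regex re(pattern);
    // -1 selects the text between matches, i.e. the tokens.
    std::sregex_token_iterator first{input.begin(), input.end(), re, -1}, last;
    return {first, last};
}
```

With this, split_on_char("a.b.c", '.') should give the three tokens a, b and c, whereas passing "." unescaped as the pattern would match every character.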

Answer 2

Answered by Yaguang

Here is a (maybe less verbose) way to split a string (based on the post you mentioned).

#include <string>
#include <sstream>
#include <vector>
std::vector<std::string> split(const std::string &s, char delim) {
  std::stringstream ss(s);
  std::string item;
  std::vector<std::string> elems;
  while (std::getline(ss, item, delim)) {
    elems.push_back(item);
    // elems.push_back(std::move(item)); // if C++11 (based on comment from @mchiasson)
  }
  return elems;
}
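
For reference, here is a sketch of the std::move variant that the comment in the code above mentions, together with how std::getline treats empty fields. The name split_getline is made up here so that it does not clash with the split() above.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical name; same approach as above, but moving each token into the vector.
std::vector<std::string> split_getline(const std::string& s, char delim) {
    std::istringstream ss(s);
    std::string item;
    std::vector<std::string> elems;
    while (std::getline(ss, item, delim)) {
        // item is left in a valid but unspecified state after the move;
        // std::getline overwrites it on the next iteration.
        elems.push_back(std::move(item));
    }
    return elems;
}
```

With this approach an empty field between two delimiters is kept ("a,,b" gives three items, the middle one empty), while an empty field after a trailing delimiter is dropped ("a,b," gives two items).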

Answer 3

Answered by fduff

Here's an example of splitting a string and populating a vector with the extracted elements using boost.

#include <boost/algorithm/string.hpp>
#include <cassert>
#include <string>
#include <vector>

int main() {
    std::string my_input("A,B,EE");
    std::vector<std::string> results;

    boost::algorithm::split(results, my_input, boost::is_any_of(","));

    assert(results[0] == "A");
    assert(results[1] == "B");
    assert(results[2] == "EE");
}

Answer 4

Answered by wally

Another regex solution inspired by other answers but hopefully shorter and easier to read:

std::string s{"String to split here, and here, and here,..."};
std::regex regex{R"([\s,]+)"}; // split on space and comma
std::sregex_token_iterator it{s.begin(), s.end(), regex, -1};
std::vector<std::string> words{it, {}};
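
One thing to watch with the -1 splitting mode: if the input starts with a separator, the first token can come back as an empty string. A possible sketch that wraps the snippet above in a function and skips empty tokens follows. The function name split_words is hypothetical, and dropping empty tokens is my assumption about what is usually wanted.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split on runs of whitespace and commas, dropping empty tokens.
std::vector<std::string> split_words(const std::string& s) {
    static const std::regex separators{R"([\s,]+)"};

    std::vector<std::string> words;
    std::sregex_token_iterator it{s.begin(), s.end(), separators, -1};
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() > 0) {  // skip the empty token produced by a leading separator
            words.push_back(it->str());
        }
    }
    return words;
}
```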

Answer 5

Answered by Faisal Vali

I don't know if this is less verbose, but it might be easier to grok for those more seasoned in dynamic languages such as JavaScript. The only C++11 feature it uses is lambdas.

#include <algorithm>
#include <string>
#include <cctype>
#include <iostream>
#include <vector>

int main()
{
  using namespace std;
  string s = "hello  how    are you won't you tell me your name";
  vector<string> tokens;
  string token;

  for_each(s.begin(), s.end(), [&](char c) {
    if (!isspace(static_cast<unsigned char>(c)))  // cast avoids undefined behavior for negative char values
        token += c;
    else 
    {
        if (token.length()) tokens.push_back(token);
        token.clear();
    }
  });
  if (token.length()) tokens.push_back(token);

  return 0;
}

Answer 6

Answered by Torsten

My choice is boost::tokenizer, but I haven't had any heavy tasks or tested it with huge data. Example from the Boost docs, modified to use a lambda:

#include <algorithm>
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>
#include <vector>

int main()
{
   using namespace std;
   using namespace boost;

   string s = "This is,  a test";
   vector<string> v;
   tokenizer<> tok(s);
   for_each (tok.begin(), tok.end(), [&v](const string & s) { v.push_back(s); } );
   // result 4 items: 1)This 2)is 3)a 4)test
   return 0;
}

Answer 7

Answered by chekkal

#include <ctype.h>  // declares ::isspace in the global namespace
#include <iostream>
#include <algorithm>
#include <vector>
#include <string>


using namespace std;

vector<string> split(const string& str, int delimiter(int) = ::isspace){
  vector<string> result;
  auto e=str.end();
  auto i=str.begin();
  while(i!=e){
    i=find_if_not(i,e, delimiter);
    if(i==e) break;
    auto j=find_if(i,e, delimiter);
    result.push_back(string(i,j));
    i=j;
  }
  return result;
}

int main(){
  string line;
  getline(cin,line);
  vector<string> result = split(line);
  for(auto s: result){
    cout<<s<<endl;
  }
}

Answer 8

Answered by ymmt2005

This is my answer. Verbose, readable and efficient.

#include <string>
#include <vector>

std::vector<std::string> tokenize(const std::string& s, char c) {
    auto end = s.cend();
    auto start = end;

    std::vector<std::string> v;
    for( auto it = s.cbegin(); it != end; ++it ) {
        if( *it != c ) {
            if( start == end )
                start = it;
            continue;
        }
        if( start != end ) {
            v.emplace_back(start, it);
            start = end;
        }
    }
    if( start != end )
        v.emplace_back(start, end);
    return v;
}

Answer 9

Answered by villains

Here is a C++11 solution that uses only std::string::find(). The delimiter can be any number of characters long. Parsed tokens are output via an output iterator, which is typically a std::back_inserter in my code.

I have not tested this with UTF-8, but I expect it should work as long as the input and delimiter are both valid UTF-8 strings.

#include <string>

template<class Iter>
Iter splitStrings(const std::string &s, const std::string &delim, Iter out)
{
    if (delim.empty()) {
        *out++ = s;
        return out;
    }
    size_t a = 0, b = s.find(delim);
    for ( ; b != std::string::npos;
          a = b + delim.length(), b = s.find(delim, a))
    {
        *out++ = s.substr(a, b - a);  // substr already returns a temporary, so std::move is not needed
    }
    *out++ = s.substr(a);
    return out;
}

Some test cases:

#include <iostream>
#include <iterator>
#include <vector>

void test()
{
    std::vector<std::string> out;
    size_t counter;

    std::cout << "Empty input:" << std::endl;        
    out.clear();
    splitStrings("", ",", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, empty delimiter:" << std::endl;        
    out.clear();
    splitStrings("Hello, world!", "", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, non-empty delimiter"
                 ", no delimiter in string:" << std::endl;        
    out.clear();
    splitStrings("abxycdxyxydefxya", "xyz", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, non-empty delimiter"
                 ", delimiter exists string:" << std::endl;        
    out.clear();
    splitStrings("abxycdxy!!xydefxya", "xy", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, non-empty delimiter"
                 ", delimiter exists string"
                 ", input contains blank token:" << std::endl;        
    out.clear();
    splitStrings("abxycdxyxydefxya", "xy", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, non-empty delimiter"
                 ", delimiter exists string"
                 ", nothing after last delimiter:" << std::endl;        
    out.clear();
    splitStrings("abxycdxyxydefxy", "xy", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }

    std::cout << "Non-empty input, non-empty delimiter"
                 ", only delimiter exists string:" << std::endl;        
    out.clear();
    splitStrings("xy", "xy", std::back_inserter(out));
    counter = 0;        
    for (auto i = out.begin(); i != out.end(); ++i, ++counter) {
        std::cout << counter << ": " << *i << std::endl;
    }
}

Expected output:

Empty input:
0: 
Non-empty input, empty delimiter:
0: Hello, world!
Non-empty input, non-empty delimiter, no delimiter in string:
0: abxycdxyxydefxya
Non-empty input, non-empty delimiter, delimiter exists string:
0: ab
1: cd
2: !!
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, input contains blank token:
0: ab
1: cd
2: 
3: def
4: a
Non-empty input, non-empty delimiter, delimiter exists string, nothing after last delimiter:
0: ab
1: cd
2: 
3: def
4: 
Non-empty input, non-empty delimiter, only delimiter exists string:
0: 
1:

Answer 10

Answered by Bill Moore

#include <string>
#include <vector>
#include <sstream>

inline std::vector<std::string> split(const std::string& s) {
    std::vector<std::string> result;
    std::istringstream iss(s);
    for (std::string w; iss >> w; )
        result.push_back(w);
    return result;
}
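
Note that operator>> splits on whitespace only, so this does not cover the single-character delimiter from the question. If whitespace splitting is all you need, the same idea can probably be written a little shorter with std::istream_iterator. This is a different technique from the loop above, and the name split_ws is made up for this sketch.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical name: whitespace-only split built from a pair of stream iterators.
std::vector<std::string> split_ws(const std::string& s) {
    std::istringstream iss(s);
    // The default-constructed iterator marks end-of-stream.
    return {std::istream_iterator<std::string>(iss),
            std::istream_iterator<std::string>()};
}
```

Runs of whitespace are skipped, so no empty tokens are produced.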
