C++: How do I iterate over the words of a string?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/236129/
How do I iterate over the words of a string?
Asked by Ashwin Nanjappa
I'm trying to iterate over the words of a string.
The string can be assumed to be composed of words separated by whitespace.
Note that I'm not interested in C string functions or that kind of character manipulation/access. Also, please give precedence to elegance over efficiency in your answer.
The best solution I have right now is:
#include <iostream>
#include <sstream>
#include <string>
using namespace std;
int main()
{
    string s = "Somewhere down the road";
    istringstream iss(s);
    do
    {
        string subs;
        iss >> subs;
        cout << "Substring: " << subs << endl;
    } while (iss);
}
Is there a more elegant way to do this?
Accepted answer by Zunino
For what it's worth, here's another way to extract tokens from an input string, relying only on standard library facilities. It's an example of the power and elegance behind the design of the STL.
#include <iostream>
#include <string>
#include <sstream>
#include <algorithm>
#include <iterator>
int main() {
    using namespace std;
    string sentence = "And I feel fine...";
    istringstream iss(sentence);
    copy(istream_iterator<string>(iss),
         istream_iterator<string>(),
         ostream_iterator<string>(cout, "\n"));
}
Instead of copying the extracted tokens to an output stream, one could insert them into a container, using the same generic copy algorithm.
vector<string> tokens;
copy(istream_iterator<string>(iss),
     istream_iterator<string>(),
     back_inserter(tokens));
... or create the vector directly:
vector<string> tokens{istream_iterator<string>{iss},
                      istream_iterator<string>{}};
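As a self-contained sketch of the last variant, the iterator-range construction can be wrapped in a small helper. The name `words_of` is hypothetical and not part of the answer; it assumes the words are separated by whitespace.

```cpp
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): collects the
// whitespace-separated words of `text` into a vector.
std::vector<std::string> words_of(const std::string& text) {
    std::istringstream iss(text);
    // A default-constructed istream_iterator acts as the end-of-stream marker.
    std::istream_iterator<std::string> first(iss), last;
    // The range constructor applies operator>> until extraction fails.
    return std::vector<std::string>(first, last);
}
```

For example, `words_of("And I feel fine...")` yields the four tokens `And`, `I`, `feel` and `fine...`, and a string containing only whitespace yields an empty vector.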
Answer by Evan Teran
I use this to split a string by a delimiter. The first overload writes the results through an output iterator (for example, into a pre-constructed vector); the second returns a new vector.
#include <string>
#include <sstream>
#include <vector>
#include <iterator>
template <typename Out>
void split(const std::string &s, char delim, Out result) {
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        *result++ = item;
    }
}
std::vector<std::string> split(const std::string &s, char delim) {
    std::vector<std::string> elems;
    split(s, delim, std::back_inserter(elems));
    return elems;
}
Note that this solution does not skip empty tokens, so the following will find 4 items, one of which is empty:
std::vector<std::string> x = split("one:two::three", ':');
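To make the empty-token behavior concrete, the sketch below repeats the `std::getline`-based split in a single function and adds a variant that drops empty items. The name `split_nonempty` is hypothetical and not part of the answer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Same getline-based approach as in the answer: empty items are kept.
std::vector<std::string> split(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        elems.push_back(item);
    }
    return elems;
}

// Hypothetical variant: identical loop, but empty items are skipped.
std::vector<std::string> split_nonempty(const std::string& s, char delim) {
    std::vector<std::string> elems;
    std::istringstream iss(s);
    std::string item;
    while (std::getline(iss, item, delim)) {
        if (!item.empty()) {
            elems.push_back(item);
        }
    }
    return elems;
}
```

With these definitions, `split("one:two::three", ':')` yields four items and `split_nonempty("one:two::three", ':')` yields three. One detail worth knowing: `std::getline` fails when no characters are left, so a trailing delimiter does not produce a trailing empty item; `split("a:", ':')` yields only `a`.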
Answer by ididak
A possible solution using Boost might be:
#include <boost/algorithm/string.hpp>
std::vector<std::string> strs;
boost::split(strs, "string to split", boost::is_any_of("\t "));
This approach might be even faster than the stringstream approach. And since this is a generic template function, it can be used to split other types of strings (wchar, etc. or UTF-8) using all kinds of delimiters.
See the documentation for details.
Answer by kev
#include <vector>
#include <string>
#include <sstream>
int main()
{
    std::string str("Split me by whitespaces");
    std::string buf;                  // Have a buffer string
    std::stringstream ss(str);        // Insert the string into a stream
    std::vector<std::string> tokens;  // Create vector to hold our words
    while (ss >> buf)
        tokens.push_back(buf);
    return 0;
}
Answer by Marius
For those who are unwilling to sacrifice all efficiency for code size and who see "efficient" as a kind of elegance, the following should hit a sweet spot (and I think the template container class is an awesomely elegant addition):
#include <string>
template <class ContainerT>
void tokenize(const std::string& str, ContainerT& tokens,
              const std::string& delimiters = " ", bool trimEmpty = false)
{
    std::string::size_type pos, lastPos = 0, length = str.length();
    using value_type = typename ContainerT::value_type;
    using size_type = typename ContainerT::size_type;
    while (lastPos < length + 1)
    {
        pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string::npos)
        {
            pos = length;
        }
        if (pos != lastPos || !trimEmpty)
            tokens.push_back(value_type(str.data() + lastPos,
                                        (size_type)pos - lastPos));
        lastPos = pos + 1;
    }
}
I usually choose to use std::vector<std::string> as my second parameter (ContainerT), but list<> is way faster than vector<> when direct access is not needed, and you can even create your own string class and use something like std::list<subString>, where subString does not make any copies, for incredible speed increases.
It's more than twice as fast as the fastest tokenize on this page and almost 5 times faster than some others. Also, with the perfect parameter types you can eliminate all string and list copies for additional speed increases.
Additionally, it does not do the (extremely inefficient) return of the result; instead it takes the token container by reference, which also allows you to build up tokens using multiple calls if you so wish.
Lastly, it allows you to specify whether to trim empty tokens from the results via a last optional parameter.
All it needs is std::string; the rest is optional. It does not use streams or the Boost library, but is flexible enough to accept some of these foreign types naturally.
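The copy-free `subString` idea mentioned above can be sketched with `std::string_view`, assuming a C++17 compiler is available. This is a swapped-in standard type, not the author's own class, and the name `tokenize_views` is hypothetical. Unlike the answer's function, this sketch returns the vector by value for brevity.

```cpp
#include <string_view>
#include <vector>

// Hypothetical sketch: the same find_first_of loop as tokenize(), but each
// token is a non-owning view into `str`, so no characters are copied.
std::vector<std::string_view> tokenize_views(std::string_view str,
                                             std::string_view delimiters = " ",
                                             bool trimEmpty = false) {
    std::vector<std::string_view> tokens;
    std::string_view::size_type lastPos = 0;
    const std::string_view::size_type length = str.length();
    while (lastPos < length + 1) {
        std::string_view::size_type pos = str.find_first_of(delimiters, lastPos);
        if (pos == std::string_view::npos) {
            pos = length;  // no more delimiters: the token runs to the end
        }
        if (pos != lastPos || !trimEmpty) {
            tokens.push_back(str.substr(lastPos, pos - lastPos));
        }
        lastPos = pos + 1;
    }
    return tokens;
}
```

The views are only valid while the underlying character data is alive and unmodified, so this fits cases where the source string outlives the token list.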
Answer by Alec Thomas
Here's another solution. It's compact and reasonably efficient:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + 1;
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
It can easily be templatised to handle string separators, wide strings, etc.
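One way to templatise it, offered as a sketch rather than the author's own code, is to parameterise on the character type so that the separator can be a string and the text can be wide. The name `split_generic` is hypothetical, and the guard for an empty separator is an added assumption (without it, `find` would never advance).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical generalisation: works for std::string, std::wstring, etc.,
// and accepts a multi-character separator.
template <typename CharT>
std::vector<std::basic_string<CharT>> split_generic(
        const std::basic_string<CharT>& text,
        const std::basic_string<CharT>& sep) {
    using String = std::basic_string<CharT>;
    std::vector<String> tokens;
    if (sep.empty()) {
        // An empty separator matches everywhere; return the text unsplit.
        tokens.push_back(text);
        return tokens;
    }
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != String::npos) {
        tokens.push_back(text.substr(start, end - start));
        start = end + sep.size();  // step over the whole separator
    }
    tokens.push_back(text.substr(start));
    return tokens;
}
```

Both arguments need to be string objects of the same character type for template deduction to work, for example `split_generic(std::string("a--b--c"), std::string("--"))`, which yields `a`, `b` and `c`.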
Note that splitting "" results in a single empty string, and splitting "," (i.e. sep) results in two empty strings.
It can also be easily expanded to skip empty tokens:
std::vector<std::string> split(const std::string &text, char sep) {
    std::vector<std::string> tokens;
    std::size_t start = 0, end = 0;
    while ((end = text.find(sep, start)) != std::string::npos) {
        if (end != start) {
            tokens.push_back(text.substr(start, end - start));
        }
        start = end + 1;
    }
    // end is npos here, so compare start against the length instead;
    // this avoids pushing an empty token after a trailing separator.
    if (start < text.size()) {
        tokens.push_back(text.substr(start));
    }
    return tokens;
}
If splitting a string at multiple delimiters while skipping empty tokens is desired, this version may be used:
std::vector<std::string> split(const std::string& text, const std::string& delims)
{
    std::vector<std::string> tokens;
    std::size_t start = text.find_first_not_of(delims), end = 0;
    while ((end = text.find_first_of(delims, start)) != std::string::npos)
    {
        tokens.push_back(text.substr(start, end - start));
        start = text.find_first_not_of(delims, end);
    }
    if (start != std::string::npos)
        tokens.push_back(text.substr(start));
    return tokens;
}
Answer by gnomed
This is my favorite way to iterate through a string. You can do whatever you want per word.
string line = "a line of text to iterate through";
string word;
istringstream iss(line, istringstream::in);
while (iss >> word)
{
    // Do something on `word` here...
}
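As one illustration of per-word work inside that loop, the sketch below picks the longest word. The function name `longest_word` is hypothetical and only serves as an example of a loop body.

```cpp
#include <sstream>
#include <string>

// Hypothetical example of per-word processing: returns the longest word in
// `line` (the first one on ties), or an empty string if there are no words.
std::string longest_word(const std::string& line) {
    std::istringstream iss(line);
    std::string word;
    std::string longest;
    while (iss >> word) {
        if (word.size() > longest.size()) {
            longest = word;
        }
    }
    return longest;
}
```

Because the extraction itself is the loop condition, the body only runs for words that were actually read, so no empty trailing word is processed.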
Answer by Ferruccio
This is similar to the Stack Overflow question "How do I tokenize a string in C++?".
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int argc, char** argv)
{
    string text = "token test\tstring";
    char_separator<char> sep(" \t");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const string& t : tokens)
    {
        cout << t << "." << endl;
    }
}
Answer by Shadow2531
I like the following because it puts the results into a vector, supports a string as a delimiter, and gives control over keeping empty values. It doesn't look as elegant, though.
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <iterator>
using namespace std;
vector<string> split(const string& s, const string& delim, const bool keep_empty = true) {
    vector<string> result;
    if (delim.empty()) {
        result.push_back(s);
        return result;
    }
    string::const_iterator substart = s.begin(), subend;
    while (true) {
        subend = search(substart, s.end(), delim.begin(), delim.end());
        string temp(substart, subend);
        if (keep_empty || !temp.empty()) {
            result.push_back(temp);
        }
        if (subend == s.end()) {
            break;
        }
        substart = subend + delim.size();
    }
    return result;
}
int main() {
    const vector<string> words = split("So close no matter how far", " ");
    copy(words.begin(), words.end(), ostream_iterator<string>(cout, "\n"));
}
Of course, Boost has a split() that works partially like that. And if by "white-space" you really do mean any type of white-space, using Boost's split with is_any_of() works great.
Answer by Shadow2531
The STL does not have such a method available already.
However, you can either use C's strtok() function on a writable copy of the data returned by the std::string::c_str() member (strtok() modifies its input), or you can write your own. Here is a code sample I found after a quick Google search ("STL string split"):
void Tokenize(const string& str,
              vector<string>& tokens,
              const string& delimiters = " ")
{
    // Skip delimiters at beginning.
    string::size_type lastPos = str.find_first_not_of(delimiters, 0);
    // Find the first delimiter after the token start (the token's end).
    string::size_type pos = str.find_first_of(delimiters, lastPos);
    while (string::npos != pos || string::npos != lastPos)
    {
        // Found a token, add it to the vector.
        tokens.push_back(str.substr(lastPos, pos - lastPos));
        // Skip delimiters. Note the "not_of".
        lastPos = str.find_first_not_of(delimiters, pos);
        // Find the next delimiter (the end of the next token).
        pos = str.find_first_of(delimiters, lastPos);
    }
}
Taken from: http://oopweb.com/CPP/Documents/CPPHOWTO/Volume/C++Programming-HOWTO-7.html
If you have questions about the code sample, leave a comment and I will explain.
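For the strtok() route mentioned at the start of this answer, a minimal sketch might look like the following. The helper name `split_strtok` is hypothetical. It tokenizes a mutable copy, because `std::strtok` overwrites delimiters with null characters while `c_str()` returns a pointer to const data.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: splits `str` with std::strtok on a private copy.
std::vector<std::string> split_strtok(const std::string& str,
                                      const char* delimiters = " ") {
    std::vector<std::string> tokens;
    // strtok writes into its input, so work on a null-terminated copy.
    std::vector<char> buffer(str.begin(), str.end());
    buffer.push_back('\0');
    for (char* tok = std::strtok(buffer.data(), delimiters);
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(tok);  // copy each token out of the buffer
    }
    return tokens;
}
```

Note that `std::strtok` treats runs of delimiters as one, so it never yields empty tokens, and it keeps internal state between calls, so it is not safe to interleave two tokenizations or to call it from several threads at once.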
And just because code does not implement a typedef called iterator or overload the << operator does not mean it is bad code. I use C functions quite frequently. For example, printf and scanf are both (significantly) faster than std::cin and std::cout, the fopen syntax is a lot friendlier for binary types, and they also tend to produce smaller EXEs.
Don't get sold on this "elegance over performance" deal.