How do I tokenize a string in C++?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/53849/
Asked by Bill the Lizard
Java has a convenient split method:
String str = "The quick brown fox";
String[] results = str.split(" ");
Is there an easy way to do this in C++?
Accepted answer by Konrad Rudolph
C++ standard library algorithms are pretty universally based around iterators rather than concrete containers. Unfortunately this makes it hard to provide a Java-like split function in the C++ standard library, even though nobody disputes that it would be convenient. But what would its return type be? std::vector<std::basic_string<…>>? Maybe, but then we're forced to perform (potentially redundant and costly) allocations.
Instead, C++ offers a plethora of ways to split strings based on arbitrarily complex delimiters, but none of them is encapsulated as nicely as in other languages. The numerous ways fill whole blog posts.
At its simplest, you could iterate using std::string::find until you hit std::string::npos, and extract the contents using std::string::substr.
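The find/substr loop described above can be sketched roughly as follows. This is a minimal illustration rather than code from the answer: the name split_find and its default delimiter are assumptions made for the example, and it keeps empty fields between adjacent delimiters, which may or may not be what you want.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer): split `text` on every
// occurrence of `delim`, keeping empty fields between adjacent delimiters.
std::vector<std::string> split_find(const std::string& text,
                                    const std::string& delim = " ")
{
    assert(!delim.empty());  // an empty delimiter would never advance

    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    std::string::size_type pos;

    // find() returns npos once no further delimiter exists.
    while ((pos = text.find(delim, start)) != std::string::npos) {
        tokens.push_back(text.substr(start, pos - start));
        start = pos + delim.size();
    }
    tokens.push_back(text.substr(start));  // piece after the last delimiter
    return tokens;
}
```

With this sketch, split_find("a,,b", ",") would yield "a", an empty string, and "b". Returning a vector is exactly the allocation cost the answer warns about, so treat it as a convenience rather than the efficient option.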
A more fluid (and idiomatic, but basic) version for splitting on whitespace would use a std::istringstream:
auto iss = std::istringstream{"The quick brown fox"};
auto str = std::string{};
while (iss >> str) {
    process(str);
}
Using std::istream_iterators, the contents of the string stream could also be copied into a vector using its iterator range constructor.
Multiple libraries (such as Boost.Tokenizer) offer specific tokenisers.
More advanced splitting requires regular expressions. C++ provides the std::regex_token_iterator for this purpose in particular:
auto const str = "The quick brown fox"s;
auto const re = std::regex{R"(\s+)"};
auto const vec = std::vector<std::string>(
    std::sregex_token_iterator{begin(str), end(str), re, -1},
    std::sregex_token_iterator{}
);
Answer by Ferruccio
The Boost tokenizer class can make this sort of thing quite simple:
#include <iostream>
#include <string>
#include <boost/foreach.hpp>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
    string text = "token, test   string";
    char_separator<char> sep(", ");
    tokenizer< char_separator<char> > tokens(text, sep);
    BOOST_FOREACH (const string& t, tokens) {
        cout << t << "." << endl;
    }
}
Updated for C++11:
#include <iostream>
#include <string>
#include <boost/tokenizer.hpp>
using namespace std;
using namespace boost;
int main(int, char**)
{
    string text = "token, test   string";
    char_separator<char> sep(", ");
    tokenizer<char_separator<char>> tokens(text, sep);
    for (const auto& t : tokens) {
        cout << t << "." << endl;
    }
}
Answer by Adam Pierce
Here's a real simple one:
#include <vector>
#include <string>
using namespace std;
vector<string> split(const char *str, char c = ' ')
{
    vector<string> result;
    do
    {
        const char *begin = str;
        while(*str != c && *str)
            str++;
        result.push_back(string(begin, str));
    } while (0 != *str++);
    return result;
}
Answer by Mark
Use strtok. In my opinion, there isn't a need to build a class around tokenizing unless strtok doesn't provide you with what you need. It might not, but in 15+ years of writing various parsing code in C and C++, I've always used strtok. Here is an example:
char myString[] = "The quick brown fox";
char *p = strtok(myString, " ");
while (p) {
    printf ("Token: %s\n", p);
    p = strtok(NULL, " ");
}
A few caveats (which might not suit your needs): the string is "destroyed" in the process, meaning that end-of-string ('\0') characters are written in place of the delimiters. Correct usage might require you to make a non-const copy of the string. You can also change the list of delimiters mid-parse.
In my own opinion, the above code is far simpler and easier to use than writing a separate class for it. To me, this is one of those functions that the language provides and it does it well and cleanly. It's simply a "C based" solution. It's appropriate, it's easy, and you don't have to write a lot of extra code :-)
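The caveat about strtok overwriting its input could be handled along these lines when the source is a const std::string. This is only a sketch under stated assumptions: the wrapper name split_strtok and its default delimiter list are hypothetical and not part of the answer. Note also that strtok keeps internal state between calls, so this is not safe to use from several threads at once, and that it skips runs of delimiters rather than reporting empty fields.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical wrapper: tokenizes a copy so the caller's string survives.
std::vector<std::string> split_strtok(const std::string& text,
                                      const char* delims = " ")
{
    assert(delims != nullptr);

    // strtok writes '\0' over delimiters, so work on a mutable,
    // null-terminated copy instead of the original string.
    std::vector<char> buffer(text.begin(), text.end());
    buffer.push_back('\0');

    std::vector<std::string> tokens;
    for (char* p = std::strtok(buffer.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims)) {
        tokens.emplace_back(p);  // copy each token out of the buffer
    }
    return tokens;
}
```

The copy costs an allocation, which gives up some of the simplicity the answer argues for; if the input is already a writable char array, calling strtok on it directly as shown above avoids that.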
Answer by user35978
Another quick way is to use getline. Something like:
stringstream ss("bla bla");
string s;
while (getline(ss, s, ' ')) {
    cout << s << endl;
}
If you want, you can make a simple split() method returning a vector<string>, which is really useful.
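One possible shape for such a split() helper is sketched below. The name split_getline, the single-character delimiter parameter, and the small demo function are assumptions made for illustration; the answer itself does not show this code.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the getline() loop's fields into a vector.
std::vector<std::string> split_getline(const std::string& text,
                                       char delim = ' ')
{
    std::vector<std::string> fields;
    std::istringstream stream(text);
    std::string field;
    while (std::getline(stream, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Usage sketch: two edge cases worth knowing about.
void split_getline_demo()
{
    // Adjacent delimiters produce an empty field...
    assert(split_getline("a,,b", ',').size() == 3);
    // ...but a trailing delimiter does not add one.
    assert(split_getline("a,b,", ',').size() == 2);
}
```

Because getline takes a single char, this approach cannot split on multi-character delimiters or on a set of delimiters.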
Answer by KeithB
You can use streams, iterators, and the copy algorithm to do this fairly directly.
#include <string>
#include <vector>
#include <iostream>
#include <istream>
#include <ostream>
#include <iterator>
#include <sstream>
#include <algorithm>
int main()
{
    std::string str = "The quick brown fox";
    // construct a stream from the string
    std::stringstream strstr(str);
    // use stream iterators to copy the stream to the vector as whitespace separated strings
    std::istream_iterator<std::string> it(strstr);
    std::istream_iterator<std::string> end;
    std::vector<std::string> results(it, end);
    // send the vector to stdout.
    std::ostream_iterator<std::string> oit(std::cout);
    std::copy(results.begin(), results.end(), oit);
}
Answer by Mr.Ree
No offense folks, but for such a simple problem, you are making things way too complicated. There are a lot of reasons to use Boost. But for something this simple, it's like hitting a fly with a 20 lb sledgehammer.
void
split( vector<string> & theStringVector,  /* Altered/returned value */
       const string & theString,
       const string & theDelimiter)
{
    UASSERT( theDelimiter.size(), >, 0); // My own ASSERT macro.
    size_t start = 0, end = 0;
    while ( end != string::npos)
    {
        end = theString.find( theDelimiter, start);
        // If at end, use length=maxLength. Else use length=end-start.
        theStringVector.push_back( theString.substr( start,
            (end == string::npos) ? string::npos : end - start));
        // If at end, use start=maxSize. Else use start=end+delimiter.
        start = ( ( end > (string::npos - theDelimiter.size()) )
                  ? string::npos : end + theDelimiter.size());
    }
}
For example (for Doug's case),
#define SHOW(I,X) cout << "[" << (I) << "]\t " # X " = \"" << (X) << "\"" << endl
int
main()
{
    vector<string> v;
    split( v, "A:PEP:909:Inventory Item", ":" );
    for (unsigned int i = 0; i < v.size(); i++)
        SHOW( i, v[i] );
}
And yes, we could have split() return a new vector rather than passing one in. It's trivial to wrap and overload. But depending on what I'm doing, I often find it better to re-use pre-existing objects rather than always creating new ones. (Just as long as I don't forget to empty the vector in between!)
Reference: http://www.cplusplus.com/reference/string/string/.
(I was originally writing a response to Doug's question: C++ Strings Modifying and Extracting based on Separators (closed). But since Martin York closed that question with a pointer over here... I'll just generalize my code.)
Answer by w.b
A solution using regex_token_iterators:
#include <iostream>
#include <regex>
#include <string>
#include <vector>
using namespace std;
int main()
{
    string str("The quick brown fox");
    regex reg("\\s+");  // backslash escaped so the regex engine sees \s+
    sregex_token_iterator iter(str.begin(), str.end(), reg, -1);
    sregex_token_iterator end;
    vector<string> vec(iter, end);
    for (auto a : vec)
    {
        cout << a << endl;
    }
}
Answer by Raz
Boost has a strong split function: boost::algorithm::split.
Sample program:
#include <iostream>
#include <string>
#include <vector>
#include <boost/algorithm/string.hpp>
int main() {
    auto s = "a,b, c ,,e,f,";
    std::vector<std::string> fields;
    boost::split(fields, s, boost::is_any_of(","));
    for (const auto& field : fields)
        std::cout << "\"" << field << "\"\n";
    return 0;
}
Output:
"a"
"b"
" c "
""
"e"
"f"
""
Answer by sivabudh
I know you asked for a C++ solution, but you might consider this helpful:
Qt
#include <QString>
...
QString str = "The quick brown fox";
QStringList results = str.split(" ");
The advantage over Boost in this example is that it's a direct one to one mapping to your post's code.
See more at Qt documentation