C++ tokenize a string using a regular expression
Original source: http://stackoverflow.com/questions/992176/
License: this content is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by sth
I'm trying to teach myself some C++ from scratch at the moment.
I'm well-versed in Python, Perl and JavaScript, but have only encountered C++ briefly, in a classroom setting in the past. Please excuse the naivete of my question.
I would like to split a string using a regular expression but have not had much luck finding a clear, definitive, efficient and complete example of how to do this in C++.
In Perl this action is common, and thus can be accomplished in a trivial manner:
    /home/me$ cat test.txt
    this is aXstringYwith, some problems
    and anotherXY line with similar issues
    /home/me$ cat test.txt | perl -e'
    > while(<>){
    >   my @toks = split(/[\sXY,]+/);
    >   print join(" ",@toks)."\n";
    > }'
    this is a string with some problems
    and another line with similar issues
I'd like to know how best to accomplish the equivalent in C++.
EDIT:
I think I found what I was looking for in the Boost library, as mentioned below:
boost regex-token-iterator (why don't underscores work?)
I guess I didn't know what to search for.
    #include <iostream>
    #include <string>
    #include <boost/regex.hpp>
    using namespace std;
    int main(int argc, char*[])
    {
        string s;
        do {
            if (argc == 1)
            {
                cout << "Enter text to split (or \"quit\" to exit): ";
                // Stop on end of input as well as on "quit".
                if (!getline(cin, s) || s == "quit") break;
            }
            else
                s = "This is a string of tokens";
            // "\\s+" in C++ source is the regex \s+ (one or more whitespace characters).
            boost::regex re("\\s+");
            // Submatch -1 iterates over the text between matches, i.e. the tokens.
            boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
            boost::sregex_token_iterator j;
            unsigned count = 0;
            while (i != j)
            {
                cout << *i++ << endl;
                count++;
            }
            cout << "There were " << count << " tokens found." << endl;
        } while (argc == 1);
        return 0;
    }
Answered by sth
The Boost libraries are usually a good choice, in this case Boost.Regex. There is even an example for splitting a string into tokens that already does what you want. Basically it comes down to something like this:
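    // The backslash must be doubled in a C++ string literal to reach the regex as \s.
    boost::regex re("[\\sXY]+");
    std::string s;
    while (std::getline(std::cin, s)) {
        // Submatch -1 selects the text between separator matches.
        boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
        boost::sregex_token_iterator j;
        while (i != j) {
            std::cout << *i++ << " ";
        }
        std::cout << std::endl;
    }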
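As a rough sketch of the same idea without Boost, the snippet below swaps in the C++11 standard library's `std::regex` and `std::sregex_token_iterator` and collects the tokens into a vector. The helper name `split_regex` is made up for illustration, and skipping empty tokens is an assumption about the desired behaviour (a separator at the very start of the input would otherwise produce one empty leading token).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on every match of `separators`.
std::vector<std::string> split_regex(const std::string& text,
                                     const std::regex& separators)
{
    std::vector<std::string> tokens;
    // Submatch index -1 selects the text between separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separators, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // Assumption: empty pieces (e.g. before a leading separator) are not wanted.
        if (it->length() > 0) {
            tokens.push_back(it->str());
        }
    }
    return tokens;
}
```

Calling `split_regex(line, std::regex("[\\sXY,]+"))` on each input line should give roughly the same tokens as the Perl `split` in the question.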
Answered by Faisal Vali
If you want to minimize use of iterators, and pithify your code, the following should work:
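    #include <string>
    #include <iostream>
    #include <boost/regex.hpp>
    int main()
    {
        // "\\s" in the string literal becomes \s in the regular expression.
        const boost::regex re("[\\sXY,]+");
        for (std::string s; std::getline(std::cin, s); )
        {
            // Replace every run of separators with a single space.
            std::cout << boost::regex_replace(s, re, " ") << std::endl;
        }
    }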
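One thing to keep in mind with the replace approach is that it produces a single string rather than separate tokens, and a line that starts or ends with a separator keeps a stray space. A minimal sketch of one way to tidy that up is shown here, using `std::regex_replace` from the C++11 standard library instead of Boost. The helper name `normalize_separators` is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replace each run of separators with one space,
// then drop the space that a leading or trailing separator run leaves behind.
std::string normalize_separators(const std::string& line)
{
    static const std::regex separators("[\\sXY,]+");
    std::string out = std::regex_replace(line, separators, " ");
    if (!out.empty() && out.front() == ' ') out.erase(0, 1);
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}
```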
Answered by anno
Regular expressions are part of TR1, which is included in Visual C++ 2008 SP1 (including the Express edition) and G++ 4.3.
The header is <regex> and the namespace is std::tr1. It works great with the STL.
Getting started with C++ TR1 regular expressions
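Since C++11 the same facility is available from `<regex>` in namespace `std` rather than `std::tr1`. The following is a small sketch under that assumption. Instead of splitting on separators, it matches the tokens themselves with `std::sregex_iterator`, which sidesteps empty leading tokens. The helper name `match_tokens` and the hard-coded separator set (taken from the Perl example in the question) are illustrative choices, not part of the original answer.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every maximal run of non-separator characters.
std::vector<std::string> match_tokens(const std::string& text)
{
    // Negated class: anything that is not whitespace, 'X', 'Y' or ','.
    static const std::regex token("[^\\sXY,]+");
    std::vector<std::string> tokens;
    for (std::sregex_iterator it(text.begin(), text.end(), token), end;
         it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```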