C++: Tokenizing a string using regular expressions

Question

Asked by sth

I'm trying to teach myself some C++ from scratch at the moment.
I'm well-versed in Python, Perl, and JavaScript, but have only encountered C++ briefly, in a classroom setting in the past. Please excuse the naivete of my question.

I would like to split a string using a regular expression but have not had much luck finding a clear, definitive, efficient and complete example of how to do this in C++.

In Perl this action is common, and thus can be accomplished in a trivial manner:

/home/me$ cat test.txt
this is  aXstringYwith, some problems
and anotherXY line with   similar issues

/home/me$ cat test.txt | perl -e'
> while(<>){
>   my @toks = split(/[\sXY,]+/);
>   print join(" ",@toks)."\n";
> }'
this is a string with some problems
and another line with similar issues

I'd like to know how best to accomplish the equivalent in C++.

EDIT:
I think I found what I was looking for in the boost library, as mentioned below.

boost regex-token-iterator (why don't underscores work?)

I guess I didn't know what to search for.

#include <iostream>
#include <string>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, char* argv[])
{
  string s;
  do{
    if(argc == 1)
      {
        cout << "Enter text to split (or \"quit\" to exit): ";
        getline(cin, s);
        if(s == "quit") break;
      }
    else
      s = "This is a string of tokens";

    boost::regex re("\\s+");  // backslash doubled: "\s" is not a valid C++ string escape
    boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
    boost::sregex_token_iterator j;

    unsigned count = 0;
    while(i != j)
      {
        cout << *i++ << endl;
        count++;
      }
    cout << "There were " << count << " tokens found." << endl;

  }while(argc == 1);
  return 0;
}

Answer 1

Answered by sth

The Boost libraries are usually a good choice, in this case Boost.Regex. There is even an example for splitting a string into tokens that already does what you want. Basically it comes down to something like this:

boost::regex re("[\\sXY]+");
std::string s;

while (std::getline(std::cin, s)) {
  boost::sregex_token_iterator i(s.begin(), s.end(), re, -1);
  boost::sregex_token_iterator j;
  while (i != j) {
     std::cout << *i++ << " ";
  }
  std::cout << std::endl;
}
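
The last constructor argument, -1, is what turns the iterator into a splitter: it selects the text between matches, while 0 would select the matches themselves. A minimal sketch of the difference follows. It assumes a C++11 compiler and uses std::regex, whose token iterator mirrors the Boost.Regex one used above; the helper name collect is made up for illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost or the standard library):
// gathers whatever the token iterator yields for a given submatch selector.
//   -1 : the text between matches (the split tokens)
//    0 : the matches themselves (the delimiters)
std::vector<std::string> collect(const std::string& s,
                                 const std::regex& re,
                                 int submatch)
{
    std::vector<std::string> out;
    std::sregex_token_iterator it(s.begin(), s.end(), re, submatch);
    const std::sregex_token_iterator end;  // default-constructed = end of sequence
    for (; it != end; ++it) {
        out.push_back(it->str());
    }
    return out;
}
```

Calling collect("aXstringYwith", re, -1) with re built from "[XY]+" should give the three tokens a, string and with; passing 0 instead should give the two delimiters X and Y.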

Answer 2

Answered by Faisal Vali

If you want to minimize use of iterators, and pithify your code, the following should work:

#include <string>
#include <iostream>
#include <boost/regex.hpp>

int main()
{
  const boost::regex re("[\\sXY,]+");

  for (std::string s; std::getline(std::cin, s); ) 
  {
    std::cout << boost::regex_replace(s, re, " ") << std::endl;
  }

}

Answer 3

Answered by anno

Regular expressions are part of TR1, which is included in Visual C++ 2008 SP1 (including the Express edition) and G++ 4.3.

The header is <regex> and the namespace is std::tr1. Works great with the STL.

Getting started with C++ TR1 regular expressions

Visual C++ Standard Library : TR1 Regular Expressions
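
As a rough sketch of what this looks like without Boost: the code below assumes a C++11 or later compiler, where the same facilities live in namespace std rather than std::tr1. The function name normalize_line is made up for illustration. It splits on the delimiter set from the Perl example in the question and re-joins the tokens with single spaces.

```cpp
#include <regex>
#include <string>

// Hypothetical helper mirroring the Perl one-liner from the question:
// split on runs of whitespace, 'X', 'Y' or ',' and re-join with single spaces.
// Assumes C++11 (std::regex); with TR1 the names would be in std::tr1 instead.
std::string normalize_line(const std::string& line)
{
    // Raw string literal (C++11), so the backslash in \s needs no doubling.
    static const std::regex delims(R"([\sXY,]+)");

    std::string out;
    std::sregex_token_iterator it(line.begin(), line.end(), delims, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        if (it->length() == 0) continue;  // a leading delimiter yields an empty token
        if (!out.empty()) out += ' ';
        out += it->str();
    }
    return out;
}
```

To reproduce the Perl session, a main function could read lines with std::getline(std::cin, s) and print normalize_line(s) for each. Skipping empty tokens is a choice made here, so a line that starts with a delimiter does not produce a leading space.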

Answer 4

Answered by Employed Russian

Unlike in Perl, regular expressions are not "built in" to C++.

You need to use an external library, such as PCRE.
