Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11627440/

Time: 2020-08-27 15:22:10  Source: igfitidea

Regex C++: extract substring

c++ regex

Asked by eouti

I would like to extract a substring that sits between two other substrings.
e.g.: /home/toto/FILE_mysymbol_EVENT.DAT
or just FILE_othersymbol_EVENT.DAT
And I would like to get: mysymbol and othersymbol

I don't want to use Boost or other libraries, just standard C++ facilities. The one exception is CERN's ROOT library with TRegexp, but I don't know how to use it...

Answered by Some programmer dude

Since C++11, regular expressions are built into the standard library. This program shows how to use them to extract the string you are after:

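#include <iostream>
#include <regex>
#include <string>

int main()
{
    const std::string s = "/home/toto/FILE_mysymbol_EVENT.DAT";
    // Raw string literal, so the backslashes in \w and \. reach the regex engine as written.
    std::regex rgx(R"(.*FILE_(\w+)_EVENT\.DAT.*)");
    std::smatch match;

    // match[1] holds the first capture group, i.e. the text matched by (\w+).
    if (std::regex_search(s.begin(), s.end(), match, rgx))
        std::cout << "match: " << match[1] << '\n';
}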

It will output:

match: mysymbol
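
To cover both inputs from the question, the same idea can be wrapped in a small helper. This is only a minimal sketch: the function name `extract_symbol` is made up for illustration, and returning an empty string when nothing matches is an assumption, not part of the original answer.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original answer): returns the text
// between "FILE_" and "_EVENT.DAT", or an empty string if there is no match.
std::string extract_symbol(const std::string& path)
{
    // Raw string literal, so the backslashes reach the regex engine as written.
    static const std::regex rgx(R"(FILE_(\w+)_EVENT\.DAT)");
    std::smatch match;
    if (std::regex_search(path, match, rgx))
        return match[1].str();
    return std::string();
}
```

Because `std::regex_search` looks for the pattern anywhere in the input, the leading and trailing `.*` are not needed here, and the helper behaves the same with or without a directory prefix.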

Note, though, that at the time of writing this does not work with GCC, because its library support for regular expressions is incomplete. It works well in VS2010 (and probably VS2012), and should work in Clang.

By now (late 2016) all modern C++ compilers and their standard libraries are fully up to date with the C++11 standard, and most if not all of C++14 as well. GCC 6 and the upcoming Clang 4 support most of the coming C++17 standard as well.

Answered by Tim Pietzcker

TRegexp only supports a very limited subset of regular expressions compared to other regex flavors. This makes constructing a single regex that suits your needs somewhat awkward.

One possible solution:

[^_]*_([^_]*)_

will match the string until the first underscore, then capture all characters until the next underscore. The relevant result of the match is then found in group number 1.
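
To see what this pattern captures, here is a small sketch that applies it with `std::regex` rather than TRegexp. That substitution is an assumption made so the example needs no ROOT installation, and the helper name `second_field` is hypothetical.

```cpp
#include <regex>
#include <string>

// Illustrative helper: applies the pattern from the answer using std::regex
// (an assumption; the answer targets ROOT's TRegexp, which is not used here).
std::string second_field(const std::string& name)
{
    static const std::regex rgx("[^_]*_([^_]*)_");
    std::smatch match;
    if (std::regex_search(name, match, rgx))
        return match[1].str();  // group 1: text between the first two underscores
    return std::string();
}
```

One caveat follows directly from the pattern: it keys on the first two underscores in the whole string, so a directory name containing an underscore (for example `/home/my_dir/FILE_x_EVENT.DAT`) shifts the captured text to `dir/FILE`.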

But in your case, why use a regex at all? Just find the first and second occurrences of your delimiter _ in the string and extract the characters between those positions.
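
The delimiter approach can be sketched with `std::string::find` alone. The helper name `between_underscores` and the choice to return an empty string when fewer than two underscores are present are assumptions for illustration.

```cpp
#include <string>

// Illustrative helper (name is hypothetical): returns the text between the
// first and second '_' in s, or an empty string if there are fewer than two.
std::string between_underscores(const std::string& s)
{
    const std::string::size_type first = s.find('_');
    if (first == std::string::npos)
        return std::string();
    const std::string::size_type second = s.find('_', first + 1);
    if (second == std::string::npos)
        return std::string();
    return s.substr(first + 1, second - first - 1);
}
```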

Answered by Christopher Creutzig

If you want to use regular expressions, I'd really recommend using C++11's regexes or, if you have a compiler that doesn't yet support them, Boost. Boost is something I consider almost-part-of-standard-C++.

But for this particular question, you do not really need any form of regular expressions. Something like this sketch should work just fine, after you add all appropriate error checks (beg != npos, end != npos, etc.), test the code, and remove my typos:

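#include <string>

// Returns the part of `in` between `before` and `after` (no error checks yet).
std::string between(std::string const &in,
                    std::string const &before, std::string const &after) {
  std::string::size_type beg = in.find(before);
  beg += before.size();  // skip past the leading marker
  std::string::size_type end = in.find(after, beg);
  return in.substr(beg, end-beg);
}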
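
One way the missing `npos` checks might look is sketched below. Reporting failure through a `bool` return value and an output parameter is an assumption, as is the name `between_checked`; the answer leaves the error-handling style open.

```cpp
#include <string>

// Variant of between() with the npos checks filled in. Reporting failure via
// a bool and an output parameter is an assumption, not part of the answer.
bool between_checked(const std::string& in, const std::string& before,
                     const std::string& after, std::string& out)
{
    std::string::size_type beg = in.find(before);
    if (beg == std::string::npos)
        return false;  // "before" does not occur
    beg += before.size();
    const std::string::size_type end = in.find(after, beg);
    if (end == std::string::npos)
        return false;  // "after" does not occur past "before"
    out = in.substr(beg, end - beg);
    return true;
}
```

Called with `"FILE_"` and `"_EVENT.DAT"` as the two markers, this handles both paths from the question and leaves `out` untouched when either marker is missing.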

Obviously, you could change the std::string to a template parameter and it should work just fine with std::wstring or more seldomly used instantiations of std::basic_string as well.
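
A templated variant might look like the following sketch. The name `between_t` is hypothetical, and returning an empty string when a marker is missing is an added assumption rather than something the answer specifies.

```cpp
#include <string>

// Sketch of a templated between() for any std::basic_string instantiation.
// Returns an empty string if either marker is not found (an assumption).
template <typename CharT, typename Traits, typename Alloc>
std::basic_string<CharT, Traits, Alloc>
between_t(const std::basic_string<CharT, Traits, Alloc>& in,
          const std::basic_string<CharT, Traits, Alloc>& before,
          const std::basic_string<CharT, Traits, Alloc>& after)
{
    typedef std::basic_string<CharT, Traits, Alloc> string_type;
    typename string_type::size_type beg = in.find(before);
    if (beg == string_type::npos)
        return string_type();
    beg += before.size();
    const typename string_type::size_type end = in.find(after, beg);
    if (end == string_type::npos)
        return string_type();
    return in.substr(beg, end - beg);
}
```

Template argument deduction does not apply implicit conversions, so string literals have to be wrapped explicitly, for example `between_t(std::wstring(L"FILE_x_EVENT.DAT"), std::wstring(L"FILE_"), std::wstring(L"_EVENT.DAT"))`.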

Answered by oguz

I would study corner cases before trusting it, but

   std::string text = "/home/toto/FILE_mysymbol_EVENT.DAT";
   std::regex re("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
   // "$3" keeps only the third capture group, the text between the markers.
   std::cout << std::regex_replace(text, re, "$3") << '\n';

is a good candidate.
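
One corner case worth checking is an input that does not match at all: `std::regex_replace` then returns the input unchanged rather than an empty string. The sketch below assumes the replacement string is `"$3"` (the third capture group); the wrapper name `symbol_via_replace` is hypothetical.

```cpp
#include <regex>
#include <string>

// Illustrative wrapper (hypothetical name) around the regex_replace approach.
std::string symbol_via_replace(const std::string& text)
{
    static const std::regex re("(.*)(FILE_)(.*)(_EVENT\\.DAT)(.*)");
    // "$3" refers to the third capture group, i.e. the text between the markers.
    return std::regex_replace(text, re, "$3");
}
```

A caller that needs to tell "no match" apart from a real result would have to compare the return value against the input, or use `std::regex_search` instead.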
