Original source: http://stackoverflow.com/questions/12530406/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Is gcc 4.8 or earlier buggy about regular expressions?
Asked by tunnuz
I am trying to use std::regex in a C++11 piece of code, but it appears that the support is a bit buggy. An example:
#include <regex>
#include <iostream>

int main (int argc, const char * argv[]) {
    std::regex r("st|mt|tr");
    std::cerr << "st|mt|tr" << " matches st? " << std::regex_match("st", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches mt? " << std::regex_match("mt", r) << std::endl;
    std::cerr << "st|mt|tr" << " matches tr? " << std::regex_match("tr", r) << std::endl;
}
outputs:
st|mt|tr matches st? 1
st|mt|tr matches mt? 1
st|mt|tr matches tr? 0
when compiled with gcc (MacPorts gcc47 4.7.1_2) 4.7.1, either with
g++ *.cc -o test -std=c++11
g++ *.cc -o test -std=c++0x
or
g++ *.cc -o test -std=gnu++0x
Besides, the regex works well if I only have two alternative patterns, e.g. st|mt, so it looks like the last alternative is not matched for some reason. The code works well with the Apple LLVM compiler.
Any ideas about how to solve the issue?
Update: one possible solution is to use groups to implement multiple alternatives, e.g. (st|mt)|tr.
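A minimal sketch for comparing the original pattern with the grouped workaround from the update. The helper name match_all is hypothetical and not part of the question. On a conforming <regex> implementation both patterns should accept all three inputs; the prototype shipped before GCC 4.9 may not.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns one bool per input, true when the whole
// input matches the pattern (std::regex_match requires a full match).
std::vector<bool> match_all(const std::string& pattern,
                            const std::vector<std::string>& inputs) {
    const std::regex re(pattern);
    std::vector<bool> results;
    for (const std::string& input : inputs)
        results.push_back(std::regex_match(input, re));
    return results;
}
```

Calling match_all once with "st|mt|tr" and once with "(st|mt)|tr" on the inputs st, mt and tr shows whether a given toolchain treats the two patterns the same way.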
Answered by Jonathan Wakely
<regex> was implemented and released in GCC 4.9.0.
In your (older) version of GCC, it is not implemented.
That prototype <regex> code was added when all of GCC's C++0x support was highly experimental, tracking early C++0x drafts and being made available for people to experiment with. That allowed people to find problems and give feedback to the standard committee before the standard was finalised. At the time lots of people were grateful to have had access to bleeding-edge features long before C++11 was finished and before many other compilers provided any support, and that feedback really helped improve C++11. This was a Good Thing(TM).
The <regex> code was never in a useful state, but was added as a work in progress like many other bits of code at the time. It was checked in and made available for others to collaborate on if they wanted to, with the intention that it would be finished eventually.
That's often how open source works: release early, release often. Unfortunately, in the case of <regex> we only got the early part right and not the often part that would have finished the implementation.
Most parts of the library were more complete and are now almost fully implemented, but <regex> hadn't been, so it stayed in the same unfinished state since it was added.
Seriously though, who thought that shipping an implementation of regex_search that only does "return false" was a good idea?
It wasn't such a bad idea a few years ago, when C++0x was still a work in progress and we shipped lots of partial implementations. No one thought it would remain unusable for so long, so with hindsight maybe it should have been disabled and required a macro or build-time option to enable it. But that ship sailed long ago. There are exported symbols from the libstdc++.so library that depend on the regex code, so simply removing it (in, say, GCC 4.8) would not have been trivial.
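Since the old prototype compiles but gives wrong answers, one defensive option is a runtime sanity check at startup. The following is only a sketch under that assumption; the function name regex_runtime_ok and the test pattern are hypothetical choices, not something the answer prescribes.

```cpp
#include <regex>
#include <string>

// Hypothetical startup check: returns true only if regex_search and
// regex_match give the expected answers for trivially known cases.
bool regex_runtime_ok() {
    try {
        const std::regex re("b+");
        const std::string text = "aabbcc";
        const std::string only_b = "bbb";
        // A working implementation must find "bb" inside the text ...
        const bool found = std::regex_search(text, re);
        // ... must accept "bbb" as a full match ...
        const bool full = std::regex_match(only_b, re);
        // ... and must reject the longer text as a full match.
        const bool rejected = !std::regex_match(text, re);
        return found && full && rejected;
    } catch (const std::regex_error&) {
        // An incomplete implementation may throw while compiling the pattern.
        return false;
    }
}
```

A program could call this once early on and switch to another regex library or plain string handling when it returns false.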
Answered by Matt Clarkson
Feature Detection
This is a snippet that uses C preprocessor defines to detect whether the libstdc++ <regex> implementation is actually implemented:
#include <regex>
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif
Macros
_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT is defined in bits/regex.tcc in 4.9.x
_GLIBCXX_REGEX_STATE_LIMIT is defined in bits/regex_automaton.h in 5+
_GLIBCXX_RELEASE was added to 7+ as a result of this answer and is the GCC major version
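As a rough illustration of how the HAVE_WORKING_REGEX macro might be consumed, the sketch below guards a regex-based check and falls back to plain string comparison otherwise. The function is_known_tag and the pattern (borrowed from the question) are hypothetical; the detection block is repeated only so that the example compiles on its own.

```cpp
#include <regex>
#include <string>

// Same detection logic as the snippet above, repeated so this sketch
// compiles on its own; skipped if the macro is already defined.
#ifndef HAVE_WORKING_REGEX
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif
#endif

// Hypothetical helper: true when s is exactly "st", "mt" or "tr".
bool is_known_tag(const std::string& s) {
#if HAVE_WORKING_REGEX
    // <regex> is believed to work: use the pattern from the question.
    static const std::regex re("st|mt|tr");
    return std::regex_match(s, re);
#else
    // Fallback that avoids <regex> entirely.
    return s == "st" || s == "mt" || s == "tr";
#endif
}
```

Both branches are meant to give the same answers, so callers do not need to know which one was compiled in.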
Testing
You can test it with GCC like this:
cat << 'EOF' | g++ --std=c++11 -x c++ - && ./a.out
#include <regex>
#if __cplusplus >= 201103L &&                             \
    (!defined(__GLIBCXX__) || (__cplusplus >= 201402L) || \
        (defined(_GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT) || \
         defined(_GLIBCXX_REGEX_STATE_LIMIT)           || \
             (defined(_GLIBCXX_RELEASE)                && \
             _GLIBCXX_RELEASE > 4)))
#define HAVE_WORKING_REGEX 1
#else
#define HAVE_WORKING_REGEX 0
#endif
#include <cstdlib>   // EXIT_SUCCESS, EXIT_FAILURE
#include <iostream>
#include <string>
int main() {
    const std::regex regex(".*");
    const std::string string = "This should match!";
    const auto result = std::regex_search(string, regex);
#if HAVE_WORKING_REGEX
    std::cerr << "<regex> works, look: " << std::boolalpha << result << std::endl;
#else
    std::cerr << "<regex> doesn't work, look: " << std::boolalpha << result << std::endl;
#endif
    return result ? EXIT_SUCCESS : EXIT_FAILURE;
}
EOF
Results
Here are some results for various compilers:
$ gcc --version
gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./a.out
<regex> doesn't work, look: false
$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./a.out
<regex> works, look: true
$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./a.out
<regex> works, look: true
$ gcc --version
gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./a.out
<regex> works, look: true
$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ ./a.out
<regex> works, look: true
$ gcc --version
gcc (GCC) 6.2.1 20160830
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ clang --version
clang version 3.9.0 (tags/RELEASE_390/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ ./a.out # compiled with 'clang -lstdc++'
<regex> works, look: true
Here be Dragons
This is totally unsupported and relies on the detection of private macros that the GCC developers have put into the bits/regex* headers. They could change or go away at any time. Hopefully, they won't be removed in the current 4.9.x, 5.x, 6.x releases but they could go away in the 7.x releases.
If the GCC developers added a #define _GLIBCXX_HAVE_WORKING_REGEX 1 (or something, hint hint nudge nudge) in the 7.x release that persisted, this snippet could be updated to include that and later GCC releases would work with the snippet above.
As far as I know, all other compilers have a working <regex> when __cplusplus >= 201103L, but YMMV.
Obviously this would completely break if someone defined the _GLIBCXX_REGEX_DFS_QUANTIFIERS_LIMIT or _GLIBCXX_REGEX_STATE_LIMIT macros outside of the libstdc++-v3 headers.
Answered by Luis Orantes
At this moment, g++ (GCC) 4.9.2 with -std=c++14 still does not accept regex_match.
Here is an approach that works like regex_match but uses sregex_token_iterator instead, and it works with g++.
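#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main() {
    std::string line = "1a2b3c";
    std::regex re("(\\d)");  // the backslash must be escaped in a string literal
    // collect capture group 1 of every match
    std::vector<std::string> inVector{
        std::sregex_token_iterator(line.begin(), line.end(), re, 1), {}
    };
    // prints all matches
    for (std::size_t i = 0; i < inVector.size(); ++i)
        std::cout << i << ":" << inVector[i] << std::endl;
}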
It will print the matches 1, 2 and 3, each prefixed with its index (0:1, 1:2, 2:3).
You may read the sregex_token_iterator reference at: http://en.cppreference.com/w/cpp/regex/regex_token_iterator
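To get closer to what regex_match answers (does the whole string match?), one possible reading of this approach is sketched below. The helper full_match_via_tokens is hypothetical, and it is an approximation only: the iterator reports the matches a search finds, so a pattern whose first alternative is a shorter prefix (for example a|ab against "ab") can yield false here where regex_match would succeed.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when some match reported by the token
// iterator spans the entire input string.
bool full_match_via_tokens(const std::string& line, const std::regex& re) {
    // Submatch index 0 selects the whole text of each match.
    std::sregex_token_iterator it(line.begin(), line.end(), re, 0);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        // Each token is a sub_match holding a [first, second) iterator pair.
        if (it->first == line.begin() && it->second == line.end())
            return true;
    }
    return false;
}
```

With the question's pattern st|mt|tr, this accepts "tr" and rejects "str", because the only match found in "str" covers just the first two characters.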