Overlapping matches in C# regular expressions
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original address, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/320448/
Overlapping matches in Regex
Asked by jevakallio
I can't seem to find an answer to this problem, and I'm wondering if one exists. Simplified example:
Consider a string "nnnn", where I want to find all matches of "nn" - but also those that overlap with each other. So the regex would provide the following 3 matches:
- **nn**nn
- n**nn**n
- nn**nn**
I realize this is not exactly what regexes are meant for, but walking the string and parsing this manually seems like an awful lot of code, considering that in reality the matches would have to be done using a pattern, not a literal string.
Accepted answer by VonC
A possible solution could be to use a positive lookbehind:
(?<=n)n
It would give you the end position of:
- *n***n**nn
- n*n***n**n
- nn*n***n**
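To make the "end position" idea concrete, here is a minimal C# sketch, assuming the literal input "nnnn" from the question. The class and method names (`LookbehindDemo`, `EndPositions`) are hypothetical and exist only for illustration; only `Regex.Matches` and `Match.Index` come from the .NET library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LookbehindDemo
{
    // Returns the index of every 'n' that is immediately preceded by another 'n'.
    // Each index is the position of the last character of one overlapping "nn".
    public static int[] EndPositions(string input)
    {
        return Regex.Matches(input, "(?<=n)n")
                    .Cast<Match>()
                    .Select(m => m.Index)
                    .ToArray();
    }
}
```

For "nnnn" this should yield the indices 1, 2 and 3: the first n has nothing before it, so the lookbehind fails there.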
As mentioned by Timothy Khouri, a positive lookahead is more intuitive.
Rather than his proposition (?=nn)n, I would prefer the simpler form:
(n)(?=(n))
That would reference the first position of the strings you want and would capture the second n in group(2).
That is so because:
- Any valid regular expression can be used inside the lookahead.
- If it contains capturing parentheses, the backreferences will be saved.
So group(1) and group(2) will capture whatever 'n' represents (even if it is a complicated regex).
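As a rough illustration of the lookahead approach, the following C# sketch runs (n)(?=(n)) over an input and rebuilds each overlapping match from the two groups. The names `LookaheadDemo` and `FindPairs` are made up for this example, and the literal n stands in for whatever sub-pattern you actually need.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class LookaheadDemo
{
    // Returns (start index, matched text) for every overlapping "nn".
    public static List<(int Index, string Text)> FindPairs(string input)
    {
        var results = new List<(int Index, string Text)>();
        foreach (Match m in Regex.Matches(input, "(n)(?=(n))"))
        {
            // Group 1 is the consumed character; group 2 was captured inside
            // the lookahead, so it is not consumed and can be matched again.
            results.Add((m.Index, m.Groups[1].Value + m.Groups[2].Value));
        }
        return results;
    }
}
```

Because only one character is consumed per match, the next attempt starts one position later, which is what lets the matches overlap. For "nnnn" this should give three entries starting at indices 0, 1 and 2.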
Answered by PhiLho
AFAIK, there is no pure regex way to do that at once (i.e., returning the three captures you request without a loop).
However, you can find the pattern once, then loop on the search, starting each time at offset (found position + 1). You should combine regex use with simple code.
[EDIT] Great, I am downvoted when I basically said what Jan showed...
[EDIT 2] To be clear: Jan's answer is better. Not more precise, but certainly more detailed, it deserves to be chosen. I just don't understand why mine is downvoted, since I still see nothing incorrect in it. Not a big deal, just annoying.
Answered by Jan Goyvaerts
Using a lookahead with a capturing group works, at the expense of making your regex slower and more complicated. An alternative solution is to tell the Regex.Match() method where the next match attempt should begin. Try this:
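Regex regexObj = new Regex("nn");
Match matchObj = regexObj.Match(subjectString);
while (matchObj.Success) {
    // matchObj.Index and matchObj.Value describe the current match
    // Retry one character after the start of this match so overlaps are found
    matchObj = regexObj.Match(subjectString, matchObj.Index + 1);
}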
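For completeness, here is one way the loop above could be wrapped into a reusable method that collects the results. This is a sketch under assumptions: the names `OverlapScanner` and `FindOverlapping` are hypothetical, and the bounds check is an addition for patterns that can match the empty string at the end of the input.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class OverlapScanner
{
    // Collects every match of the pattern, including overlapping ones,
    // by restarting the search one character after each match's start.
    public static List<Match> FindOverlapping(string input, string pattern)
    {
        var regex = new Regex(pattern);
        var matches = new List<Match>();
        Match match = regex.Match(input);
        while (match.Success)
        {
            matches.Add(match);
            int next = match.Index + 1;
            // Regex.Match(string, int) throws if the start is past the end.
            if (next > input.Length) break;
            match = regex.Match(input, next);
        }
        return matches;
    }
}
```

Calling `OverlapScanner.FindOverlapping("nnnn", "nn")` should return three matches starting at indices 0, 1 and 2. Unlike the lookahead version, the pattern itself stays unchanged, at the cost of one `Match` call per result.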