JavaScript: using a regex match to get all subgroups
Original question: http://stackoverflow.com/questions/4199545/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this page) on Stack Overflow.
Getting all subgroups with a regex match
Asked by Caveatrob
Given the string:
© 2010 Women's Flat Track Derby Association (WFTDA)
I want:
2010 -- Women's -- Flat
Women's -- Flat -- Track
Track -- Derby -- Association
I'm using regex:
([a-zA-Z]+)\s([A-Z][a-z]*)\s([a-zA-Z]+)
It's only returning:
s -- Flat -- Track
Answered by Daniel Vandersluis
This problem isn't straightforward, but to understand why, you need to understand how the regular expression engine operates on your string.
Let's consider the pattern [a-z]{3} (match 3 successive characters between a and z) on the target string abcdef. The engine starts from the left side of the string (before the a), and sees that a matches [a-z], so it advances one position. Then, it sees that b matches [a-z] and advances again. Finally, it sees that c matches, advances again (to before d) and returns abc as a match.
If the engine is set up to return multiple matches, it will now try to match again, but it keeps its positional information (so, like above, it'll match and return def).
Because the engine has already moved past the b while matching abc, bcd will never be considered as a match. For this same reason, in your expression, once a group of words is matched, the engine will never consider words within the first match to be a part of the next one.
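The non-overlapping behaviour described above can be seen in a small sketch. The variable names are illustrative and not part of the original answer; lastIndex is the standard property where a global regex records the position it will resume from.

```javascript
// Global matching resumes where the previous match ended, so matches never overlap.
const target = "abcdef";
const threeLetters = /[a-z]{3}/g;

const found = [];
let hit;
while ((hit = threeLetters.exec(target)) !== null) {
  // lastIndex is the position the engine will resume from on the next exec() call.
  found.push({ text: hit[0], resumeAt: threeLetters.lastIndex });
}

console.log(found);
// "bcd" is never reported: by the time "abc" is returned, the engine is already past "b".
```

Here the first match ends at position 3, so the second attempt starts at d and the only other match is def.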
In order to get around this, you need to use capturing groups inside of lookaheads to collect matching words that appear later in the string:
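var str = "2010 Women's Flat Track Derby Association",
    regex = /([a-z0-9']+)(?=\s+([a-z0-9']+)\s+([a-z0-9']+))/ig,
    match; // declared here so the loop does not create an implicit global
// exec() resumes from regex.lastIndex on each call and returns null when no match is left
while ((match = regex.exec(str)) !== null)
{
    var group1 = match[1], group2 = match[2], group3 = match[3];
    document.write("Found match: " + group1 + " -- " + group2 + " -- " + group3 + "<br />\n");
}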
This results in:
2010 -- Women's -- Flat
Women's -- Flat -- Track
Flat -- Track -- Derby
Track -- Derby -- Association
See this in action at http://jsfiddle.net/jRgXm/.
The regular expression searches for what you seem to be defining as a word ([a-z0-9']+) and captures it into subgroup 1. It then uses a lookahead (a zero-width assertion, so it doesn't advance the engine's cursor) to capture the next two words into subgroups 2 and 3.
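To make the zero-width point concrete, here is a minimal sketch that records how much of the string each match actually consumes. It assumes a runtime that supports String.prototype.matchAll (ES2020), which the original answer does not use; the names sentence, tripleRegex and tripleRows are illustrative only.

```javascript
const sentence = "2010 Women's Flat Track Derby Association";
const tripleRegex = /([a-z0-9']+)(?=\s+([a-z0-9']+)\s+([a-z0-9']+))/gi;

const tripleRows = [];
for (const m of sentence.matchAll(tripleRegex)) {
  tripleRows.push({
    consumed: m[0],                 // only the first word is part of the match itself
    endsAt: m.index + m[0].length,  // the cursor stops right after that first word
    triple: [m[1], m[2], m[3]],     // groups 2 and 3 were captured inside the lookahead
  });
}

console.log(tripleRows);
```

Each match consumes only its first word, so the next attempt can start at the second word and reuse it as the head of the next triple.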
However, if you are using the actual JavaScript engine, you must use RegExp.exec and loop over the results (see this question for a discussion of why). I don't know how UltraEdit's engine is implemented, but hopefully it can do a global search and also collect subgroups.
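As a rough illustration of why the loop is needed: with the g flag, String.prototype.match returns only the full matches and drops the subgroups, while repeated exec calls return each match together with its groups. The helper makeRegex and the variable names are hypothetical, introduced only for this sketch.

```javascript
const input = "2010 Women's Flat Track Derby Association";

// Hypothetical helper: builds a fresh regex each time so no lastIndex state is shared.
function makeRegex() {
  return /([a-z0-9']+)(?=\s+([a-z0-9']+)\s+([a-z0-9']+))/gi;
}

// With the g flag, match() returns just the full matches; the subgroups are not included.
const fullMatchesOnly = input.match(makeRegex());

// exec() returns one match per call, including its subgroups, and null when it is done.
const withGroups = [];
const execRegex = makeRegex();
let result;
while ((result = execRegex.exec(input)) !== null) {
  withGroups.push(result.slice(1).join(" -- "));
}

console.log(fullMatchesOnly);
console.log(withGroups);
```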
Answered by josh.trow
I'm using some generic regex tester, so I can't guarantee it will work for you but...
([A-Z0-9][\w']+)\s([A-Z][\w]+)\s([A-Z][\w]+)
Three words starting with a number or capital letter followed by letters/numbers or that funky apostrophe, separated by spaces. Works for me.
Edit: I assume you can loop through, repeating the matcher in JS; I've never used it.
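A quick sketch of what looping this pattern looks like in JavaScript, assuming the g flag is added and the sample string from the first answer is used (both are assumptions, since this answer was written against a generic regex tester). The variable names are illustrative.

```javascript
const text = "2010 Women's Flat Track Derby Association";
// The pattern from this answer, with the g flag added so exec() can be repeated.
const consuming = /([A-Z0-9][\w']+)\s([A-Z][\w]+)\s([A-Z][\w]+)/g;

const consumingRows = [];
let row;
while ((row = consuming.exec(text)) !== null) {
  consumingRows.push(row.slice(1).join(" -- "));
}

console.log(consumingRows);
```

Under these assumptions the loop yields a single triple, Women's -- Flat -- Track. A match cannot start at 2010 because the second group does not allow the apostrophe in Women's, and since all three words are consumed by the match, they are not revisited on the next iteration.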

