Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not the translator). Original: http://stackoverflow.com/questions/14863026/

Javascript Regex - Find all possible matches, even in already captured matches

Tags: javascript, regex, string, match

Asked by Vinnie Cent

I'm trying to obtain all possible matches from a string using a regex in JavaScript. It appears that my method does not match parts of the string that have already been consumed by an earlier match.

Variables:

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';

var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;

Code:

var match = string.match(reg);

All matched results I get:

A1B1Y:A1B2Y
A1B5Y:A1B6Y
A1B9Y:A1B10Y

Matched results I want:

A1B1Y:A1B2Y
A1B2Y:A1B3Y
A1B5Y:A1B6Y
A1B6Y:A1B7Y
A1B9Y:A1B10Y
A1B10Y:A1B11Y

In my head, I want A1B1Y:A1B2Y to be a match along with A1B2Y:A1B3Y, even though A1B2Y in the string will need to be part of two matches.

Answered by Fabrício Matté

Without modifying your regex, you can use .exec and adjust the regex object's lastIndex property after each match, so that the next search starts at the beginning of the second half of the match it just found.

var string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
var matches = [], found;
while (found = reg.exec(string)) {
    matches.push(found[0]);
    reg.lastIndex -= found[0].split(':')[1].length;
}

console.log(matches);
//["A1B1Y:A1B2Y", "A1B2Y:A1B3Y", "A1B5Y:A1B6Y", "A1B6Y:A1B7Y", "A1B9Y:A1B10Y", "A1B10Y:A1B11Y"]

As per Bergi's comment, you can also take the index of the last match and increment it by 1. Instead of resuming from the second half of the match, the regex then resumes from the second character of each match:

reg.lastIndex = found.index+1;

The final outcome is the same, though Bergi's version has a little less code and performs slightly faster. =]
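For completeness, here is a minimal sketch of Bergi's variant wrapped in a reusable function. The name findOverlapping is hypothetical (it is not from the answer), and copying the regex so the caller's lastIndex stays untouched is my own assumption about what is convenient, not something the answer requires.

```javascript
// Hypothetical helper (not part of the original answer): collects overlapping
// matches by rewinding lastIndex to one character past the start of each match.
function findOverlapping(pattern, text) {
  // Work on a copy so the caller's regex state is not modified; make sure the
  // g flag is set, because exec only honours lastIndex with g (or y).
  const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
  const re = new RegExp(pattern.source, flags);
  const matches = [];
  let found;
  while ((found = re.exec(text)) !== null) {
    matches.push(found[0]);
    // Resume right after the first character of this match, not after its end.
    re.lastIndex = found.index + 1;
  }
  return matches;
}

const string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
const reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;
const overlapping = findOverlapping(reg, string);
console.log(overlapping);
```

Because the search always moves forward by at least one character, the loop also terminates for patterns that can match the empty string.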

Answered by nhahtdh

You cannot get the result directly from match, but it is possible to produce it via RegExp.exec with some modification to the regex:

var regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;
var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var arr;
var results = [];

while ((arr = regex.exec(input)) !== null) {
    results.push(arr[0] + arr[1]);
}

I used a zero-width positive look-ahead (?=pattern) so that the second half is not consumed and the overlapping portion can be matched again. The capturing group inside the look-ahead still records that text, which is why arr[1] holds the second half.
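The same look-ahead regex can also be driven by String.prototype.matchAll, which was added to the language after this answer was written. This is a sketch under the assumption that your runtime supports matchAll (ES2020); the variable names are only illustrative.

```javascript
const input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
// Same pattern as above: the first half is consumed, the second half is only
// captured inside the look-ahead, so it stays available for the next match.
const regex = /A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g;

// matchAll requires the g flag and yields one match array per hit;
// m[0] is the consumed half, m[1] is the captured look-ahead half.
const results = Array.from(input.matchAll(regex), (m) => m[0] + m[1]);
console.log(results);
```

This avoids the manual while loop but is otherwise the same technique.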

Actually, it is also possible to abuse the replace method to achieve the same result:

var input = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
var results = [];

// $0 is the whole match, $1 is the text captured inside the look-ahead
input.replace(/A[0-9]+B[0-9]+Y(?=(:A[0-9]+B[0-9]+Y))/g, function ($0, $1) { results.push($0 + $1); return ''; });

However, since this is replace, it also does useless replacement work whose result is thrown away.

Answered by satchmorun

Unfortunately, it's not quite as simple as a single string.match.

The reason is that you want overlapping matches, which the /g flag doesn't give you: each search resumes after the end of the previous match.
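To see why /g skips the overlapping pairs, it can help to print where each match starts and where the next search resumes. This is only an illustrative sketch using the question's own string and regex; the spans array is a name I made up.

```javascript
const string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
const reg = /A[0-9]+B[0-9]+Y:A[0-9]+B[0-9]+Y/g;

// Record [start of match, position where the next search will resume].
const spans = [];
let found;
while ((found = reg.exec(string)) !== null) {
  spans.push([found.index, reg.lastIndex]);
}
console.log(spans); // [[0, 11], [24, 35], [48, 60]]
```

The second pair, A1B2Y:A1B3Y, starts at index 6, but after the first match the search resumes at index 11, so that starting position is never tried.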

You could use lookahead:

var re = /A\d+B\d+Y(?=:A\d+B\d+Y)/g;

But now you get:

string.match(re); // ["A1B1Y", "A1B2Y", "A1B5Y", "A1B6Y", "A1B9Y", "A1B10Y"]

The reason is that lookahead is zero-width: it only asserts that the pattern follows what you're matching, and it doesn't include that text in the match.

You could use exec to try and grab what you want. If a regex has the /g flag, you can run exec repeatedly to get all the matches:

// using re from above to get the overlapping matches

var m;
var matches = [];
var re2 = /A\d+B\d+Y:A\d+B\d+Y/g; // make another regex to get what we need

while ((m = re.exec(string)) !== null) {
  // m is a match object, which has the index of the current match
  matches.push(string.substring(m.index).match(re2)[0]);
}

// matches is now:
// [
//   "A1B1Y:A1B2Y",
//   "A1B2Y:A1B3Y",
//   "A1B5Y:A1B6Y",
//   "A1B6Y:A1B7Y",
//   "A1B9Y:A1B10Y",
//   "A1B10Y:A1B11Y"
// ]

Alternatively, you could split the original string on ':', then loop through the resulting array and pull out each pair where array[i] and array[i+1] both match the single-item pattern.
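A rough sketch of that split-based idea follows. The anchored single-item pattern /^A\d+B\d+Y$/ is my assumption: it requires each item to match in full, which is slightly stricter than the unanchored regexes above, and the variable names are illustrative.

```javascript
const string = 'A1B1Y:A1B2Y:A1B3Y:A1B4Z:A1B5Y:A1B6Y:A1B7Y:A1B8Z:A1B9Y:A1B10Y:A1B11Y';
// Assumed single-item pattern; no g flag, so test() keeps no state between calls.
const item = /^A\d+B\d+Y$/;

const parts = string.split(':');
const pairs = [];
// Stop one short of the end because each step looks at parts[i + 1].
for (let i = 0; i < parts.length - 1; i++) {
  if (item.test(parts[i]) && item.test(parts[i + 1])) {
    pairs.push(parts[i] + ':' + parts[i + 1]);
  }
}
console.log(pairs);
```

This trades the regex trick for a plain loop, which may be easier to read when the separator is fixed.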