JavaScript RegEx: extract all matches from a string using RegExp.exec

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6323417/

Time: 2020-08-23 21:15:35  Source: igfitidea

javascript, regex, regex-group, taskwarrior

Asked by gatlin

I'm trying to parse the following kind of string:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

which is meant to highlight that anything except a space can appear in a key or value, that there are no spaces around the colons, and that values are always in double quotes.

In Node.js, this is my output:

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
  'uuid',
  '123sth',
  index: 0,
  input: '[description:"aoeu" uuid:"123sth"]' ]
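
The output above is expected behaviour: when a capturing group sits inside a repeated group such as (?:...)+, each repetition overwrites the previous capture, so exec reports only the last key/value pair. A minimal sketch comparing the question's pattern with a pattern that matches one pair at a time (the variable names and the one-pair regex are illustrative choices, not from the question):

```javascript
const input = '[description:"aoeu" uuid:"123sth"]';

// The question's pattern: the pair is inside a repeated (?:...)+ group,
// so groups 1 and 2 only hold the captures from the final repetition.
const repeated = /^\[(?:(.+?):"(.+?)"\s*)+\]$/;
const whole = repeated.exec(input);
console.log(whole[1], whole[2]); // uuid 123sth

// Matching one pair at a time with the g flag lets every pair be found.
const onePair = /([^\s[:]+):"([^"]*)"/g;
const pairs = [];
let m;
while ((m = onePair.exec(input)) !== null) {
  pairs.push([m[1], m[2]]);
}
console.log(pairs); // [ [ 'description', 'aoeu' ], [ 'uuid', '123sth' ] ]
```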

But description:"aoeu" also matches this pattern. How can I get all the matches back?

Answer by lawnsea

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
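
One caveat with an exec loop: if the pattern can match the empty string (for example /\d*/g), exec returns a zero-length match without advancing lastIndex, and the loop never ends. A common guard is to bump lastIndex by hand after an empty match. The helper name execAll below is hypothetical; this is a sketch of the guard, not part of the original answer:

```javascript
// Hypothetical helper: collect every exec() result, guarding against
// zero-length matches that would otherwise stall the loop forever.
function execAll(re, s) {
  if (!re.global) throw new TypeError("execAll needs a regex with the g flag");
  re.lastIndex = 0; // start from the beginning even if re was used before
  const results = [];
  let m;
  while ((m = re.exec(s)) !== null) {
    results.push(m);
    if (m[0] === "") re.lastIndex++; // step past the empty match
  }
  return results;
}

const digits = execAll(/\d*/g, "a12b3");
console.log(digits.map(m => m[0])); // [ '', '12', '', '3', '' ]
```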

Answer by Anis

str.match(pattern), if pattern has the global flag g, will return all the matches as an array.

For example:

const str = 'All of us except @Emran, @Raju and @Noman was there';
console.log(
  str.match(/@\w*/g)
);
// Will log ["@Emran", "@Raju", "@Noman"]
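
A trade-off worth knowing: with the g flag, match returns only the full-match strings and drops capture groups; without it, match behaves like exec and returns the first match together with its groups. A short comparison on the question's string (the (\w+) key pattern is an illustrative simplification):

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// Global: every full match, but no capture groups.
const all = s.match(/(\w+):"([^"]*)"/g);
console.log(all); // [ 'description:"aoeu"', 'uuid:"123sth"' ]

// Non-global: only the first match, but with its capture groups.
const first = s.match(/(\w+):"([^"]*)"/);
console.log(first[1], first[2]); // description aoeu
```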

Answer by Christophe

To loop through all matches, you can use the replace function:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
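
The callback passed to replace also receives the match offset (right after the capture groups), so it can collect structured results. A sketch that gathers the key/value pairs into an object; the new string that replace builds is simply discarded, which is why some consider this a misuse of replace:

```javascript
const re = /\s*([^[:]+):"([^"]+)"/g;
const s = '[description:"aoeu" uuid:"123sth"]';
const pairs = {};
const offsets = [];

// Arguments: full match, group 1, group 2, then the match offset.
s.replace(re, (match, key, value, offset) => {
  pairs[key] = value;
  offsets.push(offset);
  return match; // keep the text unchanged; the result is discarded anyway
});

console.log(pairs);   // { description: 'aoeu', uuid: '123sth' }
console.log(offsets); // [ 1, 19 ]
```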

Answer by lovasoa

This is a solution

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
  console.log(m[1], m[2]);
}

This is based on lawnsea's answer, but shorter.

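Notice that the g flag must be set to move the internal pointer (the regex's lastIndex property) forward across invocations; without it, exec returns the first match every time and the loop never ends.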
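
A quick way to see the pointer in action is to call exec twice on the same string, once with a global regex and once without. A sketch (the (\w+) key pattern is an illustrative simplification of the answer's regex):

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// With g: lastIndex moves just past each match.
const globalRe = /(\w+):"([^"]*)"/g;
globalRe.exec(s);
const afterFirst = globalRe.lastIndex;
console.log(afterFirst);            // 19
const second = globalRe.exec(s);
console.log(second[1]);             // uuid

// Without g: lastIndex stays 0 and exec returns the same match again.
const plainRe = /(\w+):"([^"]*)"/;
plainRe.exec(s);
console.log(plainRe.lastIndex);     // 0
console.log(plainRe.exec(s)[1]);    // description
```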

Answer by noego

str.match(/regex/g)

returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, then as an alternative to the previous answers you could use a recursive function instead of a loop, as follows (which also looks cooler).

function findMatches(regex, str, matches = []) {
   const res = regex.exec(str)
   res && matches.push(res) && findMatches(regex, str, matches)
   return matches
}

// Usage
const matches = findMatches(/regex/g, str)

As stated in the comments above, it's important to put g at the end of the regex definition so that the pointer moves forward on each execution; without it, exec keeps returning the first match and the recursion never ends.
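
Because forgetting g here means unbounded recursion (and eventually a stack overflow), a guard can turn the mistake into an immediate error. A sketch of the same recursive idea with such a guard; the error message is an illustrative choice:

```javascript
function findMatches(regex, str, matches = []) {
  // Without g, exec would return the same match forever.
  if (!regex.global) throw new TypeError("findMatches requires the g flag");
  const res = regex.exec(str);
  if (res) {
    matches.push(res);
    findMatches(regex, str, matches);
  }
  return matches;
}

const matches = findMatches(/(\w+):"([^"]*)"/g, '[description:"aoeu" uuid:"123sth"]');
console.log(matches.map(m => m[1])); // [ 'description', 'uuid' ]
```

Each match adds a stack frame, so for strings with a very large number of matches the loop-based answers are the safer choice.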

Answer by woojoo666

We are finally beginning to see a built-in matchAll function; see here for the description and compatibility table. It looks like, as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) support it, but IE, Safari, and Opera do not. It seems it was drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns the capturing groups for every match! So you can do things like:

// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);

for (const match of matches) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}

// the iterator above is now used up, so call matchAll again to get an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
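
Two details about matchAll are easy to trip over: the iterator it returns can be consumed only once, so spreading it after a for...of loop gives an empty array; and, per the ES2020 specification, passing a regex without the g flag throws a TypeError instead of returning a single match. A sketch showing both:

```javascript
const it = "stackoverflow".matchAll(/(\w)o(\w)/g);
const firstPass = [...it];   // consumes the iterator ("kov", "low")
const secondPass = [...it];  // already exhausted
console.log(firstPass.length, secondPass.length); // 2 0

let error = null;
try {
  "stackoverflow".matchAll(/(\w)o(\w)/); // no g flag
} catch (e) {
  error = e;
}
console.log(error instanceof TypeError); // true
```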

It also seems that every match object uses the same format as the result of a non-global match() (or exec()). So each object is an array of the match and the capturing groups, along with three additional properties: index, input, and groups. So it looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
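
The groups property becomes useful with named capture groups (ES2018). A sketch on the question's string; the group names key and value are illustrative choices:

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
const re = /(?<key>\w+):"(?<value>[^"]*)"/g;

for (const m of s.matchAll(re)) {
  // m.index is the offset of the full match, m.groups holds the named captures.
  console.log(m.index, m.groups.key, m.groups.value);
}
// 1 description aoeu
// 20 uuid 123sth

// Build an object of key/value pairs from the named groups.
const record = Object.fromEntries(
  Array.from(s.matchAll(re), m => [m.groups.key, m.groups.value])
);
```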

For more information about matchAll, there is also a Google developers page. There are also polyfills/shims available.

Answer by bob

Based on Agus's function, but I prefer to return just the matched values (capture group 1):

var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch);  // yields: [ '&gt;', '&lt;' ]
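
Where String.prototype.matchAll is available, the same "group 1 only" result can be written with Array.from and its mapping argument. A sketch assuming the same input string as above:

```javascript
const bob = "&gt; bob &lt;";

// Array.from accepts a map function, so each match is reduced to group 1.
const entities = Array.from(bob.matchAll(/(&.*?;)/g), m => m[1]);
console.log(entities); // [ '&gt;', '&lt;' ]
```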

Answer by sdgfsdh

Iterables are nicer:

const matches = (text, pattern) => ({
  [Symbol.iterator]: function * () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
  console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]
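
One caveat with this generator: it copies the pattern's flags unchanged, so a pattern without g never advances and the iterator yields the first match forever. A variant that adds g when it is missing and steps over empty matches; the name allMatches is hypothetical:

```javascript
const allMatches = (text, pattern) => ({
  *[Symbol.iterator]() {
    // Always search globally, keeping any other flags (i, m, s, u, ...).
    const flags = pattern.flags.includes("g") ? pattern.flags : pattern.flags + "g";
    const clone = new RegExp(pattern.source, flags);
    let match;
    while ((match = clone.exec(text)) !== null) {
      yield match;
      // An empty match does not move lastIndex, so advance it manually.
      if (match[0] === "") clone.lastIndex++;
    }
  }
});

console.log([...allMatches("abcdefabcdef", /ab/)].map(m => m.index)); // [ 0, 6 ]
```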

Answer by Jeff Hykin

If you have ES2020

(Meaning if your system, such as Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later)

Use the new yourString.matchAll( /your-regex/g ). The regex must carry the g flag; without it, matchAll throws a TypeError.

If you don't have ES2020

If you have an older system, here's a function that is easy to copy and paste:

function findAll(regexPattern, sourceString) {
    let output = []
    let match
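    // make sure the pattern has the global flag, keeping its other flags (e.g. i, m)
    let flags = regexPattern.flags || ""
    if (!flags.includes("g")) flags += "g"
    let regexPatternWithGlobal = new RegExp(regexPattern, flags)
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // an empty match doesn't advance lastIndex, so step past it to avoid an infinite loop
        if (match[0] === "") regexPatternWithGlobal.lastIndex++
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    }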
    return output
}

example usage:

console.log(   findAll(/blah/g,'blah1 blah2')   ) 

outputs:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]

Answer by Agus Syahputra

Here is my function to get the matches:

function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }

    var res = [];
    var match = null;

    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }

    return res;
}

// Example:

var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');

res.forEach(function (item) {
    console.log(item[0]);
});
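
One subtlety when reusing a global regex object with a function like this: exec starts from the regex's current lastIndex. If the same regex was previously used with exec or test and stopped mid-string, the earlier matches are silently skipped. A sketch of the pitfall and the fix; collect is a hypothetical stand-in for the loop inside getAllMatches:

```javascript
// Collect every match starting from wherever re.lastIndex currently is.
function collect(re, text) {
  const out = [];
  let m;
  while ((m = re.exec(text)) !== null) out.push(m[0]);
  return out;
}

const regex = /abc|def|ghi/g;
regex.test("abcdefghi");        // a global test() leaves lastIndex at 3
const skipped = collect(regex, "abcdefghi");
console.log(skipped);           // [ 'def', 'ghi' ]  ("abc" was missed)

regex.test("abcdefghi");        // lastIndex is 3 again
regex.lastIndex = 0;            // reset before looping
const full = collect(regex, "abcdefghi");
console.log(full);              // [ 'abc', 'def', 'ghi' ]
```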