JavaScript RegEx: using RegExp.exec to extract all matches from a string
Original question: http://stackoverflow.com/questions/6323417/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
RegEx to extract all matches from string using RegExp.exec
Asked by gatlin
I'm trying to parse the following kind of string:
[key:"val" key2:"val2"]
where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.
Here is my test string:
[description:"aoeu" uuid:"123sth"]
It is meant to highlight that a key or value can contain anything except a space, that there are no spaces around the colons, and that values are always in double quotes.
In node, this is my output:
[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
'uuid',
'123sth',
index: 0,
input: '[description:"aoeu" uuid:"123sth"]' ]
But description:"aoeu" also matches this pattern. How can I get all matches back?
Answer by lawnsea
Continue calling re.exec(s) in a loop to obtain all the matches:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;
do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);
Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
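As a hedged sketch of how the loop in this answer might be applied to the question's input, the matches can be collected into a plain object. The helper name parsePairs is an assumption made for this illustration and is not part of the original answer; it also assumes that keys are unique within a line.

```javascript
// Hypothetical helper (not from the original answer): collect key/value
// pairs from a Taskwarrior-style line into a plain object.
function parsePairs(line) {
    // A fresh regex per call, so lastIndex always starts at 0.
    const re = /\s*([^[:]+):"([^"]+)"/g;
    const pairs = {};
    let m;
    while ((m = re.exec(line)) !== null) {
        pairs[m[1]] = m[2]; // m[1] is the key, m[2] is the quoted value
    }
    return pairs;
}

const parsed = parsePairs('[description:"aoeu" uuid:"123sth"]');
console.log(parsed); // { description: 'aoeu', uuid: '123sth' }
```

Creating the regex inside the function avoids sharing lastIndex state between calls, which is one common source of surprises with global regexes.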
Answer by Anis
If pattern has the global flag g, str.match(pattern) returns all the matches as an array.
For example:
const str = 'All of us except @Emran, @Raju and @Noman was there';
console.log(
    str.match(/@\w*/g)
);
// Will log ["@Emran", "@Raju", "@Noman"]
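One caveat that may be worth illustrating: with the g flag, match returns only the full matched substrings and drops the capture groups, so on its own it does not hand back the separate key and value the question asks for. A small sketch, using the question's test string and the regex from lawnsea's answer (the variable names are made up for this example):

```javascript
const input = '[description:"aoeu" uuid:"123sth"]';
const pairRe = /\s*([^[:]+):"([^"]+)"/g;

// With the g flag, match() returns whole matches only; groups are not included.
const whole = input.match(pairRe);
console.log(whole); // [ 'description:"aoeu"', ' uuid:"123sth"' ]

// exec() returns one match at a time, together with its capture groups.
pairRe.lastIndex = 0; // defensive reset before reusing the same regex
const first = pairRe.exec(input);
console.log(first[1], first[2]); // description aoeu
```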
Answer by Christophe
To loop through all matches, you can use the replace function:
var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
Answer by lovasoa
This is a solution:
var s = '[description:"aoeu" uuid:"123sth"]';
var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
    console.log(m[1], m[2]);
}
This is based on lawnsea's answer, but shorter.
Note that the g flag must be set so that the internal pointer moves forward across invocations. Without it, exec returns the first match every time and the loop never terminates.
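To make the "internal pointer" concrete: it is the regex's lastIndex property. The sketch below only inspects that property between calls, using the question's test string; the variable names are chosen for this illustration.

```javascript
const text = '[description:"aoeu" uuid:"123sth"]';
const globalRe = /\s*([^[:]+):"([^"]+)"/g;

const positions = [];
let hit;
while ((hit = globalRe.exec(text)) !== null) {
    // With g set, lastIndex now points just past the current match.
    positions.push(globalRe.lastIndex);
}
console.log(positions);          // [ 19, 33 ]
console.log(globalRe.lastIndex); // 0 (reset after the final, failed exec)

// Without g, exec ignores lastIndex and always starts from index 0,
// so a while loop like the one above would never terminate.
const plainRe = /\s*([^[:]+):"([^"]+)"/;
const a = plainRe.exec(text);
const b = plainRe.exec(text);
console.log(a.index === b.index); // true: the same first match both times
```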
Answer by noego
str.match(/regex/g) returns all matches as an array.
If, for some mysterious reason, you need the additional information that comes with exec, you could, as an alternative to the previous answers, use a recursive function instead of a loop, as follows (which also looks cooler).
function findMatches(regex, str, matches = []) {
    const res = regex.exec(str)
    res && matches.push(res) && findMatches(regex, str, matches)
    return matches
}
// Usage
const matches = findMatches(/regex/g, str)
As stated in the earlier comments, it's important to have g at the end of the regex definition so that the pointer moves forward on each execution.
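A related hazard, not raised in the answers, is a pattern that can match the empty string: exec then leaves lastIndex where it was, so a loop or recursion built on exec may never finish even with g set. A minimal sketch of the usual guard follows; execAll is a made-up name for this illustration, and the sketch ignores the extra care the u flag would need for surrogate pairs.

```javascript
// Hypothetical helper: like an exec loop, but safe for patterns that can
// match the empty string.
function execAll(regex, str) {
    if (!regex.global) {
        throw new TypeError('execAll expects a regex with the g flag');
    }
    regex.lastIndex = 0;
    const results = [];
    let m;
    while ((m = regex.exec(str)) !== null) {
        results.push(m);
        // An empty match does not advance lastIndex, so move it by hand.
        if (m[0] === '') {
            regex.lastIndex++;
        }
    }
    return results;
}

const found = execAll(/\d*/g, 'a1b').map((m) => m[0]);
console.log(found); // [ '', '1', '', '' ]
```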
Answer by woojoo666
We are finally beginning to see a built-in matchAll function; see its documentation for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. It seems it was drafted in December 2018, so give it some time to reach all browsers, but I trust it will get there.
The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like:
// get the letters before and after "o"
let matches = "stackoverflow".matchAll(/(\w)o(\w)/g);
for (const match of matches) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}
// matchAll returns a one-shot iterator, so it is already exhausted after the loop above;
// call matchAll again to turn the matches into an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];
It also seems that every match object uses the same format as the result of match() (when called without the g flag). So each object is an array of the match and its capturing groups, along with three additional properties: index, input, and groups. So it looks like:
[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]
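Applied to the string from the original question, that shape could be used roughly as follows. This is only a sketch: the named groups key and value are additions made for the example (named groups need an engine that supports them), and the regex body is borrowed from lawnsea's answer.

```javascript
const line = '[description:"aoeu" uuid:"123sth"]';
// The named groups (?<key>...) and (?<value>...) are an addition for this sketch.
const namedRe = /\s*(?<key>[^[:]+):"(?<value>[^"]+)"/g;

const entries = [];
for (const match of line.matchAll(namedRe)) {
    entries.push({
        key: match.groups.key,     // same as match[1]
        value: match.groups.value, // same as match[2]
        index: match.index,        // offset of the whole match in the input
    });
}
console.log(entries);
// [ { key: 'description', value: 'aoeu', index: 1 },
//   { key: 'uuid', value: '123sth', index: 19 } ]
```

The second index is 19 rather than 20 because the pattern's leading \s* includes the space before uuid in the match.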
For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
Answer by bob
Based on Agus's function, but I prefer to return just the match values:
var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch); // yields: [ '&gt;', '&lt;' ]
Answer by sdgfsdh
Iterables are nicer:
const matches = (text, pattern) => ({
    [Symbol.iterator]: function * () {
        // work on a copy so the caller's regex (and its lastIndex) is left untouched
        const clone = new RegExp(pattern.source, pattern.flags);
        let match = null;
        do {
            match = clone.exec(text);
            if (match) {
                yield match;
            }
        } while (match);
    }
});
Usage in a loop:
for (const match of matches('abcdefabcdef', /ab/g)) {
    console.log(match);
}
Or if you want an array:
[ ...matches('abcdefabcdef', /ab/g) ]
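One thing to watch with this generator: if the pattern is passed without the g flag, the cloned regex never advances and the do/while loop does not end. A possible variant, sketched here under the invented name matchesGlobal, adds g to the clone when it is missing and also steps past empty matches. It is an illustration under those assumptions, not the answer author's code.

```javascript
// Hypothetical variant: clone the pattern and make sure the clone is global.
function* matchesGlobal(text, pattern) {
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    // Cloning keeps the caller's regex (and its lastIndex) untouched.
    const clone = new RegExp(pattern.source, flags);
    let match;
    while ((match = clone.exec(text)) !== null) {
        yield match;
        // An empty match leaves lastIndex in place, so advance it manually.
        if (match[0] === '') {
            clone.lastIndex++;
        }
    }
}

// Works even though /ab/ has no g flag.
const starts = [...matchesGlobal('abcdefabcdef', /ab/)].map((m) => m.index);
console.log(starts); // [ 0, 6 ]
```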
Answer by Jeff Hykin
If you have ES2020
(Meaning your system, such as Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later, which is where String.prototype.matchAll was standardized.)
Use the new yourString.matchAll(/your-regex/g). The regex must have the g flag, otherwise matchAll throws a TypeError.
If you don't have ES2020
If you have an older system, here's a function for easy copying and pasting:
function findAll(regexPattern, sourceString) {
    let output = []
    let match
    // make sure the pattern has the global flag
    // (note: this replaces any other flags on the pattern, such as i or m, with just g)
    let regexPatternWithGlobal = RegExp(regexPattern,"g")
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // get rid of the string copy
        delete match.input
        // store the match data
        output.push(match)
    }
    return output
}
Example usage:
console.log( findAll(/blah/g,'blah1 blah2') )
Outputs:
[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Answer by Agus Syahputra
Here is my function to get the matches:
function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }
    var res = [];
    var match = null;
    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }
    return res;
}
// Example:
var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');
res.forEach(function (item) {
    console.log(item[0]);
});