Named capturing groups in JavaScript regex?
Original question: http://stackoverflow.com/questions/5367369/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by mmierins
As far as I know there is no such thing as named capturing groups in JavaScript. What is the alternative way to get similar functionality?
Accepted answer by Tim Pietzcker
ECMAScript 2018 introduces named capturing groups into JavaScript regexes.
Example:
const auth = 'Bearer AUTHORIZATION_TOKEN'
const { groups: { token } } = /Bearer (?<token>[^ $]*)/.exec(auth)
console.log(token) // prints "AUTHORIZATION_TOKEN"
If you need to support older browsers, you can do everything with normal (numbered) capturing groups that you can do with named capturing groups. You just need to keep track of the numbers, which may be cumbersome if the order of the capturing groups in your regex changes.
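As a minimal sketch of that fallback, the token example can be written with a numbered group instead. The constant name TOKEN_GROUP is only an illustration and is not part of the original answer.

```javascript
const auth = 'Bearer AUTHORIZATION_TOKEN';

// Group 1 holds the token; this index must be kept in sync with the pattern.
const TOKEN_GROUP = 1;

const match = /Bearer ([^ $]*)/.exec(auth);

// exec() returns null when there is no match, so guard before indexing.
const token = match ? match[TOKEN_GROUP] : undefined;

console.log(token); // prints "AUTHORIZATION_TOKEN"
```

If another capturing group is later added before the token group, TOKEN_GROUP has to be updated by hand, which is the bookkeeping cost mentioned above.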
There are only two "structural" advantages of named capturing groups I can think of:
- In some regex flavors (.NET and JGSoft, as far as I know), you can use the same name for different groups in your regex (see here for an example where this matters). But most regex flavors do not support this functionality anyway.
- If you need to refer to numbered capturing groups in a situation where they are surrounded by digits, you can run into a problem. Let's say you want to add a zero to a digit and therefore want to replace (\d) with $10. In JavaScript, this will work (as long as you have fewer than 10 capturing groups in your regex), but Perl will think you're looking for backreference number 10 instead of number 1, followed by a 0. In Perl, you can use ${1}0 in this case.
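The second point can be checked with a short sketch in JavaScript. The group name digit is made up for the example. With a named group (ECMAScript 2018 and later), the replacement text cannot be misread as a two-digit group number.

```javascript
// Numbered reference: with only one capturing group, JavaScript reads "$10"
// as group 1 followed by a literal "0".
const numbered = '5'.replace(/(\d)/, '$10');

// Named reference: "$<digit>0" is group "digit" followed by a literal "0".
const named = '5'.replace(/(?<digit>\d)/, '$<digit>0');

console.log(numbered, named); // prints "50 50"
```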
Other than that, named capturing groups are just "syntactic sugar". It helps to use capturing groups only when you really need them and to use non-capturing groups (?:...) in all other circumstances.
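A small sketch of why this helps with numbered groups: a non-capturing group does not take up a group number, so the group you care about keeps a stable index. The URL and variable names are illustrative only.

```javascript
const url = 'https://www.example.com/path';

// (?:www\.)? groups the optional prefix without creating a numbered group,
// so the host is still group 1.
const match = /^https?:\/\/(?:www\.)?([^/]+)/.exec(url);
const host = match ? match[1] : undefined;

console.log(host); // prints "example.com"
```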
The bigger problem (in my opinion) with JavaScript is that it does not support verbose regexes which would make the creation of readable, complex regular expressions a lot easier.
Steve Levithan's XRegExp library solves these problems.
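Without a library, one rough approximation of a verbose regex is to assemble the pattern from small, commented pieces. This is only a plain-JavaScript workaround sketched under that assumption. It is not how XRegExp works, and the variable names are hypothetical.

```javascript
// Each part is an ordinary regex literal; the comments document its role.
const parts = [
  /(\d{4})/, // year
  /-/,
  /(\d{2})/, // month
  /-/,
  /(\d{2})/, // day
];

// Join the sources of the parts into one pattern.
const dateRegex = new RegExp(parts.map((part) => part.source).join(''));

const match = dateRegex.exec('2015-01-02');
console.log(match[1], match[2], match[3]); // prints "2015 01 02"
```

Flags on the individual parts are ignored here. Any flags would have to be passed as the second argument to the RegExp constructor.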
Answer by Yunga Palatino
You can use XRegExp, an augmented, extensible, cross-browser implementation of regular expressions, including support for additional syntax, flags, and methods:
- Adds new regex and replacement text syntax, including comprehensive support for named capture.
- Adds two new regex flags: s, to make dot match all characters (aka dotall or singleline mode), and x, for free-spacing and comments (aka extended mode).
- Provides a suite of functions and methods that make complex regex processing a breeze.
- Automagically fixes the most commonly encountered cross-browser inconsistencies in regex behavior and syntax.
- Lets you easily create and use plugins that add new syntax and flags to XRegExp's regular expression language.
Answer by Mr. TA
Another possible solution: create an object containing the group names and indexes.
var regex = new RegExp("(.*) (.*)");
var regexGroups = { FirstName: 1, LastName: 2 };
Then, use the object keys to reference the groups:
var m = regex.exec("John Smith");
var f = m[regexGroups.FirstName];
This improves the readability/quality of the code using the results of the regex, but not the readability of the regex itself.
Answer by fregante
In ES6 you can use array destructuring to catch your groups:
let text = '27 months';
let regex = /(\d+)\s*(days?|months?|years?)/;
let [, count, unit] = regex.exec(text) || [];
// count === '27'
// unit === 'months'
Notice:
- the first comma in the last let skips the first value of the resulting array, which is the whole matched string
- the || [] after .exec() will prevent a destructuring error when there are no matches (because .exec() will return null)
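To illustrate both notes, here is a small sketch that wraps the same destructuring pattern in a function and exercises the no-match case. The function name parseDuration and its return shape are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper built on the destructuring pattern shown above.
function parseDuration(text) {
  const regex = /(\d+)\s*(days?|months?|years?)/;
  // The leading comma skips index 0, which is the whole matched string.
  const [, count, unit] = regex.exec(text) || [];
  // Both values are undefined when the text does not match.
  return count === undefined ? null : { count: Number(count), unit };
}

const parsed = parseDuration('27 months');
const missing = parseDuration('soon');

console.log(parsed);  // prints { count: 27, unit: 'months' }
console.log(missing); // prints null
```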
Answer by Forivin
Update: It finally made it into JavaScript (ECMAScript 2018)!
Named capturing groups could make it into JavaScript very soon.
The proposal for it is at stage 3 already.
A capture group can be given a name inside angular brackets using the (?<name>...) syntax, for any identifier name. The regular expression for a date then can be written as /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u. Each name should be unique and follow the grammar for ECMAScript IdentifierName.
Named groups can be accessed from properties of a groups property of the regular expression result. Numbered references to the groups are also created, just as for non-named groups. For example:
let re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;
let result = re.exec('2015-01-02');
// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';
// result[0] === '2015-01-02';
// result[1] === '2015';
// result[2] === '01';
// result[3] === '02';
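Named groups can also be referenced in replacement strings with the $<name> syntax, and the groups object can be destructured directly. A brief sketch using the same date pattern (the variable names are illustrative):

```javascript
const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/u;

// Destructure the named groups straight from the match result.
const { groups: { year, month, day } } = re.exec('2015-01-02');

// Reorder the date by referring to the groups by name in the replacement text.
const reordered = '2015-01-02'.replace(re, '$<day>/$<month>/$<year>');

console.log(year, month, day); // prints "2015 01 02"
console.log(reordered);        // prints "02/01/2015"
```

The destructuring line throws if the input does not match, because exec() then returns null. Guard it in real code.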
Answer by Yashima
Naming captured groups provides one thing: less confusion with complex regular expressions.
It really depends on your use-case but maybe pretty-printing your regex could help.
Or you could try and define constants to refer to your captured groups.
Comments might then also help show others who read your code what you have done.
For the rest, I must agree with Tim's answer.
Answer by chiborg
There is a node.js library called named-regexp that you could use in your node.js projects (or in the browser by packaging the library with browserify or other packaging scripts). However, the library cannot be used with regular expressions that contain non-named capturing groups.
If you count the opening parentheses of the capturing groups in your regular expression, you can create a mapping between named capturing groups and the numbered capturing groups in your regex and can mix and match freely. You just have to remove the group names before using the regex. I've written three functions that demonstrate that. See this gist: https://gist.github.com/gbirke/2cc2370135b665eee3ef
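The idea can be sketched roughly as follows. This is not the code from the gist. The function name compileNamed and its return shape are hypothetical, and the scanner only handles the common cases (escapes, character classes, lookarounds).

```javascript
// Hypothetical helper: counts capturing groups in a pattern string, records
// the number of each (?<name>...) group, and strips the names so the pattern
// also works in engines without named groups.
function compileNamed(source, flags) {
  const names = {};   // group name -> group number
  let groupCount = 0;
  let stripped = '';
  let inClass = false;
  for (let i = 0; i < source.length; i++) {
    const ch = source[i];
    if (ch === '\\') {            // copy an escaped character untouched
      stripped += ch + (source[i + 1] || '');
      i++;
      continue;
    }
    if (inClass) {                // parentheses inside [...] are literals
      if (ch === ']') inClass = false;
      stripped += ch;
      continue;
    }
    if (ch === '[') inClass = true;
    if (ch === '(') {
      if (source[i + 1] !== '?') {
        groupCount++;             // plain numbered capturing group
      } else {
        const m = /^\(\?<([A-Za-z_$][\w$]*)>/.exec(source.slice(i));
        if (m) {                  // named group: record its number, drop the name
          groupCount++;
          names[m[1]] = groupCount;
          stripped += '(';
          i += m[0].length - 1;
          continue;
        }
      }
    }
    stripped += ch;
  }
  return { regex: new RegExp(stripped, flags), names };
}

const { regex, names } = compileNamed('(?<year>\\d{4})-(\\d{2})-(?<day>\\d{2})');
const m = regex.exec('2015-01-02');
console.log(m[names.year], m[2], m[names.day]); // prints "2015 01 02"
```

Named and unnamed groups can be mixed because every capturing group is counted, whether it has a name or not.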
Answer by Hamed Mahdizadeh
As Tim Pietzcker said, ECMAScript 2018 introduces named capturing groups into JavaScript regexes. But what I did not find in the above answers was how to use the named captured group in the regex itself.
You can refer to a named captured group inside the regex with this syntax: \k<name>. For example:
var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/
And as Forivin said, you can use the captured groups in the result object as follows:
let result = regexObj.exec('2019-28-06 year is 2019');
// result.groups.year === '2019';
// result.groups.month === '06';
// result.groups.day === '28';
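Another common use of \k<name> is matching a closing delimiter that must equal the opening one. A short sketch, where the group names quote and body are chosen only for illustration:

```javascript
// \k<quote> must match the same character that the "quote" group captured,
// so a string opened with " can only be closed with ".
const quoted = /(?<quote>['"])(?<body>.*?)\k<quote>/;

const first = quoted.exec('say "hi" now');
const second = quoted.exec(`it's "ok"`);

console.log(first.groups.body);  // prints "hi"
console.log(second.groups.body); // prints "ok"
```

In the second input the lone apostrophe has no matching partner, so the engine moves on and matches the double-quoted part instead.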
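// The g flag is omitted: with g, exec() resumes from lastIndex and a second click would fail
var regexObj = /(?<year>\d{4})-(?<day>\d{2})-(?<month>\d{2}) year is \k<year>/mi;
function check(){
    var inp = document.getElementById("tinput").value;
    let result = regexObj.exec(inp);
    // exec() returns null when the input does not match, so fall back to empty values
    var groups = result ? result.groups : { year: "", month: "", day: "" };
    document.getElementById("year").innerHTML = groups.year;
    document.getElementById("month").innerHTML = groups.month;
    document.getElementById("day").innerHTML = groups.day;
}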
td, th{
border: solid 2px #ccc;
}
<input id="tinput" type="text" value="2019-28-06 year is 2019"/>
<br/>
<br/>
<span>Pattern: "(?&lt;year&gt;\d{4})-(?&lt;day&gt;\d{2})-(?&lt;month&gt;\d{2}) year is \k&lt;year&gt;"</span>
<br/>
<br/>
<button onclick="check()">Check!</button>
<br/>
<br/>
<table>
<thead>
<tr>
<th>
<span>Year</span>
</th>
<th>
<span>Month</span>
</th>
<th>
<span>Day</span>
</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<span id="year"></span>
</td>
<td>
<span id="month"></span>
</td>
<td>
<span id="day"></span>
</td>
</tr>
</tbody>
</table>
Answer by Matías Fidemraizer
While you can't do this with vanilla JavaScript, maybe you can use some Array.prototype function like Array.prototype.reduce to turn indexed matches into named ones using some magic.
Obviously, the following solution requires that the names are listed in the same order as the groups they match:
// @text Contains the text to match
// @regex A regular expression object (e.g. /.+/)
// @matchNames An array of literal strings where each item
//             is the name of each group
function namedRegexMatch(text, regex, matchNames) {
    var matches = regex.exec(text);
    // exec() returns null when the text does not match
    if (matches === null) return null;
    return matches.reduce(function(result, match, index) {
        if (index > 0)
            // This subtraction is required because we count
            // match indexes from 1, because 0 is the entire matched string
            result[matchNames[index - 1]] = match;
        return result;
    }, {});
}
var myString = "Hello Alex, I am John";
var namedMatches = namedRegexMatch(
    myString,
    /Hello ([a-z]+), I am ([a-z]+)/i,
    ["firstPersonName", "secondPersonName"]
);
alert(JSON.stringify(namedMatches));
Answer by toddmo
Don't have ECMAScript 2018?
My goal was to make it work as similarly as possible to what we are used to with named groups. Whereas in ECMAScript 2018 you can place ?<groupname> inside the group to indicate a named group, in my solution for older JavaScript you can place (?!=<groupname>) inside the group to do the same thing. So it's an extra set of parentheses and an extra !=. Pretty close!
I wrapped all of it into a string prototype function.
Features
- works with older javascript
- no extra code
- pretty simple to use
- Regex still works
- groups are documented within the regex itself
- group names can have spaces
- returns object with results
Instructions
- place (?!=<groupname>) inside each group you want to name
- remember to turn any group () that you do not want named into a non-capturing group by putting ?: at the beginning of that group. These won't be named.
arrays.js
// @@pattern - includes injections of (?!=<groupname>) for each group
// @@returns - an object with a property for each group having the group's match as the value
String.prototype.matchWithGroups = function (pattern) {
    var matches = this.match(pattern);
    // match() returns null when the string does not match the pattern
    if (matches === null) return null;
    var names = pattern
        // get the pattern as a string
        .toString()
        // suss out the groups (null when the pattern has no <...> markers)
        .match(/<(.+?)>/g) || [];
    return names
        // remove the angle brackets
        .map(function(group) {
            return group.match(/<(.+)>/)[1];
        })
        // create an object with a property for each group having the group's match as the value
        .reduce(function(acc, curr, index, arr) {
            acc[curr] = matches[index + 1];
            return acc;
        }, {});
};
usage
function testRegGroups() {
    var s = '123 Main St';
    var pattern = /((?!=<house number>)\d+)\s((?!=<street name>)\w+)\s((?!=<street type>)\w+)/;
    var o = s.matchWithGroups(pattern); // {'house number':"123", 'street name':"Main", 'street type':"St"}
    var j = JSON.stringify(o);
    var housenum = o['house number']; // "123"
}
result of o
{
"house number": "123",
"street name": "Main",
"street type": "St"
}
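For comparison, here is a sketch of the same address example using native ECMAScript 2018 named groups. Native group names must be valid identifiers, so the names cannot contain spaces. The camelCase names below are substitutes chosen for this example.

```javascript
const pattern = /(?<houseNumber>\d+)\s(?<streetName>\w+)\s(?<streetType>\w+)/;
const match = pattern.exec('123 Main St');

// Copy the groups into a plain object; exec() returns null on no match.
const address = match ? { ...match.groups } : null;

console.log(JSON.stringify(address));
// prints {"houseNumber":"123","streetName":"Main","streetType":"St"}
```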