How to split a long regular expression into multiple lines in JavaScript?

Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12317049/

Time: 2020-08-23 07:42:49  Source: igfitidea

javascript regex jslint expression readability

Asked by Nik Sumeiko

I have a very long regular expression, which I wish to split into multiple lines in my JavaScript code to keep each line within 80 characters, per JSLint rules. I think it's simply easier to read. Here's a sample pattern:

var pattern = /^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

Accepted answer by KooiInc

You could convert it to a string and create the expression by calling new RegExp():

// Backslashes are doubled because the pattern is built from string literals.
var myRE = new RegExp (['^(([^<>()[\\]\\.,;:\\s@\"]+(\\.[^<>(),[\\]\\.,;:\\s@\"]+)*)',
                        '|(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
                        '[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
                        '[a-zA-Z]{2,}))$'].join(''));

Notes:

  1. When converting the expression literal to a string, you need to escape all backslashes, because backslashes are consumed when a string literal is evaluated. (See Kayo's comment for more detail.)
  2. RegExp accepts modifiers as a second parameter:

    /regex/g => new RegExp('regex', 'g')
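To make note 1 concrete, here is a small sketch showing what happens to single backslashes inside a string literal. The pattern and variable names are illustrative only and are not taken from the answer:

```javascript
// '\d' and '\s' are not recognized string escapes, so the backslash is dropped
// before RegExp ever sees the text.
const broken = new RegExp('^\d+\s*$');
console.log(broken.source); // ^d+s*$

// Doubling each backslash keeps it in the string, and therefore in the pattern.
const fixed = new RegExp('^\\d+\\s*$');
console.log(fixed.source);       // ^\d+\s*$
console.log(fixed.test('42 '));  // true
console.log(broken.test('42 ')); // false
```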

[Addition: ES20xx (tagged templates)]

In ES20xx you can use tagged templates. See the snippet.

Note:

  • The disadvantage here is that you can't use plain whitespace in the regular expression string (always use \s, \s+, \s{1,x}, \t, \n, etc.).

(() => {
  const createRegExp = (str, opts) => 
    new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");
  const yourRE = createRegExp`
    ^(([^<>()[\]\.,;:\s@\"]+(\.[^<>()[\]\.,;:\s@\"]+)*)|
    (\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|
    (([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$`;
  console.log(yourRE);
  const anotherLongRE = createRegExp`
    (\byyyy\b)|(\bm\b)|(\bd\b)|(\bh\b)|(\bmi\b)|(\bs\b)|(\bms\b)|
    (\bwd\b)|(\bmm\b)|(\bdd\b)|(\bhh\b)|(\bMI\b)|(\bS\b)|(\bMS\b)|
    (\bM\b)|(\bMM\b)|(\bdow\b)|(\bDOW\b)
    ${"gi"}`;
  console.log(anotherLongRE);
})();
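The whitespace caveat can be seen in a minimal sketch. It repeats the createRegExp helper from the snippet so that it runs on its own; the sample patterns are made up for illustration:

```javascript
// Same helper as in the snippet: strips all whitespace from the raw template text.
const createRegExp = (str, opts) =>
  new RegExp(str.raw[0].replace(/\s/gm, ""), opts || "");

// A literal space is removed along with the line breaks and indentation...
const stripped = createRegExp`hello world`;
console.log(stripped.source); // helloworld

// ...so whitespace that should be matched has to be written as \s (or \t, \n).
const kept = createRegExp`hello\sworld`;
console.log(kept.test("hello world")); // true
```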

Answer by korun

Extending @KooiInc's answer, you can avoid manually escaping every special character by using the source property of the RegExp object.

Example:

var urlRegex= new RegExp(''
  + /(?:(?:(https?|ftp):)?\/\/)/.source     // protocol
  + /(?:([^:\n\r]+):([^@\n\r]+)@)?/.source  // user:pass
  + /(?:(?:www\.)?([^\/\n\r]+))/.source     // domain
  + /(\/[^?\n\r]+)?/.source                 // request
  + /(\?[^#\n\r]*)?/.source                 // query
  + /(#?[^\n\r]*)?/.source                  // anchor
);

Or, if you want to avoid repeating the .source property, you can do it using the Array.map() function:

var urlRegex= new RegExp([
  /(?:(?:(https?|ftp):)?\/\/)/      // protocol
  ,/(?:([^:\n\r]+):([^@\n\r]+)@)?/  // user:pass
  ,/(?:(?:www\.)?([^\/\n\r]+))/     // domain
  ,/(\/[^?\n\r]+)?/                 // request
  ,/(\?[^#\n\r]*)?/                 // query
  ,/(#?[^\n\r]*)?/                  // anchor
].map(function(r) {return r.source}).join(''));

In ES6 the map function can be reduced to: .map(r => r.source)
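As a smaller illustration of why this avoids double escaping, the sketch below (with made-up patterns and names) shows that source returns the pattern text with its backslashes intact, ready to be concatenated:

```javascript
// Each piece is an ordinary regex literal, so escapes are written only once.
const digits = /\d+/;
const dash = /\s*-\s*/;

// .source yields the pattern text, which can be joined like any other string.
const range = new RegExp('^' + digits.source + dash.source + digits.source + '$');

console.log(range.source);          // ^\d+\s*-\s*\d+$
console.log(range.test('10 - 20')); // true
```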

Answer by Riccardo Galli

Using strings in new RegExp is awkward because you must escape all the backslashes. You can write smaller regexes and concatenate them instead.

Let's split this regex

/^foo(.*)\bar$/

We will use a function to make things more beautiful later

function multilineRegExp(regs, options) {
    return new RegExp(regs.map(
        function(reg){ return reg.source; }
    ).join(''), options);
}

And now let's rock

var r = multilineRegExp([
     /^foo/,  // we can add comments too
     /(.*)/,
     /\bar$/
]);

Since it has a cost, try to build the real regex just once and then use that.
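One way to follow that advice, sketched under the assumption that the regex lives in a module or script scope, is to build it once at load time and reuse it inside the function that needs it. FOO_BAR and isFooBar are hypothetical names, and the pieces are simplified:

```javascript
function multilineRegExp(regs, options) {
    return new RegExp(regs.map(function (reg) { return reg.source; }).join(''), options);
}

// Built once, when the script is loaded...
const FOO_BAR = multilineRegExp([
    /^foo/,
    /(.*)/,
    /bar$/
], 'i');

// ...and reused on every call instead of being rebuilt each time.
function isFooBar(text) {
    return FOO_BAR.test(text);
}

console.log(isFooBar('FOO-bar')); // true
console.log(isFooBar('bar-foo')); // false
```

The shared regex deliberately has no g flag: test() on a global regex keeps state in lastIndex between calls, which can give surprising results when the same object is reused.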

Answer by James Donohue

There are good answers here, but for completeness someone should mention JavaScript's core feature of inheritance with the prototype chain. Something like this illustrates the idea:

RegExp.prototype.append = function(re) {
  return new RegExp(this.source + re.source, this.flags);
};

let regex = /[a-z]/g
.append(/[A-Z]/)
.append(/[0-9]/);

console.log(regex); //=> /[a-z][A-Z][0-9]/g
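If modifying a built-in prototype is not desirable in a given codebase, the same idea can be sketched as a standalone helper. appendRegex is a hypothetical name and this is only one possible shape for it:

```javascript
// Concatenates the sources of all parts onto the base, keeping the base's flags.
const appendRegex = (base, ...parts) =>
  new RegExp(base.source + parts.map((part) => part.source).join(''), base.flags);

const regex = appendRegex(/[a-z]/g, /[A-Z]/, /[0-9]/);

console.log(regex.source); // [a-z][A-Z][0-9]
console.log(regex.flags);  // g
```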

Answer by Hashbrown

Thanks to the wondrous world of template literals, you can now write big, multi-line, well-commented, and even semantically nested regexes in ES6.

//build regexes without worrying about
// - double-backslashing
// - adding whitespace for readability
// - adding in comments
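let clean = (piece) => (piece
    // strip /* block comments */, keeping the pattern text before them ($1)
    .replace(/((^|\n)(?:[^\/\\]|\/[^*\/]|\\.)*?)\s*\/\*(?:[^*]|\*[^\/])*(\*\/|)/g, '$1')
    // strip // line comments, keeping the pattern text before them ($1)
    .replace(/((^|\n)(?:[^\/\\]|\/[^\/]|\\.)*?)\s*\/\/[^\n]*/g, '$1')
    // strip newlines and the indentation that follows them
    .replace(/\n\s*/g, '')
);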
window.regex = ({raw}, ...interpolations) => (
    new RegExp(interpolations.reduce(
        (regex, insert, index) => (regex + insert + clean(raw[index + 1])),
        clean(raw[0])
    ))
);

Using this you can now write regexes like this:

let re = regex`I'm a special regex{3} //with a comment!`;

Outputs

/I'm a special regex{3}/

Or what about multiline?

'123hello'
    .match(regex`
        //so this is a regex

        //here I am matching some numbers
        (\d+)

        //Oh! See how I didn't need to double backslash that \d?
        ([a-z]{1,3}) /*note to self, this is group #2*/
    `)
    [2]

Outputs hel, neat!
"What if I need to actually search for a newline?" Well then, use \n, silly!
Works in my Firefox and Chrome.
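The "semantically nested" idea mentioned at the start of this answer can be sketched with interpolation. The tag below, rx, is a simplified stand-in written for this example (it strips newlines and indentation only, not comments), and the version-number pattern is made up:

```javascript
// Simplified stand-in for the regex tag: joins raw chunks and interpolated
// strings, removing each newline together with the indentation after it.
const rx = ({ raw }, ...inserts) =>
  new RegExp(raw.map((chunk, i) =>
    chunk.replace(/\n\s*/g, '') + (i < inserts.length ? inserts[i] : '')
  ).join(''));

// A reusable sub-pattern, nested by interpolating its source text.
const digits = /\d+/.source;

const version = rx`
    ^(${digits})
    \.(${digits})
    \.(${digits})$`;

console.log(version.source);            // ^(\d+)\.(\d+)\.(\d+)$
console.log(version.exec('1.22.3')[2]); // 22
```

Interpolating the source string rather than the RegExp object matters here: a RegExp converted to a string would bring its surrounding slashes and flags along.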



Okay, "how about something a little more complex?"
Sure, here's a piece of an object destructuring JS parser I was working on:

regex`^\s*
    (
        //closing the object
        (\})|

        //starting from open or comma you can...
        (?:[,{]\s*)(?:
            //have a rest operator
            (\.\.\.)
            |
            //have a property key
            (
                //a non-negative integer
                \b\d+\b
                |
                //any unencapsulated string of the following
                \b[A-Za-z$_][\w$]*\b
                |
                //a quoted string
                //this is #5!
                ("|')(?:
                    //that contains any non-escape, non-quote character
                    (?!\5|\\).
                    |
                    //or any escape sequence
                    (?:\\.)
                //finished by the quote
                )*\5
            )
            //after a property key, we can go inside
            \s*(:|)
            |
            \s*(?={)
        )
    )
    ((?:
        //after closing we expect either
        // - the parent's comma/close,
        // - or the end of the string
        \s*(?:[,}\]=]|$)
        |
        //after the rest operator we expect the close
        \s*\}
        |
        //after diving into a key we expect that object to open
        \s*[{[:]
        |
        //otherwise we saw only a key, we now expect a comma or close
        \s*[,}{]
    ).*)
$`

It outputs /^\s*((\})|(?:[,{]\s*)(?:(\.\.\.)|(\b\d+\b|\b[A-Za-z$_][\w$]*\b|("|')(?:(?!\5|\\).|(?:\\.))*\5)\s*(:|)|\s*(?={)))((?:\s*(?:[,}\]=]|$)|\s*\}|\s*[{[:]|\s*[,}{]).*)$/

And running it with a little demo?

let input = '{why, hello, there, "you   huge \\"", 17, {big,smelly}}';
for (
    let parsed;
    parsed = input.match(r); // assumes the regex built above was assigned to r
    input = parsed[parsed.length - 1]
) console.log(parsed[1]);

Successfully outputs

{why
, hello
, there
, "you   huge \""
, 17
,
{big
,smelly
}
}

Note the successful capturing of the quoted string.
I tested it on Chrome and Firefox, works a treat!

If curious, you can check out what I was doing, and its demonstration.
Though it only works on Chrome, because Firefox doesn't support backreferences or named groups. So note that the example given in this answer is actually a neutered version and might easily get tricked into accepting invalid strings.

Answer by Anvesh Reddy

The regex above is missing some backslashes, so it doesn't work properly. I edited the regex accordingly. Please consider this version, which works for 99.99% of email validation cases.

let EMAIL_REGEXP =
new RegExp (['^(([^<>()[\\]\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\.,;:\\s@\"]+)*)',
                    '|(".+"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.',
                    '[0-9]{1,3}\\])|(([a-zA-Z\\-0-9]+\\.)+',
                    '[a-zA-Z]{2,}))$'].join(''));

Answer by andreasonny83

To avoid the Array join, you can also use the following syntax:

// Backslashes are doubled because the pattern is built from string literals.
var pattern = new RegExp('^(([^<>()[\\]\\.,;:\\s@\"]+' +
  '(\\.[^<>()[\\]\\.,;:\\s@\"]+)*)|(\".+\"))@' +
  '((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|' +
  '(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$');

Answer by Mubeena

You can simply use string concatenation.

// Backslashes are doubled because the pattern is built from string literals.
var pattenString = "^(([^<>()[\\]\\.,;:\\s@\"]+(\\.[^<>()[\\]\\.,;:\\s@\"]+)*)|"+
"(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"+
"(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))$";
var patten = new RegExp(pattenString);

Answer by Scindix

I tried improving korun's answer by encapsulating everything and implementing support for splitting capturing groups and character sets - making this method much more versatile.

To use this snippet, call the variadic function combineRegex, whose arguments are the regular expression objects you want to combine. Its implementation can be found at the bottom.

Capturing groups can't be split directly that way though as it would leave some parts with just one parenthesis. Your browser would fail with an exception.

Instead, I simply pass the contents of the capture group inside an array. The parentheses are added automatically when combineRegex encounters an array.

Furthermore, quantifiers need to follow something. If for some reason the regular expression needs to be split in front of a quantifier, you need to add a pair of parentheses. These will be removed automatically. The point is that an empty capture group is pretty useless, and this way quantifiers have something to refer to. The same method can be used for things like non-capturing groups (/(?:abc)/ becomes [/()?:abc/]).

This is best explained using a simple example:

var regex = /abcd(efghi)+jkl/;

would become:

var regex = combineRegex(
    /ab/,
    /cd/,
    [
        /ef/,
        /ghi/
    ],
    /()+jkl/    // Note the added '()' in front of '+'
);

If you must split character sets, you can use objects ({"":[regex1, regex2, ...]}) instead of arrays ([regex1, regex2, ...]). The key can be anything, as long as the object contains only one key. Note that instead of () you have to use ] as the dummy beginning if the first character could be interpreted as a quantifier, i.e. /[+?]/ becomes {"":[/]+?/]}.

Here is the snippet and a more complete example:

function combineRegexStr(dummy, ...regex)
{
    return regex.map(r => {
        if(Array.isArray(r))
            return "("+combineRegexStr(dummy, ...r).replace(dummy, "")+")";
        else if(Object.getPrototypeOf(r) === Object.getPrototypeOf({}))
            return "["+combineRegexStr(/^\]/, ...(Object.entries(r)[0][1]))+"]";
        else 
            return r.source.replace(dummy, "");
    }).join("");
}
function combineRegex(...regex)
{
    return new RegExp(combineRegexStr(/^\(\)/, ...regex));
}

//Usage:
//Original:
console.log(/abcd(?:ef[+A-Z0-9]gh)+$/.source);
//Same as:
console.log(
  combineRegex(
    /ab/,
    /cd/,
    [
      /()?:ef/,
      {"": [/]+A-Z/, /0-9/]},
      /gh/
    ],
    /()+$/
  ).source
);

Answer by Bart Kiers

Personally, I'd go for a less complicated regex:

/\S+@\S+\.\S+/

Sure, it is less accurate than your current pattern, but what are you trying to accomplish? Are you trying to catch accidental errors your users might enter, or are you worried that your users might try to enter invalid addresses? If it's the former, I'd go for an easier pattern. If it's the latter, some verification by responding to an e-mail sent to that address might be a better option.
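As a rough sketch of the "catch accidental errors" case, the simple pattern can be wrapped in a small check. looksLikeEmail is a hypothetical helper name and the sample inputs are made up:

```javascript
// Deliberately loose: something, an @, something, a dot, something.
// The pattern is unanchored, so it also accepts text that merely contains such a sequence.
const looksLikeEmail = (input) => /\S+@\S+\.\S+/.test(input);

console.log(looksLikeEmail('user@example.com')); // true
console.log(looksLikeEmail('user@example'));     // false
console.log(looksLikeEmail('not an email'));     // false
```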

However, if you want to use your current pattern, it would be (IMO) easier to read (and maintain!) by building it from smaller sub-patterns, like this:

// Backslashes are doubled because these are string literals, not regex literals.
var box1 = "([^<>()[\\]\\\\.,;:\\s@\"]+(\\.[^<>()[\\]\\\\.,;:\\s@\"]+)*)";
var box2 = "(\".+\")";

var host1 = "(\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])";
var host2 = "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,})";

var regex = new RegExp("^(" + box1 + "|" + box2 + ")@(" + host1 + "|" + host2 + ")$");
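
As an alternative sketch of the same sub-pattern approach, String.raw template literals keep backslashes exactly as typed, so each piece can be written the way it would appear in a regex literal. The variable names and sample addresses below are illustrative and not part of the answer:

```javascript
// String.raw leaves backslashes untouched, so no doubling is needed.
const localPlain = String.raw`([^<>()[\]\\.,;:\s@"]+(\.[^<>()[\]\\.,;:\s@"]+)*)`;
const localQuoted = String.raw`(".+")`;

const hostIp = String.raw`(\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])`;
const hostName = String.raw`(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,})`;

const emailRegex = new RegExp(
  `^(${localPlain}|${localQuoted})@(${hostIp}|${hostName})$`
);

console.log(emailRegex.test('user@example.com'));  // true
console.log(emailRegex.test('user@@example.com')); // false
```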