JavaScript: Why do regex constructors need to be double escaped?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/17863066/

Time: 2020-10-27 09:53:43  Source: igfitidea

Why do regex constructors need to be double escaped?

javascript regex

Asked by Smurfette

In the regex below, \s denotes a whitespace character. I imagine the regex parser is going through the string, sees \ and knows that the next character is special.

But this is not the case as double escapes are required.

Why is this?

var res = new RegExp('(\\s|^)' + foo).test(moo);

Is there a concrete example of how a single escape could be mis-interpreted as something else?

Accepted answer by Quentin

You are constructing the regular expression by passing a string to the RegExp constructor.

\ is an escape character in string literals.

The \ is consumed by the string literal parsing…

const foo = "foo";
const string = '(\s|^)' + foo;
console.log(string);

… so the data you pass to the RegExp compiler is a plain s and not \s.

You need to escape the \ to express the \ as data instead of being an escape character itself.
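
As a concrete case of the misinterpretation asked about, here is a minimal sketch comparing the two spellings. The variable names and test strings are made up for illustration and are not part of the original answer.

```javascript
// Single backslash: the string literal parser drops it, leaving a plain "s".
const singleEscaped = '(\s|^)foo';
// Double backslash: the string keeps one real backslash for the regex compiler.
const doubleEscaped = '(\\s|^)foo';

console.log(singleEscaped); // (s|^)foo
console.log(doubleEscaped); // (\s|^)foo

const wrongRegex = new RegExp(singleEscaped); // behaves like /(s|^)foo/
const rightRegex = new RegExp(doubleEscaped); // behaves like /(\s|^)foo/

console.log(wrongRegex.test('a foo')); // false: no whitespace matching at all
console.log(rightRegex.test('a foo')); // true
console.log(wrongRegex.test('sfoo'));  // true: it matches a literal "s" instead
```

The single-escaped version does not throw; it silently builds a different pattern, which is why the bug is easy to miss.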

Answer by Joe Enos

Inside the code where you're creating a string, the backslash is first of all a JavaScript escape character, which means escape sequences like \t, \n, \", etc. are translated into their JavaScript counterparts (tab, newline, quote, etc.), and the result becomes part of the string. A double backslash represents a single backslash in the actual string itself, so if you want a backslash in the string, you escape it first.

So when you generate a string by writing var someString = '(\\s|^)', what you're really doing is creating an actual string with the value (\s|^).
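
Some escape sequences mean one thing in a string literal and another in a regex, and that is where a single backslash changes the pattern most visibly. The sketch below uses \b, which is a backspace character (U+0008) in a string but a word boundary in a regex. The names and sample strings are illustrative assumptions.

```javascript
// "\t" is one character (a tab); "\\t" is two (a backslash and a "t").
console.log('\t'.length);  // 1
console.log('\\t'.length); // 2

// In a string literal, \b becomes a backspace character before RegExp sees it.
const backspacePattern = new RegExp('\bfoo\b');
// With the backslash escaped, RegExp receives \b and treats it as a word boundary.
const boundaryPattern = new RegExp('\\bfoo\\b');

console.log(backspacePattern.test('foo bar'));   // false
console.log(boundaryPattern.test('foo bar'));    // true
console.log(backspacePattern.test('\bfoo\b'));   // true: matches literal backspaces
```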

Answer by Cristian Lupascu

The Regex needs a string representation of \s, which in JavaScript can be produced using the literal "\\s".

Here's a live example to illustrate why "\s" is not enough:

alert("One backslash:          \s\nDouble backslashes: \\s");

Note how an extra \ before \s changes the output.

Answer by schlicht

\ is used in Strings to escape special characters. If you want a backslash in your string (e.g. for the \ in \s) you have to escape it via a backslash. So \ becomes \\ .

EDIT: Even had to do it here, because \\ in my answer turned to \.

Answer by CertainPerformance

As has been said, inside a string literal, a backslash indicates an escape sequence, rather than a literal backslash character, but the RegExp constructor often needs literal backslash characters in the string passed to it, so the code should have \\s to represent a literal backslash, in most cases.

A problem is that double-escaping metacharacters is tedious. There is one way to pass a string to new RegExp without having to double escape them: use the String.raw template tag, an ES6 feature, which lets you write a string that the interpreter takes verbatim, without processing any escape sequences. For example:

console.log('\\'.length);           // length 1: an escaped backslash
console.log(`\\`.length);           // length 1: an escaped backslash
console.log(String.raw`\\`.length); // length 2: no escaping in String.raw!

So, if you wish to keep your code readable, and you have many backslashes, you may use String.raw to type only one backslash wherever the pattern requires a backslash:

const sentence = 'foo bar baz';
const regex = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);
console.log(regex.test(sentence));

But there's a better option. Generally, there's not much good reason to use new RegExp unless you need to dynamically create a regular expression from existing variables. Otherwise, you should use regex literals instead, which do not require double-escaping of metacharacters, and do not require writing out String.raw to keep the pattern readable:

const sentence = 'foo bar baz';
const regex = /\bfoo\sbar\sbaz\b/;
console.log(regex.test(sentence));
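
To check that the three spellings describe the same pattern, one option is to compare the source property of each resulting RegExp. This is a small sketch with illustrative variable names, not part of the original answer.

```javascript
const fromLiteral = /\bfoo\sbar\b/;                      // regex literal
const fromString = new RegExp('\\bfoo\\sbar\\b');        // double-escaped string
const fromRaw = new RegExp(String.raw`\bfoo\sbar\b`);    // String.raw template

console.log(fromLiteral.source);                      // \bfoo\sbar\b
console.log(fromString.source === fromLiteral.source); // true
console.log(fromRaw.source === fromLiteral.source);    // true
```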

It is best to use new RegExp only when the pattern must be created on the fly, like in the following snippet:

const sentence = 'foo bar baz';
const wordToFind = 'foo'; // from user input

const regex = new RegExp(String.raw`\b${wordToFind}\b`);
console.log(regex.test(sentence));
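
One caveat when the interpolated value really comes from user input: String.raw only leaves the template's own backslashes alone, so any regex metacharacters inside the interpolated value are still interpreted by RegExp. A possible approach, sketched below, is to escape the value first. The escapeRegExp helper is hypothetical and written here for illustration; it is not a built-in used by the original answer.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userWord = 'a.b'; // assumed user input containing a metacharacter

const unescapedRegex = new RegExp(String.raw`\b${userWord}\b`);
const escapedRegex = new RegExp(String.raw`\b${escapeRegExp(userWord)}\b`);

console.log(unescapedRegex.test('axb')); // true: "." matched any character
console.log(escapedRegex.test('axb'));   // false
console.log(escapedRegex.test('a.b'));   // true
```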