JavaScript: Why do regex constructors need to be double escaped?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/17863066/
Asked by Smurfette
In the regex below, \s denotes a whitespace character. I imagine the regex parser is going through the string, sees \, and knows that the next character is special.
But this is not the case as double escapes are required.
Why is this?
var res = new RegExp('(\s|^)' + foo).test(moo);
Is there a concrete example of how a single escape could be mis-interpreted as something else?
Accepted answer by Quentin
You are constructing the regular expression by passing a string to the RegExp constructor.
\ is an escape character in string literals.
The \ is consumed by the string literal parsing…
const foo = "foo";
const string = '(\s|^)' + foo;
console.log(string);
… so the data you pass to the regex compiler is a plain s and not \s.
You need to escape the \ so that it is passed through as data instead of acting as an escape character itself.
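To answer the request for a concrete example, here is a minimal sketch of how the single-escaped pattern silently turns into a different regex. The values of foo and moo are hypothetical samples (the question does not give them), and the variable names single and double are only for illustration.

```javascript
// Hypothetical sample values; the original question does not specify them.
const foo = "bar";
const moo = "foo bar";

// Single escape: the string literal drops the backslash, leaving "(s|^)bar".
const single = '(\s|^)' + foo;
// Double escape: the string keeps one backslash, giving "(\s|^)bar".
const double = '(\\s|^)' + foo;

console.log(single); // (s|^)bar
console.log(double); // (\s|^)bar

// The first pattern looks for a literal "s" (or the start of the string) before "bar".
console.log(new RegExp(single).test(moo)); // false
// The second pattern looks for whitespace (or the start of the string) before "bar".
console.log(new RegExp(double).test(moo)); // true
```

No error is thrown in the single-escape case, which is what makes the mistake easy to miss.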
Answer by Joe Enos
Inside the code where you're creating a string, the backslash is a JavaScript escape character first, which means escape sequences like \t, \n, \", etc. will be translated into their JavaScript counterparts (tab, newline, quote, etc.), and those become part of the string. A double backslash represents a single backslash in the actual string itself, so if you want a backslash in the string, you escape that first.
So when you generate a string by saying var someString = '(\\s|^)', what you're really doing is creating an actual string with the value (\s|^).
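A small sketch to check this in a console. The variable names tab and backslash are just for illustration. Comparing against the .source of a regex literal is one way to confirm the string really holds the intended pattern.

```javascript
// Each escape sequence in a string literal collapses to a single character.
const tab = '\t';             // one tab character
const backslash = '\\';       // one backslash character
const someString = '(\\s|^)'; // the six characters ( \ s | ^ )

console.log(tab.length);        // 1
console.log(backslash.length);  // 1
console.log(someString);        // (\s|^)
console.log(someString.length); // 6

// The string now equals the pattern text of the equivalent regex literal.
console.log(someString === /(\s|^)/.source); // true
```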
Answer by Cristian Lupascu
The regex needs a string representation of \s, which in JavaScript can be produced using the literal "\\s".
Here's a live example to illustrate why "\s" is not enough:
alert("One backslash: \s\nDouble backslashes: \\s");
Note how an extra \ before \s changes the output.
Answer by schlicht
\ is used in strings to escape special characters. If you want a backslash in your string (e.g. for the \ in \s), you have to escape it with another backslash. So \ becomes \\ .
EDIT: Even had to do it here, because \\ in my answer turned to \.
Answer by CertainPerformance
As has been said, inside a string literal a backslash indicates an escape sequence rather than a literal backslash character. The RegExp constructor, however, often needs literal backslash characters in the string passed to it, so in most cases the code should use \\ to represent each literal backslash.
A problem is that double-escaping metacharacters is tedious. There is one way to pass a string to new RegExp without having to double escape them: use the String.raw template tag, an ES6 feature, which lets you write a string that the interpreter takes verbatim, without processing any escape sequences. For example:
console.log('\\'.length); // length 1: an escaped backslash
console.log(`\\`.length); // length 1: an escaped backslash
console.log(String.raw`\\`.length); // length 2: no escaping in String.raw!
So, if you wish to keep your code readable and you have many backslashes, you may use String.raw to type only one backslash wherever the pattern requires a backslash:
const sentence = 'foo bar baz';
const regex = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);
console.log(regex.test(sentence));
But there's a better option. Generally, there's not much good reason to use new RegExp unless you need to dynamically create a regular expression from existing variables. Otherwise, you should use regex literals instead, which do not require double-escaping of metacharacters and do not require writing out String.raw to keep the pattern readable:
const sentence = 'foo bar baz';
const regex = /\bfoo\sbar\sbaz\b/;
console.log(regex.test(sentence));
Best to only use new RegExp when the pattern must be created on-the-fly, like in the following snippet:
const sentence = 'foo bar baz';
const wordToFind = 'foo'; // from user input
const regex = new RegExp(String.raw`\b${wordToFind}\b`);
console.log(regex.test(sentence));
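One caveat when the interpolated value really does come from user input: any regex metacharacters in it (such as . or *) are interpreted as pattern syntax. A rough sketch of one common workaround follows. Note that escapeRegExp is a hypothetical helper written here for illustration, not part of the original answer, and the sample input 'a.b' is made up.

```javascript
// Hypothetical helper (not from the original answer): backslash-escape
// every character that has a special meaning inside a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b'; // assumed sample input containing a metacharacter

// Without escaping, the dot matches any character.
const loose = new RegExp(String.raw`\b${userInput}\b`);
// With escaping, the dot only matches a literal dot.
const strict = new RegExp(String.raw`\b${escapeRegExp(userInput)}\b`);

console.log(loose.test('axb'));  // true
console.log(strict.test('axb')); // false
console.log(strict.test('a.b')); // true
```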