JavaScript: In regex, match either the end of the string or a specific character
Note: this page reproduces a popular StackOverFlow question under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original: http://stackoverflow.com/questions/12083308/
Asked by Gary
I have a string. The end is different, such as index.php?test=1&list=UL or index.php?list=UL&more=1. The one thing I'm looking for is &list=.
How can I match it, whether it's in the middle of the string or at the end? So far I've got [&|\?]list=.*?([&|$]), but the ([&|$]) part doesn't actually work; I'm trying to use that to match either & or the end of the string, but the end of the string part doesn't work, so this pattern matches the second example but not the first.
Answered by João Silva
Answered by Wiktor Stribiżew
In short
Any zero-width assertion inside [...] loses its meaning as a zero-width assertion. [\b] does not match a word boundary (it matches a backspace, or, in POSIX, \ or b), [$] matches a literal $ char, and [^] is either an error or, as in the ECMAScript regex flavor, any char. The same goes for the \z, \Z, and \A anchors.
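A minimal sketch of that behavior in JavaScript, using the two sample strings from the question. The variable names are only illustrative and are not part of the original answer.

```javascript
// Inside a character class, "$" is a literal dollar sign, not an anchor.
const literalDollar = /a[$]/;
const matchesDollarChar = literalDollar.test("a$"); // "$" is present in the input
const matchesAtEnd = literalDollar.test("a");       // no "$" char to match

// [\b] inside a class is a backspace (U+0008), not a word boundary.
const backspaceClass = /[\b]/;
const matchesBackspace = backspaceClass.test("\b");
const matchesBoundary = backspaceClass.test("a b");

// In ECMAScript, [^] matches any single char, including a newline.
const anyChar = /^[^]$/;
const matchesNewline = anyChar.test("\n");

// The pattern from the question: ([&|$]) only matches "&", "|" or a literal "$".
const original = /[&|\?]list=.*?([&|$])/;
const firstExample = original.test("index.php?test=1&list=UL");  // list= is at the end
const secondExample = original.test("index.php?list=UL&more=1"); // list= is followed by "&"

console.log({ matchesDollarChar, matchesAtEnd, matchesBackspace, matchesBoundary,
  matchesNewline, firstExample, secondExample });
```

This reproduces the symptom described in the question: the second example matches, the first does not.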
You may solve the problem using any of the below patterns:
[&?]list=([^&]*)
[&?]list=(.*?)(?=&|$)
[&?]list=(.*?)(?![^&])
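As a quick check, the three patterns can be run against both sample strings from the question. This is only a sketch; the `urls`, `patterns`, and `results` names are illustrative.

```javascript
// Both sample strings from the question.
const urls = ["index.php?test=1&list=UL", "index.php?list=UL&more=1"];

// The three suggested patterns; capture group 1 holds the value of "list".
const patterns = [
  /[&?]list=([^&]*)/,        // negated character class
  /[&?]list=(.*?)(?=&|$)/,   // positive lookahead with alternation
  /[&?]list=(.*?)(?![^&])/,  // negative lookahead with negated class
];

// For each pattern, collect the captured value from every URL (null if no match).
const results = patterns.map((re) =>
  urls.map((url) => {
    const m = url.match(re);
    return m ? m[1] : null;
  })
);

console.log(results);
```

Each pattern captures UL from both strings, whether list= sits in the middle or at the end.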
Matching between a char sequence and a single char or end of string (current scenario)
The .*?([YOUR_SINGLE_CHAR_DELIMITER(S)]|$) pattern (suggested by João Silva) is rather inefficient since the regex engine checks for the patterns that appear to the right of the lazy dot pattern first, and only if they do not match does it "expand" the lazy dot pattern.
In these cases it is recommended to use a negated character class (or bracket expression, in POSIX talk):
[&?]list=([^&]*)
Details:
- [&?] - a positive character class matching either & or ? (note the relationships between chars/char ranges in a character class are OR relationships)
- list= - a substring, char sequence
- ([^&]*) - Capturing group #1: zero or more (*) chars other than & ([^&]), as many as possible
Checking for the trailing single char delimiter presence without returning it or end of string
Most regex flavors (including JavaScript beginning with ECMAScript 2018) support lookarounds, constructs that only return true or false depending on whether their patterns match, without consuming any text. They are crucial when consecutive matches that may start and end with the same char are expected (see the original pattern, it may match a string starting and ending with &). Although this is not expected in a query string, it is a common scenario.
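To illustrate why this matters for consecutive matches, here is a small sketch with a made-up input, `&a&b&c`, where every item is both preceded and followed by the same delimiter. The input and the variable names are assumptions for illustration only.

```javascript
// Hypothetical input: items separated by "&", each one also introduced by "&".
const input = "&a&b&c";

// Consuming the trailing delimiter: the "&" that ends one match
// is no longer available to start the next one, so "b" is skipped.
const consuming = [...input.matchAll(/&(.*?)(?:&|$)/g)].map((m) => m[1]);

// A lookahead only checks for the delimiter and leaves it in place,
// so the next match can start with it.
const lookahead = [...input.matchAll(/&(.*?)(?=&|$)/g)].map((m) => m[1]);

console.log(consuming); // [ 'a', 'c' ]
console.log(lookahead); // [ 'a', 'b', 'c' ]
```

The consuming version silently drops every other item, which is the kind of bug the lookahead avoids.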
In that case, you can use two approaches:
- A positive lookahead with an alternation containing a positive character class:
(?=[SINGLE_CHAR_DELIMITER(S)]|$)
- A negative lookahead with just a negated character class:
(?![^SINGLE_CHAR_DELIMITER(S)])
The negative lookahead solution is a bit more efficient because it does not contain an alternation group that adds complexity to the matching procedure. The OP solution would look like
[&?]list=(.*?)(?=&|$)
or
[&?]list=(.*?)(?![^&])
Certainly, in case the trailing delimiters are multichar sequences, only a positive lookahead solution will work since [^yes] does not negate a sequence of chars, but the chars inside the class (i.e. [^yes] matches any char but y, e and s).
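A short sketch of the difference, assuming a made-up input where the value after `v=` should run up to the multichar delimiter `yes`. The input string and variable names are hypothetical.

```javascript
// Hypothetical input: the value should end at the sequence "yes" (or end of string).
const text = "v=days-yes-more";

// Positive lookahead: stops right before the whole sequence "yes".
const bySequence = text.match(/v=(.*?)(?=yes|$)/)[1];

// Negated class: stops at the first "y", "e" or "s", wherever it occurs.
const byClass = text.match(/v=([^yes]*)/)[1];

// Negative lookahead with a negated class has the same per-char limitation.
const byNegativeLookahead = text.match(/v=(.*?)(?![^yes])/)[1];

console.log(bySequence);          // "days-"
console.log(byClass);             // "da"
console.log(byNegativeLookahead); // "da"
```

Only the positive lookahead treats `yes` as a unit; the class-based variants cut the value short at the `y` in `days`.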