Bash grep and regex - why am I escaping curly braces?
Original URL: http://stackoverflow.com/questions/26783219/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Grep and regex - why am I escaping curly braces?
Asked by Justin St. Giles Payne
I'm deeply puzzled by the way grep seems to parse a regex:
$ echo "@NS500287" | grep '^@NS500[0-9]{3}'
#nothing
$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287
That can't be right. Why am I escaping curly brackets that are part of a "match the previous, N times" component (and not, say, the square brackets as well)?
Shouldn't escaping be necessary only when I'm writing a regex that actually matches { and } as literal characters in the query string?
More of a cri de coeur than anything else, but I'm curious about the answer.
Answer by fedorqui 'SO stop harming'
This is because { and } are special characters, and they need to be handled differently to get this special behaviour. Otherwise, they will be treated as literal { and }.
You can either escape like you did:
$ echo "@NS500287" | grep '^@NS500[0-9]\{3\}'
@NS500287
or use grep -E:
$ echo "@NS500287" | grep -E '^@NS500[0-9]{3}'
@NS500287
Without any processing:
$ echo "he{llo" | grep "{"
he{llo
From man grep:
-E, --extended-regexp
Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)
...
REGULAR EXPRESSIONS
A regular expression is a pattern that describes a set of strings. Regular expressions are constructed analogously to arithmetic expressions, by using various operators to combine smaller expressions.
grep understands three different versions of regular expression syntax: “basic,” “extended” and “perl.” In GNU grep, there is no difference in available functionality between basic and extended syntaxes. In other implementations, basic regular expressions are less powerful. The following description applies to extended regular expressions; differences for basic regular expressions are summarized afterwards. Perl regular expressions give additional functionality, and are documented in pcresyntax(3) and pcrepattern(3), but may not be available on every system.
...
Basic vs Extended Regular Expressions
In basic regular expressions the meta-characters ?, +, {, |, (, and ) lose their special meaning; instead use the backslashed versions \?, \+, \{, \|, \(, and \).
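The swap described in that man page excerpt can be seen with grouping as well as with braces. The following is only a small sketch: the sample line and the variable names are made up for illustration, and it assumes a grep that supports the -o option (not part of POSIX, but available in GNU and BSD grep).

```shell
# A line containing a repeated "ab" and a parenthesised "(ab)".
line='abab (ab)'

# BRE (plain grep): backslashed parentheses group, backslashed braces count.
bre_group=$(printf '%s\n' "$line" | grep -o '\(ab\)\{2\}')
# BRE: bare parentheses are ordinary characters.
bre_literal=$(printf '%s\n' "$line" | grep -o '(ab)')

# ERE (grep -E): the same two ideas with the backslashes swapped.
ere_group=$(printf '%s\n' "$line" | grep -oE '(ab){2}')
ere_literal=$(printf '%s\n' "$line" | grep -oE '\(ab\)')

printf 'BRE group:   %s\n' "$bre_group"
printf 'BRE literal: %s\n' "$bre_literal"
printf 'ERE group:   %s\n' "$ere_group"
printf 'ERE literal: %s\n' "$ere_literal"
```

Both "group" patterns should print abab and both "literal" patterns should print (ab), which suggests the two syntaxes differ mainly in which form needs the backslash.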
Answer by Tom Fenech
The answer relates to the difference between Basic Regular Expressions (BREs) and Extended ones (EREs).
In BRE mode (i.e. when you call grep with no argument to specify otherwise), the { and } are interpreted as literal characters. Escaping them with \ means that they are to be interpreted as a number of instances of the previous pattern.
If you were to use grep -E instead (ERE mode), you would be able to use { and } without escaping to refer to the count. In ERE mode, escaping the braces causes them to be interpreted literally instead.
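To make the four combinations concrete, here is a rough sketch that runs each one over the same two lines. The second input line, which contains a literal {3}, and the variable names are invented for this illustration; the first line is the one from the question.

```shell
# Two sample lines: three trailing digits, and one digit followed by a literal "{3}".
input='@NS500287
@NS5009{3}'

# BRE, bare braces: the braces are literal, so only the "{3}" line matches.
bre_plain=$(printf '%s\n' "$input" | grep '^@NS500[0-9]{3}')

# BRE, backslashed braces: "exactly three digits", so only the first line matches.
bre_escaped=$(printf '%s\n' "$input" | grep '^@NS500[0-9]\{3\}')

# ERE, bare braces: now these are the repetition count.
ere_plain=$(printf '%s\n' "$input" | grep -E '^@NS500[0-9]{3}')

# ERE, backslashed braces: now these are the literal characters.
ere_escaped=$(printf '%s\n' "$input" | grep -E '^@NS500[0-9]\{3\}')

printf 'BRE  {3}   : %s\n' "$bre_plain"
printf 'BRE  \\{3\\} : %s\n' "$bre_escaped"
printf 'ERE  {3}   : %s\n' "$ere_plain"
printf 'ERE  \\{3\\} : %s\n' "$ere_escaped"
```

The first and last patterns should select @NS5009{3}, and the middle two should select @NS500287, which is the role swap between the two modes described above.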
Answer by Steven Penny
Instead do
echo '@NS500287' | egrep '^@NS500[0-9]{3}'
#                  ^
#                 /
# notice ---------