bash: How can I match square brackets in a regex with grep?

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/30044199/

Time: 2020-09-18 12:55:59  Source: igfitidea

How can I match square bracket in regex with grep?

Tags: regex, bash, grep

Asked by Jahid

I am trying to match both [ and ] with grep, but have only succeeded in matching [. No matter what I try, I can't seem to match ].

Here's a code sample:

echo "fdsl[]" | grep -o "[ a-z]\+" #this prints fdsl
echo "fdsl[]" | grep -o "[ \[a-z]\+" #this prints fdsl[
echo "fdsl[]" | grep -o "[ \]a-z]\+" #this prints nothing
echo "fdsl[]" | grep -o "[ \[\]a-z]\+" #this prints nothing

Edit: My original regex, on which I need to do this, is this one:

echo "fdsl[]" | grep -o "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\!@#&?-]\+" 
#this prints nothing

N.B.: I have tried all the answers from this post, but none of them worked in this particular case. And I need to use those brackets inside [].

Accepted answer by nhahtdh

According to the BRE/ERE Bracketed Expression section of the POSIX regex specification:

  1. [...] The right-bracket (']') shall lose its special meaning and represent itself in a bracket expression if it occurs first in the list (after an initial circumflex ('^'), if any). Otherwise, it shall terminate the bracket expression, unless it appears in a collating symbol (such as "[.].]") or is the ending right-bracket for a collating symbol, equivalence class, or character class. The special characters '.', '*', '[', and '\' (period, asterisk, left-bracket, and backslash, respectively) shall lose their special meaning within a bracket expression.

and

  1. [...] If a bracket expression specifies both '-' and ']', the ']' shall be placed first (after the '^', if any) and the '-' last within the bracket expression.
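
To illustrate the second rule, here is a minimal sketch with a made-up input string. It assumes a grep that supports -o (GNU and BSD grep do; POSIX does not require it), and the variable name is only illustrative.

```shell
# ']' comes first and '-' comes last, so both are literal members of the set;
# a-z in the middle is still an ordinary range.
result=$(printf 'a-b]c\n' | grep -Eo '[]a-z-]+')
printf '%s\n' "$result"   # prints a-b]c
```

Moving the ] or the - to any other position changes how the bracket expression is parsed.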

Therefore, your regex should be:

echo "fdsl[]" | grep -Eo "[][ a-z]+"

Note the -E flag, which tells grep to use ERE, where the + quantifier is supported. The + quantifier is not supported in BRE (the default mode).

The solution in Mike Holt's answer, "[][a-z ]\+" with an escaped +, works because it is run on GNU grep, which extends the grammar so that \+ means repeat one or more times. According to the POSIX standard this is actually undefined behavior (which means that an implementation can give it meaningful behavior and document it, throw a syntax error, or do anything else).

If you are fine with the assumption that your code will only be run in a GNU environment, then it is totally fine to use Mike Holt's answer. Taking sed as an example, you are stuck with BRE when you use POSIX sed (there is no flag to switch over to ERE), and it is cumbersome to write even a simple regular expression in POSIX BRE, where the only defined quantifiers are * and the interval form \{m,n\}.
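
For code that has to stay within POSIX BRE, "one or more" can still be spelled without \+. The sketch below is not from the original answers: it uses sed with the BRE interval \{1,\}, and alternatively repeats the bracket expression followed by *. The variable names and the <&> markers are only illustrative.

```shell
# \{1,\} is the BRE interval for "one or more"; no GNU-only \+ is needed.
with_interval=$(printf 'fdsl[]\n' | sed 's/[][ a-z]\{1,\}/<&>/')
# Equivalent spelling: one required character, then zero or more of the same set.
with_star=$(printf 'fdsl[]\n' | sed 's/[][ a-z][][ a-z]*/<&>/')
printf '%s\n' "$with_interval" "$with_star"   # prints <fdsl[]> twice
```

The & in the replacement stands for the matched text, so the angle brackets show exactly what the bracket expression consumed.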

Original regex

Note that grep consumes the input file line by line and checks whether each line matches the regex. Therefore, even if you use the -P flag with your original regex, \n is always redundant, as the regex can't match across lines.
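
A small sketch of the line-by-line behavior, using a made-up two-line input. It deliberately uses [[:space:]], a class that includes the newline character, instead of \n with -P, so that it does not depend on PCRE support being compiled in.

```shell
# Input is two lines, "ab" and "cd". The pattern wants b, a whitespace
# character, then c. grep tests each line on its own, so the newline between
# the lines is never part of the text being matched.
count=$(printf 'ab\ncd\n' | grep -Ec 'b[[:space:]]c' || true)
printf '%s\n' "$count"   # prints 0
```

The `|| true` is only there because grep exits with status 1 when no line matches.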

While it is possible to match a horizontal tab without the -P flag, I think it is more natural to use the -P flag for this task.
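
For reference, one way to match a tab without -P is to put a literal tab character into the bracket expression. This is a sketch under that assumption, not the answer's own method; the variable names are illustrative, and -o is again assumed to be available.

```shell
# printf interprets \t portably, so this stores one literal tab character.
tab=$(printf '\t')
# Double quotes let the shell expand ${tab} inside the bracket expression.
result=$(printf 'fds\tl[]\n' | grep -Eo "[][ ${tab}a-z]+")
printf '%s\n' "$result"   # prints fds, a tab, then l[]
```

Without -P, writing \t inside the brackets does not mean a tab, because a backslash has no special meaning in a POSIX bracket expression.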

Given this input:

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\!@#$%^&*()_+-=~\`89"
fds     l[]kSAJD<>?,./:";'{}|[]\!@#$%^&*()_+-=~`89

The original regex in the question works with only a small modification (unescaping the + at the end):

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\t\na-zA-Z\/:\.0-9_~\"'+,;*\=()$\!@#&?-]+"
fds     l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89

We can also remove \n (since it is redundant, as explained above) and a few other unnecessary escapes:

$ echo -e "fds\tl[]kSAJD<>?,./:\";'{}|[]\!@#$%^&*()_+-=~\`89" | grep -Po "[ \[\]\ta-zA-Z/:.0-9_~\"'+,;*=()$\!@#&?-]+"
fds     l[]kSAJD
?,./:";'
[]
!@#$
&*()_+-=~
89

Answer by skotka

One issue is that [ is a special character in the expression and cannot be escaped with \ (at least not in my flavors of grep). The solution is to write it as [[].
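
A minimal sketch of the [[] form with a made-up input, assuming a grep that supports -o:

```shell
# Inside a bracket expression a lone '[' is an ordinary member of the set,
# so [[] matches exactly one left bracket.
result=$(printf 'a[b\n' | grep -o '[[]')
printf '%s\n' "$result"   # prints [
```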

Answer by Mike Holt

According to regular-expressions.info:

In most regex flavors, the only special characters or metacharacters inside a character class are the closing bracket (]), the backslash (\), the caret (^), and the hyphen (-). The usual metacharacters are normal characters inside a character class, and do not need to be escaped by a backslash.

... and ...

The closing bracket (]), the caret (^) and the hyphen (-) can be included by escaping them with a backslash, or by placing them in a position where they do not take on their special meaning.

So, assuming that the particular flavor of regular expression syntax supported by grep conforms to this, I would have expected "[ a-z[\]]\+" to work.

However, my version of grep (GNU grep 2.14) only matches the "[]" at the end of "fdsl[]" with this regex.
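
One reading that is consistent with the POSIX text quoted in the accepted answer: inside a bracket expression the backslash is an ordinary character, so in "[ a-z[\]]\+" the first ] already closes the set (space, a-z, [ and \), and the second ] is a literal that must follow it. The sketch below checks this with -E and an unescaped + so that it does not rely on the GNU-only \+; the variable name is illustrative.

```shell
# Parsed as: one character from the set { space, a-z, '[', '\' },
# followed by one or more literal ']' characters.
result=$(printf 'fdsl[]\n' | grep -Eo '[ a-z[\]]+')
printf '%s\n' "$result"   # prints []
```

Only the [ is immediately followed by a ], which is why just the trailing "[]" is reported.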

I then tried the other technique mentioned in that quote (putting the ] in a position within the character class where it cannot take on its normal meaning), and it seems to have worked:

$ echo "fdsl[]" | grep -o "[][a-z ]\+"
fdsl[]