Linux 你如何在 sed 中指定非捕获组?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/4823864/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
how do you specify non-capturing groups in sed?
提问by barlop
is it possible to specify non-capturing groups in sed?
是否可以在 sed 中指定非捕获组?
if so, how?
如果是这样,如何?
采纳答案by barlop
The answer, is that as of writing, you can't - sed does not support it. Sed supports BRE, and ERE, but not PCRE.
答案是,在写作时,你不能 - sed 不支持它。Sed 支持 BRE 和 ERE,但不支持 PCRE。
(Note- One answer points out that BRE is also known as POSIX sed, and ERE is is a GNU extension via sed -r. Point remains that PCRE is not supported by sed. )
(注 - 一个答案指出 BRE 也称为 POSIX sed,而 ERE 是通过 sed -r 的 GNU 扩展。点仍然是 sed 不支持 PCRE。)
Perl will work, for windows or linux
Perl 可以工作,适用于 windows 或 linux
examples here
这里的例子
https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal
https://superuser.com/questions/416419/perl-for-matching-with-regular-expressions-in-terminal
e.g. this from cygwin in windows
例如,这来自 windows 中的 cygwin
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)//s'
a
$ echo -e 'abcd' | perl -0777 -pe 's/(a)(?:b)(c)(d)//s'
c
There is a program albeit for Windows, which can do search and replace on the command line, and does support PCRE. It's called rxrepl. It's not sed of course, but it does search and replace with PCRE support.
有一个程序虽然是Windows的,可以在命令行上进行搜索和替换,并且确实支持PCRE。它被称为 rxrepl。它当然不是 sed,但它确实使用 PCRE 支持进行搜索和替换。
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r ""
a
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(c)" -r ""
c
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(b)(?:c)" -r ""
Invalid match group requested.
C:\blah\rxrepl>echo abc | rxrepl -s "(a)(?:b)(c)" -r ""
c
C:\blah\rxrepl>
The author(not me), mentioned his program in an answer over here https://superuser.com/questions/339118/regex-replace-from-command-line
作者(不是我)在这里的回答中提到了他的程序https://superuser.com/questions/339118/regex-replace-from-command-line
It has a really good syntax.
它有一个非常好的语法。
The standard thing to use would be perl, or almost any other programming language that people use.
要使用的标准语言是 perl,或者人们使用的几乎任何其他编程语言。
回答by SiegeX
I'll assume you are speaking of the backrefence syntax, which are parentheses ( )
not brackets [ ]
我假设你说的是 backrefence 语法,它是括号( )
而不是方括号[ ]
By default, sed
will interpret ( )
literally and not attempt to make a backrefence from them. You will need to escape them to make them special as in \( \)
It is only when you use the GNU sed -r
option will the escaping be reversed. With sed -r
, non escaped ( )
will produce backrefences and escaped \( \)
will be treated as literal. Examples to follow:
默认情况下,sed
将按( )
字面意思进行解释,而不是尝试从它们中进行反驳。您需要对它们进行转义以使它们变得特别,因为\( \)
只有当您使用 GNUsed -r
选项时,转义才会被逆转。使用sed -r
,非转义( )
将产生反向引用,转义\( \)
将被视为文字。要遵循的示例:
POSIX sed
POSIX sed
$ echo "foo(###)bar" | sed 's/foo(.*)bar/@@@@/'
@@@@
$ echo "foo(###)bar" | sed 's/foo(.*)bar//'
sed: -e expression #1, char 16: invalid reference on `s' command's RHS
-bash: echo: write error: Broken pipe
$ echo "foo(###)bar" | sed 's/foo\(.*\)bar//'
(###)
GNU sed -r
GNU sed -r
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar/@@@@/'
@@@@
$ echo "foo(###)bar" | sed -r 's/foo(.*)bar//'
(###)
$ echo "foo(###)bar" | sed -r 's/foo\(.*\)bar//'
sed: -e expression #1, char 18: invalid reference on `s' command's RHS
-bash: echo: write error: Broken pipe
Update
更新
From the comments:
来自评论:
Group-only, non-capturing parentheses ( )
so you can use something like intervals {n,m}
without creating a backreference \1
don't exist. First, intervals are not apart of POSIX sed, you must use the GNU -r
extension to enable them. As soon as you enable -r
any grouping parentheses will also be capturing for backreference use. Examples:
仅限组的非捕获括号,( )
因此您可以使用诸如间隔之类的东西{n,m}
而不会创建反向引用\1
不存在。首先,间隔不是 POSIX sed 的一部分,您必须使用 GNU-r
扩展来启用它们。一旦您启用-r
任何分组括号,也将捕获以供反向引用使用。例子:
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###789
$ echo "123.456.789" | sed -r 's/([0-9]{3}\.){2}/###/'
###456.789
回答by Paused until further notice.
Parentheses can be used for grouping alternatives. For example:
括号可用于对备选方案进行分组。例如:
sed 's/a\(bc\|de\)f/X/'
says to replace "abcf" or"adef" with "X", but the parentheses alsocapture. There is not a facility in sed
to do such grouping without also capturing. If you have a complex regex that does both alternative grouping andcapturing, you will simply have to be careful in selecting the correct capture group in your replacement.
说用“X”替换“abcf”或“adef”,但括号也捕获。如果不进行sed
捕获,就无法进行这种分组。如果您有一个复杂的正则表达式,它既可以进行替代分组又可以进行捕获,则您只需小心选择替换中的正确捕获组即可。
Perhaps you could say more about what it is you're trying to accomplish (what your need for non-capturing groups is) and why you want to avoid capture groups.
也许您可以多说一下您要完成什么(您对非捕获组的需求是什么)以及为什么要避免捕获组。
Edit:
编辑:
There is a type of non-capturing brackets ((?:pattern)
) that are part of Perl-Compatible Regular Expressions(PCRE). They are not supported in sed
(but are when using grep -P
).
有一种非捕获括号 ( (?:pattern)
) 是Perl 兼容正则表达式(PCRE) 的一部分。它们在sed
(但在使用时grep -P
)不受支持。
回答by Hector Llorens
As said, it is not possible to have non-capturing groups in sed. It could be obvious but non-capturing groups are not a necessity. One can just use the desired capturing ones and ignore the non-desired ones as if they were non-capturing. For reference, nested capturing groups are numbered by the position-order of "(".
如前所述,在 sed 中不可能有非捕获组。这可能很明显,但非捕获组不是必需的。人们可以只使用所需的捕获,而忽略不想要的,就好像它们没有捕获一样。作为参考,嵌套捕获组按“(”的位置顺序编号。
E.g.,
例如,
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/x/g"
applex and bananasx and monkeys (note: "s" in bananas, first bigger group)
applex 和香蕉sx 和猴子(注意:香蕉中的“s”,第一个更大的组)
vs
对比
echo "apple and bananas and monkeys" | sed -r "s/((apple|banana)s?)/x/g"
applex and bananax and monkeys (note: no "s" in bananas, second smaller group)
applex和bananax和猴子(注意:香蕉中没有“s”,第二个较小的组)