Non-greedy matching using ? with grep
Original question: http://stackoverflow.com/questions/19125173/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Sven Richter
I'm writing a bash script which analyses an HTML file, and I want to get the content of each single <tr>...</tr>. So my command looks like:
$ tr -d '\012' < price.html | grep -oE '<tr>.*?</tr>'
But grep seems to give me the same result as the greedy version:
$ tr -d '\012' < price.html | grep -oE '<tr>.*</tr>'
How can I make .* non-greedy?
Answered by Chris Seymour
If you have GNU grep you can use -P to make the match non-greedy:
$ tr -d '\012' < price.html | grep -Po '<tr>.*?</tr>'
The -P option enables Perl Compatible Regular Expressions (PCRE), which are needed for non-greedy matching with ?, as Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) do not support it.
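To make the difference concrete, here is a minimal sketch that runs both forms on a made-up one-line sample (the `sample` string is an assumption standing in for the output of the `tr` step, and the `-P` line assumes GNU grep built with PCRE support):

```shell
# Hypothetical sample: two rows on a single line, as after the tr step.
sample='<tr>a</tr><tr>b</tr>'

# ERE has no lazy quantifier, so .* runs on to the last </tr>.
greedy=$(printf '%s\n' "$sample" | grep -oE '<tr>.*</tr>')

# PCRE: .*? stops at the first </tr>, so each row is reported separately.
lazy=$(printf '%s\n' "$sample" | grep -oP '<tr>.*?</tr>')

printf 'greedy:\n%s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy form prints both rows as one match, while the lazy form prints one row per line.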
If you are using -P you could also use lookarounds to avoid printing the tags in the match, like so:
$ tr -d '\012' < price.html | grep -Po '(?<=<tr>).*?(?=</tr>)'
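As a small illustration of the lookaround form, the sketch below runs it on a made-up sample (the `sample` string is hypothetical, and GNU grep with PCRE support is assumed):

```shell
# Hypothetical sample: two rows, each holding one cell.
sample='<tr><td>1</td></tr><tr><td>2</td></tr>'

# The lookbehind and lookahead must match, but are not part of the output.
inner=$(printf '%s\n' "$sample" | grep -Po '(?<=<tr>).*?(?=</tr>)')

printf '%s\n' "$inner"
```

Only the text between the row tags is printed, one row per line; the `<tr>` and `</tr>` tags themselves are left out.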
If you don't have GNU grep and the HTML is well formed you could just do:
$ tr -d '\012' < price.html | grep -o '<tr>[^<]*</tr>'
Note: The above example won't work with nested tags within <tr>.
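The limitation is easy to see on two made-up inputs (both strings below are hypothetical): rows that hold only text match, while a row containing a `<td>` does not, because `[^<]*` cannot cross the `<` of the nested tag.

```shell
# Hypothetical samples: text-only rows versus a row with a nested cell.
flat='<tr>a</tr><tr>b</tr>'
nested='<tr><td>a</td></tr>'

# Text-only rows: each row is matched on its own.
flat_rows=$(printf '%s\n' "$flat" | grep -o '<tr>[^<]*</tr>')

# Nested tag: grep finds nothing and exits non-zero, hence the "|| true".
nested_rows=$(printf '%s\n' "$nested" | grep -o '<tr>[^<]*</tr>' || true)

printf 'flat:\n%s\n' "$flat_rows"
printf 'nested: [%s]\n' "$nested_rows"
```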
Answered by tripleee
Non-greedy matching is not part of the Extended Regular Expression syntax supported by grep -E. Use grep -P instead if you have that, or switch to Perl / Python / Ruby / what have you. (Oh, and pcregrep.)
Of course, if you really mean
<tr>[^<>]*</tr>
you should say that instead; then plain old grep will work fine.
You could (tediously) extend the regex to accept nested tags which are not <tr>, but of course it's better to use a proper HTML parser than to spend a lot of time rediscovering why regular expressions are not the right tool for this.
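For a sense of how tedious that extension gets, here is one possible ERE, written as a sketch rather than a recommendation. It assumes `<` only ever appears as the start of a tag, and the `sample` string is made up. Each alternative consumes a piece of text that cannot complete a closing `</tr>`, so the match has to end at the first one.

```shell
# Hypothetical sample: rows with nested cells, all on one line.
sample='<tr><td>a</td></tr><tr><td>b</td></tr>'

# Allowed inside a row: a non-"<" character, or a "<" that turns out
# not to be the start of "</tr>" (checked one character at a time).
pattern='<tr>([^<]|<[^/]|</[^t]|</t[^r]|</tr[^>])*</tr>'

rows=$(printf '%s\n' "$sample" | grep -oE "$pattern")

printf '%s\n' "$rows"
```

This works with plain grep -E, but the pattern is already hard to read and still breaks on malformed markup, which is the point being made about using a real HTML parser.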
Answered by ThisSuitIsBlackNot
.*? is Perl regular expression syntax. Change your grep to:
grep -oP '<tr>.*?</tr>'
Answered by Fredrik Pihl
Try Perl-style regular expressions:
$ grep -Po '<tr>.*?</tr>' input
<tr>stuff</tr>
<tr>more stuff</tr>