Disclaimer: this page is based on a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/19125173/

Date: 2020-09-18 08:07:35  Source: igfitidea

Non-greedy matching using ? with grep

Tags: regex, bash, grep

Asked by Sven Richter

I'm writing a bash script which analyses an HTML file, and I want to get the content of each single <tr>...</tr>. So my command looks like:

$ tr -d '\012' < price.html | grep -oE '<tr>.*?</tr>'

But it seems that grep gives me the same result as:

$ tr -d '\012' < price.html | grep -oE '<tr>.*</tr>'

How can I make .* non-greedy?
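
To reproduce the behaviour in a self-contained way, here is a minimal sketch; the one-line sample in the line variable is hypothetical and stands in for price.html after its newlines have been stripped. With GNU grep, a ? placed after .* in an ERE does not make it lazy (POSIX leaves adjacent duplication symbols such as *? undefined), so the match still runs from the first <tr> to the last </tr>.

```shell
# Hypothetical one-line input containing two rows.
line='<tr><td>1.99</td></tr><tr><td>2.49</td></tr>'

# ERE matching is leftmost-longest; the trailing "?" only makes ".*" optional,
# so GNU grep prints the whole line as a single match.
printf '%s\n' "$line" | grep -oE '<tr>.*?</tr>'
```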

Answer by Chris Seymour

If you have GNU grep, you can use -P to make the match non-greedy:

$ tr -d '\012' < price.html | grep -Po '<tr>.*?</tr>'

The -P option enables Perl Compatible Regular Expressions (PCRE), which are needed for non-greedy matching with ?, as Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) do not support it.
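
A self-contained version of the -P pipeline, assuming GNU grep built with PCRE support; the html variable is a hypothetical stand-in for price.html with one row per line. The tr argument is quoted so that the backslash in \012 (octal for newline) reaches tr instead of being removed by the shell.

```shell
# Hypothetical stand-in for price.html.
html='<table>
<tr><td>1.99</td></tr>
<tr><td>2.49</td></tr>
</table>'

# Join all lines so rows spanning several lines become one grep line, then let
# the lazy ".*?" stop at the first </tr> after each <tr>.
printf '%s\n' "$html" | tr -d '\012' | grep -Po '<tr>.*?</tr>'
```

Each row is printed on its own line, because -o emits every match separately.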

If you are using -P, you could also use lookarounds to avoid printing the tags in the match, like so:

$ tr -d '\012' < price.html | grep -Po '(?<=<tr>).*?(?=</tr>)'
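
The same lookaround pattern on a hypothetical one-line sample, again assuming GNU grep with PCRE support. Lookarounds are zero-width: they must match, but the characters they test are not part of the match that -o prints.

```shell
# Hypothetical one-line input containing two rows.
line='<tr><td>1.99</td></tr><tr><td>2.49</td></tr>'

# (?<=<tr>) and (?=</tr>) only assert what surrounds the match,
# so just the row contents are printed.
printf '%s\n' "$line" | grep -Po '(?<=<tr>).*?(?=</tr>)'
```

PCRE's \K, which drops everything matched so far from the reported match, gives the same output here: grep -Po '<tr>\K.*?(?=</tr>)'.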


If you don't have GNU grep and the HTML is well formed, you could just do:

$ tr -d '\012' < price.html | grep -o '<tr>[^<]*</tr>'

Note: The above example won't work with nested tags within <tr>.
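
A quick way to see both the fallback and the limitation mentioned in the note, using a hypothetical sample. No PCRE is involved; it only needs a grep that supports -o, which GNU and BSD grep both do.

```shell
# Hypothetical input: a flat row followed by a row with a nested <td>.
line='<tr>plain</tr><tr><td>nested</td></tr>'

# [^<]* forbids any "<" between the tags, so the match cannot run past the
# first </tr>; the second row is skipped because its content starts with "<td>".
printf '%s\n' "$line" | grep -o '<tr>[^<]*</tr>'
```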

Answer by tripleee

Non-greedy matching is not part of the Extended Regular Expression syntax supported by grep -E. Use grep -P instead if you have that, or switch to Perl / Python / Ruby / what have you. (Oh, and pcregrep.)

Of course, if you really mean

<tr>[^<>]*</tr>

you should say that instead; then plain old grep will work fine.

You could (tediously) extend the regex to accept nested tags which are not <tr>, but of course, it's better to use a proper HTML parser than spend a lot of time rediscovering why regular expressions are not the right tool for this.
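
For the parser route, one possible sketch, assuming python3 is installed and using its standard-library html.parser module (a swapped-in choice; the answer does not name a particular parser). The function name rows_via_parser and the sample are hypothetical. Instead of raw markup, it prints the text of each row's cells, tab-separated, one row per line; a <tr> nested inside another row is folded into the outer row.

```shell
# Hypothetical helper; assumes python3 is on the PATH.
rows_via_parser() {
  python3 -c '
import sys
from html.parser import HTMLParser

class RowParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.depth = 0    # tr nesting depth; > 0 means inside a row
        self.cells = []   # non-blank text chunks of the current outer row

    def handle_starttag(self, tag, attrs):
        if tag == "tr":   # html.parser reports tag names in lower case
            self.depth += 1
            if self.depth == 1:
                self.cells = []

    def handle_endtag(self, tag):
        if tag == "tr" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                print("\t".join(self.cells))

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.cells.append(data.strip())

parser = RowParser()
parser.feed(sys.stdin.read())
parser.close()
'
}

# Hypothetical stand-in for price.html.
html='<table>
<tr><td>1.99</td><td>apple</td></tr>
<tr><td>2.49</td><td>pear</td></tr>
</table>'

printf '%s\n' "$html" | rows_via_parser
```

Because the parser tracks tags instead of characters, no newline stripping is needed and nested tags inside a row are handled naturally.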

Answer by ThisSuitIsBlackNot

.*? is a Perl regular expression. Change your grep to:

grep -oP '<tr>.*?</tr>'

Answer by Fredrik Pihl

Try Perl-style regexps:

$ grep -Po '<tr>.*?</tr>' input
<tr>stuff</tr>
<tr>more stuff</tr>