bash: non-greedy matching using ? with grep

Question

Asked by Sven Richter

I'm writing a bash script that analyses an HTML file, and I want to get the content of each <tr>...</tr> element. My command looks like this:

$ tr -d '\012' < price.html | grep -oE '<tr>.*?</tr>'

But it seems that grep gives me the same result as the greedy version:

$ tr -d '\012' < price.html | grep -oE '<tr>.*</tr>'

How can I make .* non-greedy?

Answer 1

Answered by Chris Seymour

If you have GNU grep, you can use -P to make the match non-greedy:

$ tr -d '\012' < price.html | grep -Po '<tr>.*?</tr>'

The -P option enables Perl Compatible Regular Expressions (PCRE), which are needed for non-greedy matching with ?, as Basic Regular Expressions (BRE) and Extended Regular Expressions (ERE) do not support it.
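
To see the difference on a small input, here is a minimal sketch. The sample string is made up for illustration (it stands in for price.html with the newlines already removed), and the -P line assumes a GNU grep built with PCRE support.

```shell
# Hypothetical sample: two rows on a single line.
sample='<tr>stuff</tr><tr>more stuff</tr>'

# ERE: .* is greedy, so the whole line comes back as one match.
greedy=$(printf '%s\n' "$sample" | grep -oE '<tr>.*</tr>')

# PCRE (-P): .*? is lazy, so each row is matched separately.
lazy=$(printf '%s\n' "$sample" | grep -oP '<tr>.*?</tr>')

printf 'greedy:\n%s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy variant should print one line containing both rows, while the lazy variant should print each row on its own line.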

If you are using -P, you could also use lookarounds to avoid printing the tags in the match, like so:

$ tr -d '\012' < price.html | grep -Po '(?<=<tr>).*?(?=</tr>)'
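
As a rough illustration of the lookaround form, the sketch below runs it on a made-up one-line sample (not the real price.html). It again assumes GNU grep with -P available.

```shell
# Hypothetical sample: two rows on a single line.
sample='<tr>stuff</tr><tr>more stuff</tr>'

# (?<=<tr>) requires <tr> just before the match, (?=</tr>) requires </tr>
# just after it; neither tag is part of the text that -o prints.
inner=$(printf '%s\n' "$sample" | grep -Po '(?<=<tr>).*?(?=</tr>)')

printf '%s\n' "$inner"
```

Only the text between the tags is printed, one row per line. Note that grep -o does not print empty matches, so an empty <tr></tr> would produce no output line.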

If you don't have GNU grep and the HTML is well formed, you could just do:

$ tr -d '\012' < price.html | grep -o '<tr>[^<]*</tr>'

Note: The above example won't work if there are nested tags within <tr> (such as <td> cells), because [^<]* stops at the first <.
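
The limitation is easy to reproduce. In this sketch the sample line is invented for illustration: one row holds plain text and the other holds a nested <td>.

```shell
# Hypothetical sample: the second row contains a nested tag.
sample='<tr>plain</tr><tr><td>cell</td></tr>'

# [^<]* cannot cross the < of <td>, so the second row never matches.
flat=$(printf '%s\n' "$sample" | grep -o '<tr>[^<]*</tr>')

printf '%s\n' "$flat"
```

Only the first row is printed; the row with the nested cell is silently skipped, which matters because real table rows almost always contain <td> or <th> elements.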

Answer 2

Answered by tripleee

Non-greedy matching is not part of the Extended Regular Expression syntax supported by grep -E. Use grep -P instead if you have that, or switch to Perl / Python / Ruby / what have you. (Oh, and pcregrep.)

Of course, if you really mean

<tr>[^<>]*</tr>

you should say that instead; then plain old grep will work fine.

You could (tediously) extend the regex to accept nested tags which are not <tr>, but of course it's better to use a proper HTML parser than to spend a lot of time rediscovering why regular expressions are not the right tool for this.

Answer 3

Answered by ThisSuitIsBlackNot

.*? is Perl regular expression syntax. Change your grep command to

grep -oP '<tr>.*?</tr>'

Answer 4

Answered by Fredrik Pihl

Try Perl-style regular expressions (grep -P):

$ grep -Po '<tr>.*?</tr>' input
<tr>stuff</tr>
<tr>more stuff</tr>
