Getting text from inside an HTML tag within a local file with grep
Original question: http://stackoverflow.com/questions/3593124/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by LakeMicrobe
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
Excerpt From Input File
<TD class="clsTDLabelWeb" width="28%">Municipality: </TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
My Regular Expression
(?<=<span id="DInfo1_Municipality">)([^</span>]*)
I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up escaping any special characters), it doesn't output anything. I have tested the expression in RegExr and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!
Desired Output
JUPITER
Answered by Paused until further notice.
Give this a try:
sed -n 's|^<span id="DInfo1_Municipality">\([^<]*\)</span></TD>$|\1|p' file
or with GNU grep and your regex:
grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)' file
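A note on the bracket expression in that pattern: `[^</span>]` does not mean "anything up to `</span>`". It is a negated character set, so it excludes each of the characters `<`, `/`, `s`, `p`, `a`, `n` and `>` individually. It happens to work for JUPITER because the value is upper case. Below is a minimal sketch of the difference, assuming GNU grep built with PCRE support (needed for -P). The town name Palm Beach and the function names as_asked and up_to_next_tag are made up for illustration.

```shell
#!/bin/sh
# Hypothetical input: same markup as the question, different town name.
line='<span id="DInfo1_Municipality">Palm Beach</span></TD>'

# The question's pattern: [^</span>] is a bracket expression that excludes
# the single characters < / s p a n >, so matching stops at the first "a".
as_asked() {
    grep -Po '(?<=<span id="DInfo1_Municipality">)([^</span>]*)'
}

# Variant that only excludes "<", i.e. reads up to the next tag.
up_to_next_tag() {
    grep -Po '(?<=<span id="DInfo1_Municipality">)[^<]*'
}

printf '%s\n' "$line" | as_asked         # prints: P
printf '%s\n' "$line" | up_to_next_tag   # prints: Palm Beach
```

Excluding only `<` is still a heuristic: it assumes the span holds plain text with no nested tags.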
Answered by Paul Creasey
grep's default regex syntax doesn't support that type of expression (lookbehind assertions), and it's a very poor tool for this. For the example given the following is workable, but it will break in many situations.
grep -io "<span id=\"DInfo1_Municipality\">.*</span>" file.html | grep -io ">[^<]*" | grep -io '[^>]*'
Something crazy like that. Not a good idea.
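One situation where it breaks, as a rough sketch: `.*` is greedy, so a second span on the same line gets swept into the match. The input line and the `Other` id below are invented for illustration, and the pipeline reads standard input instead of a file so the example is self-contained.

```shell
#!/bin/sh
# Hypothetical line with a second span after the one we want.
line='<span id="DInfo1_Municipality">JUPITER</span><span id="Other">FL</span>'

# The pipeline from the answer, reading stdin instead of a file.
# The last pattern is quoted so the shell does not treat ">" as a redirection.
pipeline() {
    grep -io "<span id=\"DInfo1_Municipality\">.*</span>" |
        grep -io ">[^<]*" |
        grep -io '[^>]*'
}

# Prints FL as well as JUPITER, because the first grep matched both spans.
printf '%s\n' "$line" | pipeline
```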
Answered by ghostdog74
sed -n '/DInfo1_Municipality/s/<\/span.*//p' file | sed 's/.*>//'
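To see what each half of that pipeline contributes, the two sed commands can be run separately on the line from the question. This is only an illustrative sketch; the function names stage1 and stage2 are not part of the answer.

```shell
#!/bin/sh
# Line from the question's excerpt.
line='<span id="DInfo1_Municipality">JUPITER</span></TD>'

# Stage 1: on lines mentioning the id, delete from "</span" to end of line.
stage1() {
    sed -n '/DInfo1_Municipality/s/<\/span.*//p'
}

# Stage 2: delete everything up to and including the last ">".
stage2() {
    sed 's/.*>//'
}

printf '%s\n' "$line" | stage1            # <span id="DInfo1_Municipality">JUPITER
printf '%s\n' "$line" | stage1 | stage2   # JUPITER
```

Because stage 1 selects lines by the id alone, this version does not care about indentation or about what follows `</span>` on the line.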

