Regex that extracts text between tags, but not the tags (PHP)
Source: http://stackoverflow.com/questions/15033905/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the source URL, and attribute it to the original authors on Stack Overflow.
Asked by Nicolaesse
I want to write a regex that extracts the content between the two <title> tags in a string, but not the tags themselves.
For example, I have the following:
<title>My work</title>
<p>This is my work.</p> <p>Learning regex.</p>
The regex
(<title>)(.*?)(<\/title>)
extracts <title>My work</title>, but I want to extract only My work. How can I do that?
Here is a link to the example: http://regex101.com/r/mD8fB0
Answer by shasan
You can use the following regex:
>([^<]*)<
or
>[^<]*<
With the second form, strip the unwanted '<' and '>' characters from the match afterwards. With the first form, the capture group already excludes them.
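As a rough illustration of the first pattern, here is a minimal PHP sketch. The variable names ($string, $firstText) and the use of preg_match are assumptions for the example, not part of the answer; the input is the sample from the question.

```php
<?php
// Sample input from the question.
$string = "<title>My work</title>\n<p>This is my work.</p> <p>Learning regex.</p>";

$firstText = null;
// Group 1 holds the text between a '>' and the next '<', without the brackets.
if (preg_match('/>([^<]*)</', $string, $matches)) {
    $firstText = $matches[1];
}
echo $firstText, "\n"; // My work
```

Note that this pattern is not tied to the title tag: it returns the text after whichever tag comes first in the string, so it only gives the title when <title> is the first element.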
Answer by Ammar
Answer by Mike Brant
In your case, you could just use the second capture group (backreference) from the regex, which holds the text you are interested in.
Since you mention preg_match in your tags, I am assuming you want this for PHP.
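$string = '<title>My work</title>'; // example input from the question
$matches = array();
$pattern = '#<title>(.*?)</title>#'; // note I changed the pattern a bit ('#' delimiters, a single capture group)
if (preg_match($pattern, $string, $matches)) {
    $title = $matches[1]; // text between the tags
}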
Note that this is actually the first backreference in my pattern, since I've omitted the parentheses around the tags themselves, which were not needed.
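To make the difference in group numbering concrete, the following sketch runs both patterns against the same input. The variable names $original and $simplified are made up for this example.

```php
<?php
$string = '<title>My work</title>';

// Pattern from the question: three groups, so the text is group 2.
preg_match('/(<title>)(.*?)(<\/title>)/', $string, $original);

// Simplified pattern: one group, so the text is group 1.
preg_match('#<title>(.*?)</title>#', $string, $simplified);

echo $original[0], "\n";   // <title>My work</title> (the full match)
echo $original[2], "\n";   // My work
echo $simplified[1], "\n"; // My work
```

In both cases index 0 of the matches array is the full match including the tags, which is what the question was seeing.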
Typically, you should not use regex to parse HTML documents, but I think this might be one of those exceptional cases where it is not so bad, since the title tag should only exist once on the page.
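For comparison, here is a sketch of the parser-based route, which is not part of the original answer. It assumes PHP's DOM extension is available; the input document and variable names are invented for illustration.

```php
<?php
// Hypothetical input; a real page would come from a file or an HTTP response.
$html = '<html><head><title>My work</title></head>'
      . '<body><p>This is my work.</p></body></html>';

libxml_use_internal_errors(true); // collect parse warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Take the first <title> element, if there is one.
$titles = $doc->getElementsByTagName('title');
$title = $titles->length > 0 ? $titles->item(0)->textContent : null;
echo $title, "\n";
```

Unlike the regex, this does not depend on the exact spelling of the tag (attributes, letter case, line breaks inside the title), at the cost of parsing the whole document.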
Answer by andrewster
I used this pattern with a regex replace function: (<.+?>)
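The answer does not say which function or replacement was used. A plausible reading, sketched here with PHP's preg_replace and an empty replacement string (both assumptions), is to delete every tag and keep what remains:

```php
<?php
// Sample input from the question.
$string = "<title>My work</title>\n<p>This is my work.</p> <p>Learning regex.</p>";

// Replace every tag with an empty string, leaving only the text.
$text = preg_replace('/(<.+?>)/', '', $string);
echo $text, "\n";
```

On this input the result keeps the paragraph text as well as the title, so this approach fits best when the string contains only the <title> element.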

