PHP: get the text between HTML tags

Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/5699911/

Time: 2020-08-25 22:16:22  Source: igfitidea

Get text between HTML tags

Tags: php, html, arrays, string, preg-match

Asked by Ryan Cooper

OK, this is a pretty basic question, I'm sure, but I'm new to PHP and haven't been able to figure it out. The input string is $data, and I'm trying to pull out only the first match and keep working with it. Is the code below incorrect? This may not even be the best way to do it; I'm just trying to pull the contents between two HTML tags (the first set found) and discard the rest of the data. I know there are similar questions, and I've read them all; my question is a mix: is there a better way to do this, and how can I use the match as the new input for the rest of the code? If I change $matches to $data2 and use it from there on, it returns errors.

preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches);

Answered by diEcho

Don't parse HTML via preg_match, use this PHP class instead:

The DOMDocument class

Example:

<?php

$html = "<p>hi</p>
<h1>H1 title</h1>
<h2>H2 title</h2>
<h3>H3 title</h3>";

// create a new DOM object
$dom = new DOMDocument('1.0', 'utf-8');
// ask the parser to drop redundant whitespace (set before loading)
$dom->preserveWhiteSpace = false;
// load the HTML into the object
$dom->loadHTML($html);
// use your desired tag name here
$hTwo = $dom->getElementsByTagName('h2');
// prints "H2 title"
echo $hTwo->item(0)->nodeValue;
?>
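
The example above assumes the tag exists and the markup parses cleanly. A slightly more defensive variant might look like the sketch below. The helper name `firstTagText` is hypothetical (it is not part of the answer), and the sketch assumes a non-empty UTF-8 HTML fragment as input. It returns null when the tag is missing and keeps libxml's warnings about sloppy markup out of the output.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the text of
// the first <$tag> element in $html, or null when there is no such element.
function firstTagText(string $html, string $tag): ?string
{
    $dom = new DOMDocument('1.0', 'utf-8');

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return null;
    }

    // item(0) is null when no element with that tag name was found.
    $node = $dom->getElementsByTagName($tag)->item(0);
    return $node === null ? null : trim($node->textContent);
}

$data = "<p>hi</p><h2>First <em>heading</em></h2><h2>Second</h2>";
echo firstTagText($data, 'h2'), "\n";   // text of the first <h2>, nested tags stripped
var_dump(firstTagText($data, 'h4'));    // no <h4> in the input, so null
```

Because the helper returns a plain string, its result can be used directly as the new input for the rest of the code, which is what the question asks for.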

Answered by Erik

Using regular expressions is generally a good idea for your problem.

When you look at http://php.net/preg_match, you see that $matches will be an array: index 0 holds the text that matched the whole pattern, and each further index holds the text captured by one parenthesized group. Try

print_r($matches);

to get an idea of how the result looks, and then pick the right index.

EDIT:

If there is a match, you can get the text captured by the parenthesized group with

print($matches[1]);

If you had more than one parenthesis-group they would be numbered 2, 3 etc. You should also consider the case when there is no match, in which case the array will have the size of 0.
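
Putting these pieces together, a minimal sketch of the no-match check could look like this. The sample value of $data is made up for illustration, and $data2 is the variable name from the question. The idea is to assign the captured group, not the whole $matches array, to $data2.

```php
<?php
// Sample input, invented for this sketch.
$data = "<p>intro</p>\n<h2>First\nheading</h2>\n<h2>Second</h2>";

// preg_match() returns 1 on a match, 0 on no match, and false on error.
if (preg_match('/<h2>(.*?)<\/h2>/s', $data, $matches) === 1) {
    // $matches[0] is the whole match, $matches[1] the first group.
    $data2 = $matches[1];
} else {
    // No <h2>...</h2> pair was found (or the pattern failed).
    $data2 = null;
}

var_dump($data2);
```

Passing $matches itself to code that expects a string is a likely source of the errors mentioned in the question, since $matches is an array.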

Answered by ErickBest

You could do it this way:

$h1 = preg_replace('/<h1[^>]*?>([\s\S]*?)<\/h1>/',
'\1', $h1);

This strips the <h1></h1> tags and leaves the text they wrapped in place.
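
One thing worth noticing is that preg_replace only removes the tags; any text outside them stays in the string. The short sketch below illustrates this with a made-up input, using the same pattern as the snippet above.

```php
<?php
// Sample input, invented for this sketch.
$h1 = "before <h1 class=\"title\">Hello</h1> after";

// Same pattern as above; '$1' is equivalent to '\1' in the replacement.
$unwrapped = preg_replace('/<h1[^>]*?>([\s\S]*?)<\/h1>/', '$1', $h1);

echo $unwrapped, "\n"; // prints "before Hello after"
```

If the goal is to keep only the tag contents and discard the rest of the data, as in the question, capturing with preg_match is probably the closer fit.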
