Original source: http://stackoverflow.com/questions/13458133/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
PHP parse HTML tags
Asked by Troy McClure
Possible Duplicate:
How to parse and process HTML with PHP?
I'm pretty new to PHP. I have the text of the body tag of some page in a string variable. I'd like to know whether it contains some tag <tag1>...</tag1>, where the tag name tag1 is given, and if so, take only that tag from the string. How can I do that simply in PHP?
Thanks!!
Answered by RTB
You would be looking at something like this:
<?php
$content = "";
$doc = new DOMDocument();
// loadHTMLFile() parses HTML; load() expects well-formed XML.
// If the markup is already in a string, use $doc->loadHTML($string) instead.
$doc->loadHTMLFile("example.html");
$items = $doc->getElementsByTagName('tag1');
if ($items->length > 0) // Only if tag1 items are found
{
    foreach ($items as $tag1)
    {
        // Do something with $tag1->nodeValue and save your modifications
        $content .= $tag1->nodeValue;
    }
}
else
{
    $content = $doc->saveHTML();
}
echo $content;
?>
DOMDocument represents an entire HTML or XML document and serves as the root of the document tree. Because the markup is parsed into a tree, you work with valid markup, and finding elements by tag name will not match anything inside comments.
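Since the question starts from a string rather than a file, here is a minimal sketch of the same idea using loadHTML() and saveHTML() with a node argument, which returns the tag together with its own markup. The sample string $body and the helper name extractFirstTag are assumptions made up for illustration; they are not part of the original answer.

```php
<?php
// Hypothetical input: body markup held in a string, as in the question.
// The commented-out tag1 comes first to show that comments are not matched.
$body = '<div><!-- <tag1>old</tag1> --><tag1 id="a">hello <b>world</b></tag1></div>';

// Hypothetical helper: returns the markup of the first $tagName element,
// or null when the string contains no such element.
function extractFirstTag(string $html, string $tagName): ?string
{
    $doc = new DOMDocument();

    // Non-standard tag names such as "tag1" make libxml report errors;
    // collect them internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName($tagName);
    if ($nodes->length === 0) {
        return null;
    }

    // saveHTML() with a node argument serializes only that subtree.
    return $doc->saveHTML($nodes->item(0));
}

$result = extractFirstTag($body, 'tag1');
echo $result === null ? "tag1 not found\n" : $result . "\n";
```

Returning null for the "not found" case keeps the existence check and the extraction in one call, which is what the question asks for.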
Answered by AmShaegar
Another possibility is regex.
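$matches = null;
$html = '<li class="small">list entry</li>'; // sample input
// \b and [^>]* keep the pattern from matching other tags such as <link>
$returnValue = preg_match_all('#<li\b[^>]*>(.*?)</li>#s', $html, $matches);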
$matches[0][x] contains each whole match, such as <li class="small">list entry</li>; $matches[1][x] contains only the inner HTML, such as list entry.
Answered by Andrei Cristian Prodan
Fast way:
Look for the index position of <tag1>, then look for the index position of </tag1>, and cut the string between those two indexes. Look up strpos and substr on php.net. Also, this might not work if your string is too long.
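$pos1 = strpos($bigString, '<tag1>');
$pos2 = strpos($bigString, '</tag1>');
if ($pos1 !== false && $pos2 !== false) {
    // substr() takes a start offset and a length, not an end position.
    $resultingString = substr($bigString, $pos1, $pos2 - $pos1 + strlen('</tag1>'));
}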
You might have to add or subtract some offsets from $pos1 and $pos2 to get $resultingString exactly right, for example to include or drop the tags themselves. This also only works if you don't have comments with tag1 inside of them (sigh).
The right way:
Look up HTML parsers.

