PHP Parse HTML code
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/3627489/
Asked by Francisc
Possible Duplicate:
Best methods to parse HTML
How can I parse HTML code held in a PHP variable if it is something like:
<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!
I only want to get the text that's between the headings, and I understand that it's not a good idea to use regular expressions.
Answered by shamittomar
Use PHP's Document Object Model (DOM) extension:
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
// Parse the HTML string into a DOM tree
$DOM = new DOMDocument;
$DOM->loadHTML($str);
// Get all <h1> elements
$items = $DOM->getElementsByTagName('h1');
// Display the text of each <h1>
for ($i = 0; $i < $items->length; $i++) {
    echo $items->item($i)->nodeValue . "<br/>";
}
?>
This outputs:
T1
T2
T3
[EDIT]: After OP Clarification:
If you want the content itself (Lorem ipsum. and so on), you can strip the headings directly with this regex:
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
echo preg_replace("#<h1.*?>.*?</h1>#", "", $str);
?>
This outputs:
Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
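Since the question asks for something other than a regex, the text between the headings can also be collected with the DOM by walking the siblings that follow each h1. This is only a minimal sketch and not part of the original answer: textBetweenHeadings() is a hypothetical helper name, and it assumes PHP 7 or later and that all the h1 elements sit next to each other under the same parent, as they do in the sample string.

```php
<?php
// Hypothetical helper (not from the original answer): maps each <h1>
// heading's text to the text that follows it, up to the next <h1>.
function textBetweenHeadings(string $html): array
{
    $dom = new DOMDocument;
    // Collect libxml warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sections = [];
    foreach ($dom->getElementsByTagName('h1') as $heading) {
        $text = '';
        // Walk the siblings after this heading until the next <h1>.
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'h1') {
                break;
            }
            // textContent covers both plain text nodes and nested elements.
            $text .= $node->textContent;
        }
        $sections[trim($heading->textContent)] = trim($text);
    }
    return $sections;
}

$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
print_r(textBetweenHeadings($str));
?>
```

For the sample string this should give an array with the keys T1, T2 and T3, each mapped to its own piece of text, rather than one concatenated string as with preg_replace. Headings with identical text would overwrite each other in this sketch, so a list of pairs may be a better return shape for real input.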