PHP Parse HTML code

Notice: This page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep it under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/3627489/

Time: 2020-08-25 10:28:39  Source: igfitidea

php html parsing

Asked by Francisc

Possible Duplicate:
Best methods to parse HTML

How can I parse HTML code held in a PHP variable if it is something like:

<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG!

I only want to get the text that's between the headings, and I understand that it's not a good idea to use regular expressions.

Answered by shamittomar

Use PHP's Document Object Model (DOM) extension:

<?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   $DOM = new DOMDocument;
   $DOM->loadHTML($str);

   // Get all <h1> elements
   $items = $DOM->getElementsByTagName('h1');

   // Display the text of each <h1>
   foreach ($items as $item) {
        echo $item->nodeValue . "<br/>";
   }
?>

This outputs:

 T1
 T2
 T3
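
If the goal is the text that follows each heading rather than the headings themselves, the same DOM can be walked instead of just listing the h1 elements. The following is a minimal sketch and not part of the original answer: the helper name textAfterHeadings is hypothetical, and it assumes the headings sit at the same level as the text that follows them, as in the sample string.

```php
<?php
// Hypothetical helper (not from the original answer): maps the text of each
// <h1> to the text that follows it, up to the next <h1>.
function textAfterHeadings(string $html): array
{
    $dom = new DOMDocument;
    $dom->loadHTML($html);

    $sections = [];
    foreach ($dom->getElementsByTagName('h1') as $heading) {
        $text = '';
        // Walk the siblings after this heading and stop at the next <h1>.
        for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeName === 'h1') {
                break;
            }
            // textContent works for text nodes and for any wrapping element.
            $text .= $node->textContent;
        }
        $sections[$heading->textContent] = trim($text);
    }
    return $sections;
}

$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
print_r(textAfterHeadings($str));
```

Because the heading text is used as the array key, two headings with identical text would overwrite each other; a list of pairs would avoid that if it matters.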


[EDIT]: After the OP's clarification:

If you want the content itself (Lorem ipsum. and so on), you can use this regex directly:

<?php
   $str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';
   echo preg_replace("#<h1.*?>.*?</h1>#", "", $str);
?>

This outputs:

Lorem ipsum.The quick red fox...... jumps over the lazy brown FROG
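
Note that preg_replace returns the remaining text as one run-together string. If the pieces are needed separately, one option is to split on the same pattern with preg_split. This is a sketch and not from the original answer: the variable name $parts is illustrative, and like the regex above it assumes simple single-line markup without nested or malformed h1 tags.

```php
<?php
$str = '<h1>T1</h1>Lorem ipsum.<h1>T2</h1>The quick red fox...<h1>T3</h1>... jumps over the lazy brown FROG';

// Split on each complete <h1>...</h1> element. PREG_SPLIT_NO_EMPTY drops the
// empty piece that would otherwise appear before the first heading.
$parts = preg_split('#<h1.*?>.*?</h1>#', $str, -1, PREG_SPLIT_NO_EMPTY);

// Print each segment on its own line.
foreach ($parts as $part) {
    echo $part, PHP_EOL;
}
```

Each element of $parts then holds the text that followed one heading, so the segments can be processed individually instead of as a single string.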
