How to parse HTML in PHP?

Question

Asked by laradev

I know we can use PHP DOM to parse HTML in PHP, and I found a lot of related questions here on Stack Overflow too. But I have a specific requirement. I have HTML content like the following:

<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>

I want to parse the above HTML and save the content into two different arrays, $heading and $content, like this:

$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');

I could do this easily with jQuery, but I am not sure that is the right way. It would be great if someone could point me in the right direction. Thanks in advance.

Answer 1

Answered by saji89

I used DOMDocument and DOMXPath to get the solution:

<?php
$dom = new DomDocument();
$test='<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 3</span>
</p>';

$dom->loadHTML($test);
$xpath = new DOMXPath($dom);
$heading = parseToArray($xpath, 'Heading1-H');
$content = parseToArray($xpath, 'Normal-H');

var_dump($heading);
echo "<br/>";
var_dump($content);
echo "<br/>";

// Return the text of every <span> whose class attribute is exactly $class
function parseToArray($xpath, $class)
{
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);

    $resultarray = array();
    // DOMXPath::query() returns false (never null) on a malformed expression
    if ($elements !== false) {
        foreach ($elements as $element) {
            // nodeValue of an element node is its full text content
            $resultarray[] = $element->nodeValue;
        }
    }
    // Always return an array, even when nothing matched
    return $resultarray;
}

Live result: http://saji89.codepad.org/2TyOAibZ
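A slightly more defensive variant of the same DOMDocument/DOMXPath approach might look like the sketch below (an illustration, not saji89's code). It assumes PHP 7 or later. It matches the class as a whitespace-separated token, so a span with class="Heading1-H extra" would still be found, reads textContent instead of walking child nodes, and keeps libxml warnings about imperfect markup from being printed. The helper name spansByClass is made up for this example.

```php
<?php
// Hypothetical helper: return the trimmed text of every <span> whose class
// list contains $class as a whole token (so "Heading1-H extra" also matches).
// Assumes $class contains no quote characters, since it is pasted into the query.
function spansByClass(DOMXPath $xpath, string $class): array
{
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
    $result = [];
    foreach ($xpath->query($query) as $span) {
        // textContent flattens any nested markup inside the span into plain text.
        $result[] = trim($span->textContent);
    }
    return $result;
}

$html = '<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 2</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 2</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 3</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 3</span></p>';

$dom = new DOMDocument();
// Collect libxml parse warnings instead of printing them (real pages are often messy).
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$heading = spansByClass($xpath, 'Heading1-H');
$content = spansByClass($xpath, 'Normal-H');

print_r($heading);
print_r($content);
```

The token-matching expression is the usual XPath 1.0 idiom for class selectors; if your markup always uses a single class, the simpler @class='...' test from the answer is enough.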

Answer 2

Answered by Paul Denisevich

Take a look at PHP Simple HTML DOM Parser.

It has a brilliant jQuery-like syntax, so you can easily select any element you want by ID or class.

// Include the Simple HTML DOM Parser library; adjust the path to where you saved the file
require_once 'simple_html_dom.php';

$html_string = '
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 1</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 1</span>
    </p>
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 2</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 2</span>
    </p>
    <p class="Heading1-P">
        <span class="Heading1-H">Chapter 3</span>
    </p>
    <p class="Normal-P">
        <span class="Normal-H">This is chapter 3</span>
    </p>';
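// Initialize the result arrays so they exist even if nothing matches
$heading = array();
$content = array();
$html = str_get_html($html_string);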
foreach($html->find('span') as $element) {
    if ($element->class === 'Heading1-H') {
        $heading[] = $element->innertext;
    }else if($element->class === 'Normal-H') {
        $content[] = $element->innertext;
    }
}
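If you would rather not add a third-party library, the same "visit every span and branch on its class" loop can be written with PHP's bundled DOM extension. This is an illustrative sketch, not part of the answer above; like the answer, it compares the class attribute exactly, so a span carrying several classes would be skipped.

```php
<?php
// Same loop as the Simple HTML DOM answer, using only the built-in DOM extension.
$html_string = '<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 2</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 2</span></p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings quiet
$dom->loadHTML($html_string);
libxml_clear_errors();

$heading = [];
$content = [];
foreach ($dom->getElementsByTagName('span') as $span) {
    // getAttribute() returns an empty string when the attribute is absent.
    $class = $span->getAttribute('class');
    if ($class === 'Heading1-H') {
        $heading[] = $span->textContent;
    } elseif ($class === 'Normal-H') {
        $content[] = $span->textContent;
    }
}

print_r($heading);
print_r($content);
```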

Answer 3

Answered by Greeso

One option for you is to use DOMDocument and DOMXPath. They have a bit of a learning curve, but once you get the hang of them, you will be pretty happy with what you can achieve.

Read the following on php.net:

http://php.net/manual/en/class.domdocument.php

http://php.net/manual/en/class.domxpath.php
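As a small example of what the two classes can do together, the sketch below (an illustration, not from the answer) pairs each heading with the paragraph that immediately follows it by running a relative XPath query against a context node. It assumes each Heading1-P paragraph is directly followed by its Normal-P paragraph, as in the question; a heading without such a paragraph is mapped to null.

```php
<?php
$html = '<p class="Heading1-P">
    <span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
    <span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
    <span class="Normal-H">This is chapter 2</span>
</p>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // suppress warnings about imperfect markup
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$chapters = [];
foreach ($xpath->query('//p[@class="Heading1-P"]') as $headingP) {
    // Passing $headingP as the context node makes the query relative to it:
    // take the first following <p> sibling, but only if it is a Normal-P.
    $next = $xpath->query('following-sibling::p[1][@class="Normal-P"]', $headingP);
    $title = trim($headingP->textContent);
    $chapters[$title] = $next->length > 0 ? trim($next->item(0)->textContent) : null;
}

print_r($chapters);
```

Keeping heading and body together in one associative array avoids relying on two separate arrays staying index-aligned, which can break if a chapter is missing its body.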

Hope this helps.


Answer 4

Answered by jfraber

// Uses PHP Simple HTML DOM Parser, which provides file_get_html()
require_once 'simple_html_dom.php';

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');

// Find all images
foreach($html->find('img') as $element)
   echo $element->src . '<br>';

// Find all links
foreach($html->find('a') as $element)
   echo $element->href . '<br>';
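For comparison, here is a sketch of the same image and link listing using PHP's bundled DOM extension instead of Simple HTML DOM (not part of the answer above). A literal HTML string stands in for the downloaded page; to fetch a real page you could pass file_get_contents($url) to loadHTML(), assuming allow_url_fopen is enabled on your server.

```php
<?php
// Stand-in for a downloaded page (hypothetical markup).
$html = '<html><body>
<a href="/about">About</a> <img src="logo.png" alt="Logo">
<a href="https://example.com/">Example</a> <img src="banner.jpg" alt="Banner">
</body></html>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real pages rarely validate; keep warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();

// Find all images, in document order
$images = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $images[] = $img->getAttribute('src');
    echo $img->getAttribute('src') . '<br>';
}

// Find all links, in document order
$links = [];
foreach ($dom->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
    echo $a->getAttribute('href') . '<br>';
}
```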
