PHP: How to use the DOM parser in PHP

Question

Asked by chris

I'm new to DOM parsing in PHP.
I have an HTML file that I'm trying to parse. It has a bunch of DIVs like this:

<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox"> 
......

I'm trying to get the contents of each of these div boxes using PHP. How can I use the DOM parser to do this?

Thanks!

Answer 1

Answered by apelliciari

First, I have to tell you that you can't use the same id on two different divs; that is what classes are for. Every element's id should be unique within the document.

Code to get the contents of the div with id="interestingbox":

$html = '
<html>
<head></head>
<body>
<div id="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div id="interestingbox2"><a href="#">a link</a></div>
</body>
</html>';


$dom_document = new DOMDocument();

$dom_document->loadHTML($html);

// use DOMXPath to navigate the HTML through the DOM
$dom_xpath = new DOMXPath($dom_document);

// if you want to get the div with id=interestingbox
$elements = $dom_xpath->query("*/div[@id='interestingbox']");

// query() returns false for a malformed expression, never null
if ($elements !== false) {

  foreach ($elements as $element) {
    echo "\n[". $element->nodeName. "]";

    $nodes = $element->childNodes;
    foreach ($nodes as $node) {
      echo $node->nodeValue. "\n";
    }

  }
}

//OUTPUT
[div]  {
        Content1
        Content2
}

Example with classes:

$html = '
<html>
<head></head>
<body>
<div class="interestingbox"> 
   <div id="interestingdetails" class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>

<div class="interestingbox"><a href="#">a link</a></div>
</body>
</html>';

//the same as before.. just change the xpath

[...]

$elements = $dom_xpath->query("*/div[@class='interestingbox']");

[...]

//OUTPUT
[div]  {
        Content1
        Content2
}

[div]  {
a link
}
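
The class-based query above can be turned into a complete script along the following lines. This is only a sketch: the sample markup, the variable names ($boxQuery, $results) and the extra "box2" class are made up for illustration. It assumes the goal is the text of each innermost element, and it matches "interestingbox" as one token of the class attribute, because @class='interestingbox' only matches when that is the entire attribute value.

```php
<?php
// Sketch: collect the text of each innermost element inside every "interestingbox".
$html = '
<html><body>
<div class="interestingbox">
   <div class="txtnormal">
        <div>Content1</div>
        <div>Content2</div>
   </div>
</div>
<div class="interestingbox box2"><a href="#">a link</a></div>
</body></html>';

// Collect parser warnings instead of printing them; real-world HTML is rarely clean.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match "interestingbox" as one whitespace-separated token of @class,
// so class="interestingbox box2" is matched as well.
$boxQuery = "//div[contains(concat(' ', normalize-space(@class), ' '), ' interestingbox ')]";

$results = [];
foreach ($xpath->query($boxQuery) as $box) {
    $texts = [];
    // Relative query: descendants of this box that have no element children.
    foreach ($xpath->query('.//*[not(*)]', $box) as $leaf) {
        $texts[] = trim($leaf->textContent);
    }
    $results[] = $texts;
}

print_r($results);
```

Passing $box as the second argument to query() makes the inner expression relative to that box, which avoids the whitespace-only text nodes that show up when looping over childNodes directly.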

Refer to the DOMXPath page for more details.

Answer 2

Answered by chris

I got this to work using simplehtmldom as a start:

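// Simple HTML DOM is a third-party library; load it first
include 'simple_html_dom.php';

// file_get_html() needs a full URL (with scheme) or a local file path
$html = file_get_html('http://example.com/');
foreach ($html->find('div[id=interestingbox]') as $result)
{
    echo $result->innertext;
}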

Answer 3

Answered by giorgio79

A very nice function from http://www.sitepoint.com/forums/showthread.php?611393-php5-need-something-like-innerHTML-instead-of-nodeValue:

// Serialize the children of $node (its "inner XML") as a string
function innerXML($node)
{
    $doc  = $node->ownerDocument;
    $frag = $doc->createDocumentFragment();
    foreach ($node->childNodes as $child)
    {
        $frag->appendChild($child->cloneNode(TRUE));
    }
    return $doc->saveXML($frag);
}

$dom = new DOMDocument();
// loadXML() requires well-formed markup, so body and html must be closed
$dom->loadXML('
<html>
<body>
<table>
<tr>
    <td id="foo">
        The first bit of Data I want
        <br />The second bit of Data I want
        <br />The third bit of Data I want
    </td>
</tr>
</table>
</body>
</html>
');

$xpath = new DOMXPath($dom);
$node = $xpath->evaluate("/html/body//td[@id='foo' ]");

$dataString = innerXML($node->item(0));
// saveXML() serializes empty elements as <br/>, without the space
$dataArr = explode("<br/>", $dataString);

$dataUno = $dataArr[0];
$dataDos = $dataArr[1];
$dataTres = $dataArr[2];

echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";
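
When the input is HTML rather than well-formed XML, a similar helper can be built on saveHTML(), which accepts an optional node to serialize. The sketch below is one possible variant, not the function from the linked thread: innerHTML() is a hypothetical helper name, and the sample table is shortened for illustration.

```php
<?php
// Hypothetical helper (not part of the DOM extension): serialize the
// children of $node as HTML rather than XML.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() takes an optional node to serialize (PHP >= 5.3.6).
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<table><tr><td id="foo">first<br />second<br />third</td></tr></table>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$cell  = $xpath->query("//td[@id='foo']")->item(0);

// Split on any spelling of the line break: <br>, <br/> or <br />.
$parts = array_map('trim', preg_split('~<br\s*/?>~i', innerHTML($cell)));

print_r($parts);
```

Splitting with a regular expression instead of explode() keeps the code independent of how the serializer happens to write the br element.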

Answer 4

Answered by Oleksandr Knyga

WebExtractor: https://github.com/knyga/webextractor
It can parse a page with CSS, regex, and XPath selectors.

See the package and its tests for examples:

use WebExtractor\DataExtractor\DataExtractorFactory;
use WebExtractor\DataExtractor\DataExtractorTypes;
use WebExtractor\Client\Client;

$factory = DataExtractorFactory::getFactory();
$extractor = $factory->createDataExtractor(DataExtractorTypes::CSS);
$client = new Client;
$content = $client->get('https://en.wikipedia.org/wiki/2014_Winter_Olympics');
$extractor->setContent($content);
$h1 = $extractor->setSelector('h1')->extract();
