Using PHP DOMDocument to select an HTML element by its class and get its text

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18182857/

Date: 2020-08-25 17:06:59  Source: igfitidea

Tags: php, html, domdocument

Asked by Abhishek Madhani

I am trying to get the text of the div with class 'review-text' using PHP's DOMDocument, with the following HTML (same structure as the real page) and the following code.

However, this doesn't seem to work.

  1. HTML

    $html = '
        <div class="page-wrapper">
            <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
                <article class="review clearfix">
                    <div class="review-content">
                        <div class="review-text" itemprop="reviewBody">
                        Outstanding ... 
                        </div>
                    </div>
                </article>
            </section>
        </div>
    ';
    
  2. PHP Code

        $classname = 'review-text';
        $dom = new DOMDocument;
        $dom->loadHTML($html);
        $xpath     = new DOMXPath($dom);
        $results = $xpath->query("//*[@class and contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
    
        if ($results->length > 0) {
            echo $review = $results->item(0)->nodeValue;
        }
    

The XPath syntax for selecting an element by class comes from this blog.

I have tried many examples from StackOverflow and online tutorials, but none of them seems to work. Am I missing something?

Answered by Frank Houweling

The following XPath query does what you want. Just replace the argument passed to $xpath->query with the following:

//div[@class="review-text"]

Edit: To make development easier, you can test your own XPath queries online at http://www.xpathtester.com/test.

Edit2: I tested this code and it worked perfectly.

<?php

$html = '
    <div class="page-wrapper">
        <section class="page single-review" itemtype="http://schema.org/Review" itemscope="" itemprop="review">
            <article class="review clearfix">
                <div class="review-content">
                    <div class="review-text" itemprop="reviewBody">
                    Outstanding ... 
                    </div>
                </div>
            </article>
        </section>
    </div>
';

$classname = 'review-text';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");

if ($results->length > 0) {
    echo $review = $results->item(0)->nodeValue;
}

?>
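
One caveat may be worth illustrating: `@class='review-text'` compares the whole attribute value, so it stops matching once the element carries a second class, while the `contains(concat(...))` expression from the question matches a single class token. The sketch below is not part of the original answers. The helper name `classQuery` and the sample markup are made up for illustration, and the helper assumes the class name contains no quote characters.

```php
<?php
// Hypothetical helper (not from the original post): builds an XPath
// expression matching any element whose class list contains $class.
// Assumes $class contains no quote characters.
function classQuery(string $class): string
{
    return "//*[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

// Made-up sample markup: the second inner div carries two classes.
$html = '<div><div class="review-text">First</div>'
      . '<div class="review-text featured">Second</div></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Exact comparison: only elements whose class attribute is exactly "review-text".
$exact = $xpath->query("//*[@class='review-text']");

// Token match: every element that has "review-text" among its classes.
$token = $xpath->query(classQuery('review-text'));

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2
```

For the HTML in the question both forms select the same div, because its class attribute holds only `review-text`.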

Answered by razz

Expanding on Frank Houweling's answer, it is also possible to use DOMXPath to search within a specific DOMNode. This can be achieved by passing the contextNode as the second argument to the DOMXPath->query method:

$dom = new DOMDocument;
$dom->loadHTML ($html);
$xpath = new DOMXPath ($dom);

foreach ($xpath->query ("//section[@class='page single-review']") as $section)
{
    // search for sub nodes inside each element
    foreach ($xpath->query (".//div[@class='review-text']", $section) as $review)
    {
        echo $review->nodeValue;
    }
}
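
To see what the leading dot in the inner query changes in practice, the following sketch runs the same expression with and without it against one context node. The markup and the class name `box` are made up for illustration and are not from the original post.

```php
<?php
// Made-up markup: two boxes, each holding one review paragraph.
$html = '<div>'
      . '<div class="box"><p class="review-text">A</p></div>'
      . '<div class="box"><p class="review-text">B</p></div>'
      . '</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Use the first box as the context node.
$box = $xpath->query("//div[@class='box']")->item(0);

// A leading "//" ignores the context node and scans the whole document.
$absolute = $xpath->query("//p[@class='review-text']", $box);

// A leading ".//" only scans descendants of the context node.
$relative = $xpath->query(".//p[@class='review-text']", $box);

echo $absolute->length, "\n";             // 2
echo $relative->length, "\n";             // 1
echo $relative->item(0)->nodeValue, "\n"; // A
```

Without the dot, the inner loop would print every review in the document once per outer match instead of only the reviews inside the current node.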

Note that when searching inside a node you need to use a relative path, by adding a dot (.) at the beginning of the expression:

"//div[@class='review-text']" // absolute path, search starts from the root element
".//div[@class='review-text']" // relative path, search starts from the provided contextNode