PHP HTML DomDocument getElementById problems

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/3391942/

Date: 2020-08-25 09:34:29  Source: igfitidea

php html parsing

Asked by Jé Queue

A little new to PHP parsing here, but I can't seem to get PHP's DomDocument to return what is clearly an identifiable node. The HTML loaded will come from the 'net so can't necessarily guarantee XML compliance, but I try the following:

<?php
header("Content-Type: text/plain");

$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

$dom = new DomDocument;
$dom->preserveWhiteSpace = false;
$dom->validateOnParse = true;

/*** load the html into the object ***/
$dom->loadHTML($html);
var_dump($dom);    

$belement = $dom->getElementById("bid");
var_dump($belement);

?>

Though I receive no error, I only receive the following as output:

object(DOMDocument)#1 (0) {
}
NULL

Should I not be able to look up the <b> tag, as it does indeed have an id?

Answered by Wrikken

The Manual explains why:

For this function to work, you will need either to set some ID attributes with DOMElement->setIdAttribute() or a DTD which defines an attribute to be of type ID. In the later case, you will need to validate your document with DOMDocument->validate() or DOMDocument->validateOnParse before using this function.
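
The first option in the quoted passage, DOMElement->setIdAttribute(), can be sketched roughly as follows. This is only an illustration under assumptions: it uses a small XML document with no DTD, and the element names root and item and the id value a1 are made up for the example.

```php
<?php
// Hypothetical sample document: there is no DTD, so "id" is just an ordinary attribute.
$doc = new DOMDocument();
$doc->loadXML('<root><item id="a1">First</item></root>');

// Without a DTD or setIdAttribute(), the lookup finds nothing.
$before = $doc->getElementById('a1');

// Declare the "id" attribute of this element to be of type ID.
$item = $doc->getElementsByTagName('item')->item(0);
$item->setIdAttribute('id', true);

// The same lookup now returns the element.
$after = $doc->getElementById('a1');

var_dump($before === null);
echo $after->nodeValue, "\n";
```

The attribute has to be marked on each element individually, so this route suits documents where you already know which elements you care about.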

By all means, go for valid HTML & provide a DTD.

Quick fixes:

  1. Call $dom->validate(); and put up with the errors (or fix them). Afterwards you can use $dom->getElementById(), which for some reason works regardless of the errors.
  2. Use XPath if you don't feel like validating: $x = new DOMXPath($dom); $el = $x->query("//*[@id='bid']")->item(0);
  3. Come to think of it: if you just set validateOnParse to true before loading the HTML, it would also work ;P
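
The XPath route from the second fix can be wrapped in a small helper. This is a sketch, not part of the original answer: the function name findById is hypothetical, and it assumes the id value contains no single quotes, since the value is interpolated directly into the expression.

```php
<?php
// Hypothetical helper: look up an element by its id attribute via XPath,
// without relying on DTD validation or setIdAttribute().
function findById(DOMDocument $dom, string $id): ?DOMElement
{
    $xpath = new DOMXPath($dom);
    // Assumption: $id contains no single quotes (it is interpolated as-is).
    $nodes = $xpath->query("//*[@id='" . $id . "']");
    $node = $nodes->item(0);
    return $node instanceof DOMElement ? $node : null;
}

// Keep parser messages in libxml's buffer instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<html><body>Hello <b id="bid">World</b>.</body></html>');

$el = findById($dom, 'bid');
$missing = findById($dom, 'nope');

echo $el !== null ? $el->nodeValue : 'not found', "\n";
```

Returning null for a missing id keeps the caller from calling nodeValue on a non-object, which is the failure mode the question's code would hit.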

$dom = new DOMDocument();
$html = '<html>
<body>Hello <b id="bid">World</b>.</body>
</html>';
$dom->validateOnParse = true; // set this first...
$dom->loadHTML($html);        // ...because loading is when parsing happens

$dom->preserveWhiteSpace = false;

$belement = $dom->getElementById("bid");
echo $belement->nodeValue;

Outputs 'World' here.

Answered by Martin Vseticka

Well, you should check whether $dom->loadHTML($html); returns true (success), and I would try

 var_dump($belement->nodeValue);

for output to get a clue what might be wrong.
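
One way to act on this advice is sketched below, assuming the same sample HTML as in the question. It checks the return value of loadHTML(), prints whatever libxml reported while parsing, and guards against NULL before reading nodeValue. The variable names $loaded and $errors are chosen for the example.

```php
<?php
$html = '<html><body>Hello <b id="bid">World</b>.</body></html>';

// Collect libxml parse messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Gather whatever the parser complained about, then reset the buffer.
$errors = libxml_get_errors();
libxml_clear_errors();

if (!$loaded) {
    echo "loadHTML() failed\n";
}
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), "\n";
}

// Guard against NULL before touching nodeValue.
$belement = $dom->getElementById('bid');
var_dump($belement === null ? null : $belement->nodeValue);
```

Whether the final lookup succeeds depends on the PHP and libxml versions in use, so the guard matters more than the particular value it dumps.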

EDIT: http://www.php-editors.com/php_manual/function.domdocument-get-element-by-id.html - it seems that DomDocument uses XPath internally.

Example:

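// Note: xpath_new_context() and xpath_eval_expression() come from the old PHP 4 DOM XML extension.
$xpath = xpath_new_context($dom);
var_dump(xpath_eval_expression($xpath, "//*[@ID = 'YOURIDGOESHERE']"));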