Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/6090667/

Time: 2020-08-25 23:19:34  Source: igfitidea

PHP DOMDocument errors/warnings on html5-tags

Tags: php, html, domdocument

Asked by Klaas Sangers

I've been attempting to parse HTML5 code so I can set attributes/values within the code, but it seems DOMDocument (PHP 5.3) doesn't support tags like <nav> and <section>.

Is there any way to parse this as HTML in PHP and manipulate the code?

Code to reproduce:

<?php
$dom = new DOMDocument();
$dom->loadHTML("<!DOCTYPE HTML>
<html><head><title>test</title></head>
<body>
<nav>
  <ul>
    <li>first
    <li>second
  </ul>
</nav>
<section>
  ...
</section>
</body>
</html>");


Error

Warning: DOMDocument::loadHTML(): Tag nav invalid in Entity, line: 4 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

Warning: DOMDocument::loadHTML(): Tag section invalid in Entity, line: 10 in /home/wbkrnl/public_html/new-mvc/1.php on line 17

Answered by lonesomeday

No, there is no way of specifying a particular doctype to use, or to modify the requirements of the existing one.

Your best workable solution is going to be to disable error reporting with libxml_use_internal_errors, which makes libxml collect its errors internally instead of emitting them as PHP warnings:

$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML('...');
libxml_clear_errors();
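
As a minimal sketch of how this fits the question's goal (setting attributes on HTML5 elements), the snippet below loads a shortened version of the question's markup with internal error handling enabled, then sets an id on the <nav> element. The id value main-nav is a made-up example, and restoring the previous libxml setting afterwards is an addition for illustration, not part of the answer.

```php
<?php
// Shortened version of the markup from the question.
$html = '<!DOCTYPE html><html><head><title>test</title></head>'
      . '<body><nav><ul><li>first</li><li>second</li></ul></nav>'
      . '<section>...</section></body></html>';

// Collect libxml errors internally and remember the previous setting.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Discard the collected errors and restore the earlier behaviour.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The parser still creates an element node for the unknown tag,
// so it can be looked up and modified like any other element.
$nav = $dom->getElementsByTagName('nav')->item(0);
$nav->setAttribute('id', 'main-nav'); // hypothetical id, for illustration only

$result = $dom->saveHTML();
echo $result;
```

The point of the sketch is that the warnings are only noise: the <nav> and <section> nodes are present in the resulting document either way.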

Answered by Ilker Mutlu

You could also suppress the warnings with the @ error-control operator:

@$dom->loadHTML($htmlString);

Answered by halfer

You can filter the errors you get from the parser. As per other answers here, turn off error reporting to the screen, and then iterate through the errors and only show the ones you want:

libxml_use_internal_errors(TRUE);
// Do your load here
$errors = libxml_get_errors();

foreach ($errors as $error)
{
    /* @var $error LibXMLError */
}

Here is a print_r() of a single error:

LibXMLError Object
(
    [level] => 2
    [code] => 801
    [column] => 17
    [message] => Tag section invalid

    [file] => 
    [line] => 39
)

By matching on the message and/or the code, these can be filtered out quite easily.
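
A rough sketch of that filtering, assuming the unknown-tag errors all carry code 801 as in the print_r() output above. The helper name dropUnknownTagErrors and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: keep every libxml error except the
// "Tag ... invalid" ones (code 801 in the print_r() output above).
function dropUnknownTagErrors(array $errors)
{
    return array_values(array_filter($errors, function (LibXMLError $error) {
        return $error->code !== 801;
    }));
}

$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Sample markup, assumed for this sketch.
$dom->loadHTML('<!DOCTYPE html><html><body><section><p>Hi</p></section></body></html>');

$remaining = dropUnknownTagErrors(libxml_get_errors());
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Report only the errors that survived the filter.
foreach ($remaining as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}
```

Matching on the code is less fragile than matching on the message text, but whether 801 covers every case you care about is something to check against your own input.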

Answered by user2782001

There doesn't seem to be a way to kill warnings but not errors. PHP has constants that are supposed to do this, but they don't seem to work. Here is what SHOULD work, but doesn't (a bug?):

 $doc=new DOMDocument();
 $doc->loadHTML("<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>", LIBXML_NOWARNING );
 echo $doc->saveHTML();

http://php.net/manual/en/libxml.constants.php
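
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the print_r() output in the earlier answer shows level 2, which corresponds to LIBXML_ERR_ERROR, so libxml may be classifying "Tag ... invalid" as an error rather than a warning, even though PHP surfaces it as a Warning. If so, LIBXML_NOERROR would be the flag to try. A sketch using the same markup as above (the options argument of loadHTML needs PHP 5.4 or later):

```php
<?php
$doc = new DOMDocument();

// LIBXML_NOERROR instead of LIBXML_NOWARNING, on the assumption that
// libxml reports unknown tags at error level (level 2 above).
$doc->loadHTML(
    "<tagthatdoesnotexist><h1>Hi</h1></tagthatdoesnotexist>",
    LIBXML_NOERROR
);

$output = $doc->saveHTML();
echo $output;
```

Note that this would silence all error-level parser messages, not just the unknown-tag ones, so it is closer to the @ approach than to selective filtering.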

Answered by Emiliano Sangoi

This worked for me:

$html = file_get_contents($url);

$search = array("<header>", "</header>", "<nav>", "</nav>", "<section>", "</section>");
$replace = array("<div>", "</div>","<div>", "</div>", "<div>", "</div>");
$html = str_replace($search, $replace, $html);

$dom = new DOMDocument();
$dom->loadHTML($html);

If you need to locate the header afterwards, replace the header tag with a div tag and give it an id. For instance:

$search = array("<header>", "</header>");
$replace = array("<div id='header1'>", "</div>");

It's not the best solution but depending on the situation it can be useful.

Good luck.

Answered by Sergey Kaluzhsky

HTML5 tags almost always use attributes such as id, class and so on. So the code for replacing will be:

$html = file_get_contents($url);
$search = array(
    "<header", "</header>", 
    "<nav", "</nav>", 
    "<section", "</section>",
    "<article", "</article>",
    "<footer", "</footer>",
    "<aside", "</aside>",
    "<noindex", "</noindex>",
);
$replace = array(
    "<div", "</div>",
    "<div", "</div>", 
    "<div", "</div>",
    "<div", "</div>",
    "<div", "</div>",
    "<div", "</div>",
    "<div", "</div>",
);
$html = str_replace($search, $replace, $html);
$dom = new DOMDocument();
$dom->loadHTML($html);
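
One caveat with plain prefix matching is that "<nav" would also match a longer tag name that merely starts with nav. A possible variation, swapping str_replace for preg_replace (not what the answer uses), is to require whitespace, "/" or ">" right after the tag name. The sample markup below is made up for illustration; the answer reads it from $url instead.

```php
<?php
// Illustrative input, assumed for this sketch.
$html = '<!DOCTYPE html><html><body>'
      . '<nav class="top"><ul><li>first</li></ul></nav>'
      . '<section id="s1">...</section>'
      . '</body></html>';

// Tag names taken from the answer above.
$tags = array('header', 'nav', 'section', 'article', 'footer', 'aside', 'noindex');

// Match "<name" or "</name" only when the name is followed by
// whitespace, "/" or ">", so longer names are left alone.
$pattern = '~<(/?)(?:' . implode('|', $tags) . ')(?=[\s/>])~i';

// "<nav class=...>" becomes "<div class=...>", "</nav>" becomes "</div>".
$html = preg_replace($pattern, '<${1}div', $html);

$dom = new DOMDocument();
$dom->loadHTML($html);
echo $dom->saveHTML();
```

Like the str_replace version, this discards the original tag names, so it only suits cases where the output does not need to remain HTML5.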