Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/768215/

Time: 2020-08-24 23:47:40  Source: igfitidea

PHP "pretty print" HTML (not Tidy)

Tags: php, html, format, tidy

Asked by Hyman Sleight

I'm using the DOM extension in PHP to build some HTML documents, and I want the output to be nicely formatted (with new lines and indentation) so that it's readable. However, from the many tests I've done:

  1. "formatOutput = true" doesn't work at all with saveHTML(), only saveXML()
  2. Even if I use saveXML(), it still only works on elements created via the DOM, not on elements loaded with loadHTML(), even with "preserveWhiteSpace = false"
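
To make the first observation concrete, here is a minimal sketch that builds a small document through the DOM API and serialises it both ways. The element names and text are made up for illustration, and what saveHTML() does with formatOutput may vary between PHP and libxml versions, so compare the two outputs on your own setup rather than treating the behaviour described above as guaranteed.

```php
<?php
// Build a tiny document purely through the DOM API (no loadHTML() involved).
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true; // ask the serialiser for line breaks and indentation

$html = $dom->appendChild($dom->createElement('html'));
$body = $html->appendChild($dom->createElement('body'));
$body->appendChild($dom->createElement('h1', 'bar foo'));
$body->appendChild($dom->createElement('p', 'apples and oranges'));

// Serialise the same tree with both methods so the results can be compared.
$asXml  = $dom->saveXML($dom->documentElement);
$asHtml = $dom->saveHTML();

echo "saveXML():\n", $asXml, "\n\n";
echo "saveHTML():\n", $asHtml, "\n";
```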

If anyone knows differently I'd really like to know how they got it to work.

So, I have a DOM document, and I'm using saveHTML() to output the HTML. As it's coming from the DOM I know it is valid, so there's no need to "Tidy" or validate it in any way.

I'm simply looking for a way to get nicely formatted output from what the DOM extension gives me.

NB. As you may have guessed, I don't want to use the Tidy extension because a) it does a lot more than I need it to (the markup is already valid) and b) it actually makes changes to the HTML content (such as the HTML 5 doctype and some elements).

Follow Up:

OK, with the help of the answer below I've worked out why the DOM extension wasn't working. Although the given example works, it still wasn't working with my code. With the help of this comment I found that if you have any text nodes where isWhitespaceInElementContent() is true, no formatting will be applied beyond that point. This happens regardless of whether or not preserveWhiteSpace is false. The solution is to remove all of these nodes (although I'm not sure whether this may have adverse effects on the actual content).
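
A minimal sketch of that clean-up step, under stated assumptions: the helper name removeWhitespaceNodes() is hypothetical (it is not from the post), and the sample document is built by hand with deliberate whitespace-only text nodes so the effect is reproducible. Note that removing such nodes can change rendering where whitespace between inline elements matters.

```php
<?php
// Hypothetical helper (name is ours): removes every whitespace-only text node
// and returns how many were removed.
function removeWhitespaceNodes(DOMDocument $dom): int
{
    $xpath = new DOMXPath($dom);
    $removed = 0;
    // Copy the node list into an array first so removals do not disturb iteration.
    foreach (iterator_to_array($xpath->query('//text()')) as $text) {
        if ($text->isWhitespaceInElementContent()) {
            $text->parentNode->removeChild($text);
            $removed++;
        }
    }
    return $removed;
}

// Build a small tree that contains whitespace-only text nodes on purpose.
$dom = new DOMDocument();
$div = $dom->appendChild($dom->createElement('div'));
$div->appendChild($dom->createTextNode("\n  "));
$div->appendChild($dom->createElement('p', 'one'));
$div->appendChild($dom->createTextNode("\n  "));
$div->appendChild($dom->createElement('p', 'two'));
$div->appendChild($dom->createTextNode("\n"));

$removed = removeWhitespaceNodes($dom);

// With the blank nodes gone, the formatter is free to indent the tree.
$dom->formatOutput = true;
echo $dom->saveXML($dom->documentElement), "\n";
```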

Answered by stefs

you're right, there seems to be no indentation for HTML (others are also confused). XML works, even with loaded code.

<?php
function tidyHTML($buffer) {
    // load our document into a DOM object
    $dom = new DOMDocument();
    // we want nice output
    $dom->preserveWhiteSpace = false;
    $dom->loadHTML($buffer);
    $dom->formatOutput = true;
    return($dom->saveHTML());
}

// start output buffering, using our nice
// callback function to format the output.
ob_start("tidyHTML");

?>
<html>
    <head>
    <title>foo bar</title><meta name="bar" value="foo"><body><h1>bar foo</h1><p>It's like comparing apples to oranges.</p></body></html>
<?php
// this will be called implicitly, but we'll
// call it manually to illustrate the point.
ob_end_flush();
?>

result:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>foo bar</title>
<meta name="bar" value="foo">
</head>
<body>
<h1>bar foo</h1>
<p>It's like comparing apples to oranges.</p>
</body>
</html>

the same with saveXML() ...

<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <head>
    <title>foo bar</title>
    <meta name="bar" value="foo"/>
  </head>
  <body>
    <h1>bar foo</h1>
    <p>It's like comparing apples to oranges.</p>
  </body>
</html>

probably forgot to set preserveWhiteSpace=false before loadHTML?

disclaimer: i stole most of the demo code from tyson clugg/php manual comments. lazy me.

UPDATE: i now remember that some years ago i tried the same thing and ran into the same problem. i fixed this by applying a dirty workaround (it wasn't performance critical): i just somehow converted back and forth between SimpleXML and DOM until the problem vanished. i suppose the conversion got rid of those nodes. maybe load with DOM, import with simplexml_import_dom, then output the string, parse this with DOM again and then pretty-print it. as far as i remember this worked (but it was really slow).
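
One way the round trip described above might look, as a rough sketch rather than the exact code used back then: the sample markup and variable names are made up, and the second parse goes through loadXML(), which assumes the serialised string is well-formed XML (markup using HTML-only entities such as &nbsp; may fail to parse).

```php
<?php
// Sample input, invented for illustration.
$html = '<html><body><div><p>one</p><p>two</p></div></body></html>';

// Step 1: load the markup with the DOM extension.
$dom = new DOMDocument();
$dom->loadHTML($html);

// Step 2: hand the root element to SimpleXML and serialise it to a string.
$xmlString = simplexml_import_dom($dom->documentElement)->asXML();

// Step 3: parse that string with a fresh DOMDocument, dropping blank nodes,
// and let the serialiser indent the result.
$pretty = new DOMDocument();
$pretty->preserveWhiteSpace = false; // must be set before loading
$pretty->formatOutput = true;
$loaded = $pretty->loadXML($xmlString);

echo $pretty->saveXML($pretty->documentElement), "\n";
```

The two parses and one serialisation per document are the likely reason this approach is slow.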

Answered by Artis Zelmenis

The result:

<!DOCTYPE html>
<html>
    <head>
        <title>My website</title>
    </head>
</html>

Please consider:

function indentContent($content, $tab = "\t") {
    // add marker linefeeds to aid the pretty-tokeniser (puts a linefeed between adjacent tags, keeping the tags themselves)
    $content = preg_replace('/(>)(<)(\/*)/', "$1\n$2$3", $content);
    $token = strtok($content, "\n"); // first line; now indent the tags
    $result = ''; // holds formatted version as it is built
    $pad = 0; // current indent level
    $indent = 0; // change in indent level that the current line causes for the next one
    $voidTag = false; // true when the current line opens a void element
    // scan each line and adjust indent based on opening/closing tags
    while ($token !== false && strlen($token) > 0) {
        $token = trim($token);
        // test for the various tag states
        if (preg_match('/.+<\/\w[^>]*>$/', $token)) { // 1. open and closing tags on same line - no change
            $indent = 0;
        } elseif (preg_match('/^<\/\w/', $token)) { // 2. closing tag - outdent now
            $pad = max(0, $pad - 1); // never drop below column zero
            $indent = 0;
        } elseif (preg_match('/^<\w[^>]*(?<!\/)>.*$/', $token)) { // 3. opening tag (not self-closing) - don't pad this one, only subsequent tags
            // void elements have no closing tag, so the indent they add is undone below
            // (list of void elements from http://www.htmlandcsswebdesign.com/articles/voidel.php)
            $voidTag = (bool) preg_match('/^<(area|base|br|col|command|embed|hr|img|input|keygen|link|meta|param|source|track|wbr)\b/i', $token);
            $indent = 1;
        } else { // 4. no indentation needed
            $indent = 0;
        }

        $line = str_repeat($tab, $pad) . $token; // prefix the line with one $tab per indent level
        $result .= $line . "\n"; // add to the cumulative result, with linefeed
        $token = strtok("\n"); // get the next token
        $pad += $indent; // update the pad size for subsequent lines
        if ($voidTag) {
            $voidTag = false;
            $pad--;
        }
    }
    return $result;
}

// $htmldoc is a DOMDocument object

$niceHTMLwithTABS = indentContent($htmldoc->saveHTML(), "\t");

echo $niceHTMLwithTABS;

Will result in HTML that has:

  • Indentation based on "levels"
  • Line breaks after block level elements
  • Inline and self-closing elements left unaffected

The function (which is a method of a class I use) is largely based on: https://stackoverflow.com/a/7840997/7646824

Answered by user594694

You can use the code for the hl_tidy function of the htmLawed library.

// indent using one tab per indent, with all HTML being within an imaginary div
$out = hl_tidy($in, 't', 'div');