php: Get BODY content without the DOCTYPE, HTML, HEAD and BODY tags

Question

Asked by enrico pax

What I am trying to do is include an HTML file within a PHP system (not a problem) but that HTML file also needs to be usable on its own, for various reasons, so I need to know how I can strip the doctype, html, head and body tags in the context of the PHP include, if that's possible.

I'm not particularly good at PHP (doh!), so my searches of the PHP manual and the web haven't helped me figure this out. Any help or reading tips, or both, are much appreciated.

Answer 1

Answered by Jared Farrish

Since the substr() method seemed to be too much for some to swallow, here is a DOM parser method:

$d = new DOMDocument;
$mock = new DOMDocument;
$d->loadHTML(file_get_contents('/path/to/my.html'));
$body = $d->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child){
    $mock->appendChild($mock->importNode($child, true));
}

echo $mock->saveHTML();

http://codepad.org/MQVQ3XQP

Anybody who wishes to see that "other one" can check the revisions.

Answer 2

Answered by Patrick

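$site = file_get_contents("http://www.google.com/");

// Capture everything between <body ...> and </body>
// (i = case-insensitive, s = let "." match newlines).
if (preg_match("/<body[^>]*>(.*?)<\/body>/is", $site, $matches)) {
    echo $matches[1];
}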

Answer 3

Answered by Ja?ck

Use DOMDocument to keep what you need rather than strip what you don't need (PHP >= 5.3.6):

$d = new DOMDocument;
$d->loadHTMLFile($fileLocation);
$body = $d->getElementsByTagName('body')->item(0);
// perform innerhtml on $body by enumerating child nodes 
// and saving them individually
foreach ($body->childNodes as $childNode) {
  echo $d->saveHTML($childNode);
}
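
The enumeration above can be wrapped in a small reusable helper. This is only a sketch: the function name `bodyInnerHtml` and the sample markup are made up for illustration, and the libxml error suppression is an added assumption (real-world HTML often triggers parser warnings), not part of the original answer.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, without the
// doctype, <html>, <head> or <body> tags themselves.
function bodyInnerHtml(string $html): string
{
    // loadHTML() rejects empty input on recent PHP versions, so bail out early.
    if (trim($html) === '') {
        return '';
    }

    // Assumption: collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body> individually ("innerHTML").
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Sample document, invented for this example.
$sample = '<!DOCTYPE html><html><head><title>t</title></head>'
        . '<body><p>Hello</p><div>World</div></body></html>';

echo bodyInnerHtml($sample);
```

In the including script, the same helper could be fed with file_get_contents() on the standalone HTML file, so the file itself stays usable on its own.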

Answer 4

Answered by tobyodavies

Use a DOM parser. This is not tested, but it ought to do what you want:

$domDoc = new DOMDocument();
$domDoc->loadHTMLFile('/path/to/file');
$body = $domDoc->getElementsByTagName('body')->item(0);
foreach ($body->childNodes as $child){
    echo $child->C14N(); // Note: this canonicalizes the representation of the node, but that's not necessarily a bad thing
}

If you want to avoid canonicalization, you can use this version (thanks to @Jared Farrish)

Answer 5

Answered by lubosdz

You may want to use the PHP tidy extension, which can fix invalid XHTML structures (in which case DOMDocument load crashes) and also extract the body only:

$tidy = new tidy();
$htmlBody = $tidy->repairString($html, array(
    'output-xhtml' => true,
    'show-body-only' => true,
), 'utf8');

Then load the extracted body into DOMDocument:

$xml = new DOMDocument();
$xml->loadHTML($htmlBody);

Then traverse, extract, or move XML nodes around, etc., and save:

$output = $xml->saveXML();

Answer 6

Answered by Luca Vizzi

A solution with only one instance of DOMDocument and without loops:

$d = new DOMDocument();
$d->loadHTML(file_get_contents('/path/to/my.html'));
$body = $d->getElementsByTagName('body')->item(0);
echo $d->saveHTML($body);
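
One caveat with the loop-free version: saveHTML() called with a node serializes that node itself, so the output still carries the surrounding body tags. A possible way to drop them while keeping the single-call approach is sketched below. The function name `bodyInnerHtmlNoLoop`, the regular expression, and the sample markup are assumptions added for illustration, not part of the original answer; the pattern also assumes no attribute value on the body tag contains a ">" character.

```php
<?php
// Hypothetical helper: serialize <body> in one call, then trim the
// opening and closing body tags from the resulting string.
function bodyInnerHtmlNoLoop(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    // Assumption: keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $d = new DOMDocument();
    $d->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $d->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // The serialized string starts with <body ...> and ends with </body>.
    $serialized = $d->saveHTML($body);

    // Remove the leading opening tag and the trailing closing tag.
    return preg_replace('~^\s*<body[^>]*>|</body>\s*$~i', '', $serialized) ?? '';
}

// Sample document, invented for this example.
$sample = '<!DOCTYPE html><html><head><title>t</title></head>'
        . '<body class="page"><p>Hello</p></body></html>';

echo bodyInnerHtmlNoLoop($sample);
```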

Answer 7

Answered by Luca Vizzi

This may be a solution. I tried it and it works fine.

// Note: this is JavaScript (browser DOMParser), not PHP.
function parseHTML(string) {
    var parser = new DOMParser(),
        result = parser.parseFromString(string, "text/html");
    return result.firstChild.lastChild.firstChild;
}
