Note: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4587216/

Date: 2020-08-25 13:35:10  Source: igfitidea

How can I convert a docx document to html using php?

php html docx

Asked by xun

I want to be able to upload an MS Word document and export it to a page on my site.

Is there any way to accomplish this?

Answered by David Lin

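// Read a docx file and return its text content as a plain string (formatting is discarded)
function readDocx($filePath) {
    // Create new ZIP archive
    $zip = new ZipArchive;
    // The main document body is stored in this part of the archive
    $dataFile = 'word/document.xml';
    // Open received archive file; open() returns true on success, an error code otherwise
    if (true === $zip->open($filePath)) {
        // If opened, search for the data file in the archive
        if (($index = $zip->locateName($dataFile)) !== false) {
            // If found, read it into a string
            $data = $zip->getFromIndex($index);
            // Close archive file
            $zip->close();
            // getFromIndex() returns false on failure, and loadXML() rejects empty input
            if ($data === false || $data === '') {
                return "";
            }
            // loadXML() is an instance method; calling it statically fails on PHP 8
            $xml = new DOMDocument();
            // Skip errors and warnings. LIBXML_NOENT and LIBXML_XINCLUDE are left out
            // so that entities and includes in an uploaded file are not expanded.
            if (!$xml->loadXML($data, LIBXML_NOERROR | LIBXML_NOWARNING)) {
                return "";
            }
            // Strip the XML tags and join the lines;
            // "\n" must be double-quoted to mean a newline
            $contents = explode("\n", strip_tags($xml->saveXML()));
            return implode('', $contents);
        }
        $zip->close();
    }
    // In case of failure return empty string
    return "";
}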

ZipArchive and DOMDocument are both bundled with PHP (provided the zip and dom extensions are enabled), so you don't need to install, include, or require any additional libraries.
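
Note that readDocx returns plain text, while the question asks for HTML. As a minimal sketch (not part of the original answer), the same ZipArchive and DOMDocument approach can emit one <p> element per Word paragraph. The function names docxXmlToHtml and docxFileToHtml are hypothetical, and the sketch assumes that paragraphs are w:p elements and text runs are w:t elements in the WordprocessingML namespace. It ignores tables, images, lists, and styling.

```php
<?php
// Hypothetical helper: convert the XML of word/document.xml into simple HTML paragraphs.
function docxXmlToHtml(string $documentXml): string
{
    $dom = new DOMDocument();
    // loadXML() rejects empty input, so check for it first
    if ($documentXml === '' || !$dom->loadXML($documentXml, LIBXML_NOERROR | LIBXML_NOWARNING)) {
        return '';
    }

    $xpath = new DOMXPath($dom);
    // Main WordprocessingML namespace, used by w:p (paragraph) and w:t (text)
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');

    $html = '';
    foreach ($xpath->query('//w:body//w:p') as $paragraph) {
        // Concatenate every text run inside this paragraph
        $text = '';
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        // Skip empty paragraphs; escape the text so it is safe to embed in a page
        if ($text !== '') {
            $html .= '<p>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</p>\n";
        }
    }
    return $html;
}

// Hypothetical wrapper: read word/document.xml from a .docx file and convert it.
function docxFileToHtml(string $filePath): string
{
    $zip = new ZipArchive();
    if ($zip->open($filePath) !== true) {
        return '';
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    return $xml === false ? '' : docxXmlToHtml($xml);
}
```

Calling docxFileToHtml with the path of an uploaded file would return markup that can be echoed into a page. Anything beyond plain paragraphs needs a fuller converter.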

Answered by Eduardo

One may use PHPDocX.

It supports practically all HTML and CSS styles. Moreover, you can use templates to add extra formatting to your HTML via the replaceTemplateVariableByHTML method.

The HTML methods of PHPDocX also allow for the direct use of Word styles. You may use something like this:

$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));

Use this if you want all your tables to use the MediumGrid3-accent5 Word style. The embedHTML method and its template counterpart (replaceTemplateVariableByHTML) preserve inheritance, meaning that you can apply a predefined Word style and override any of its properties with CSS.

You may also extract selected parts of your HTML using jQuery-style selectors.

Answered by Ron

You can convert Word docx documents to HTML using the Print2flash library. Here is a PHP excerpt from my client's site which converts a document to HTML:

include("const.php");
$p2fServ = new COM("Print2Flash4.Server2");
$p2fServ->DefaultProfile->DocumentType=HTML5;
$p2fServ->ConvertFile($wordfile,$htmlFile);

It converts the document whose path is given in the $wordfile variable to an HTML page file whose path is given in the $htmlFile variable. All formatting, hyperlinks, and charts are retained. You can get the required const.php file, together with a fuller sample, from the Print2flash SDK.

Answered by Ilya P

If you don't mind using a REST API, you can use:

  • Apache Tika, a proven open-source leader for text extraction.
  • RawText, if you don't want the hassle of configuration and want a ready-to-go solution. Note that it is not free.

Sample code for RawText:

$result = $rawText->parse($your_file);
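
For the Apache Tika option, the answer gives no sample. The following is a rough sketch, assuming a Tika server is already running and exposes its /tika endpoint at http://localhost:9998 (both the URL and the function name tikaConvertToHtml are assumptions for illustration). It sends the file with an HTTP PUT and asks for HTML through the Accept header.

```php
<?php
// Hypothetical helper: send a document to a running Apache Tika server and
// return the HTML it produces, or null on failure.
// Assumption: the server listens at $tikaUrl (http://localhost:9998/tika here).
function tikaConvertToHtml(string $filePath, string $tikaUrl = 'http://localhost:9998/tika'): ?string
{
    // Read the uploaded document; give up quietly if it cannot be read
    $body = @file_get_contents($filePath);
    if ($body === false) {
        return null;
    }

    // PUT the raw file bytes and request an HTML rendering of the content
    $context = stream_context_create([
        'http' => [
            'method'  => 'PUT',
            'header'  => "Accept: text/html\r\n",
            'content' => $body,
            'timeout' => 30,
        ],
    ]);

    $html = @file_get_contents($tikaUrl, false, $context);
    return $html === false ? null : $html;
}
```

The function returns null when the file is unreadable or the server cannot be reached, so the caller can fall back to another converter.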