Reading docx (Office Open XML) in PHP
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/1501623/
Question by RageZ
I want to add a Word import function to our CMS. The only problem is that I can't seem to find a good library for reading docx files (Word 2007).
Does anyone have recommendations? The library should be able to extract the document's content along with basic styling such as italic, bold and superscript.
Thanks for your help
Accepted answer by Anthony
Or, since you asked for a library, you may want to look into something like Docvert. I was just looking around based on your question, and it's my favorite so far for PHP. You give it the location of the Word file, and it transforms the file into something simple, with the attributes and all that good stuff.
Answer by Anthony
docx files are actually just ZIP containers for the document's XML. You should be able to unzip the docx file, go into the word folder inside it, and open document.xml, which holds the actual text. Things like fonts and styles, however, live in other XML files in the docx container, so you'll probably want to experiment a bit to figure out what is what and how to match it up (I bet you'll want to start with the namespaces).
But yeah: unzip the file, then use SimpleXML to turn it into something you can actually work with.
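Following this answer's suggestion, here is a minimal sketch that reads word/document.xml with SimpleXML and keeps the basic formatting the question asks about (bold, italic, superscript, plus subscript). It assumes PHP's zip and SimpleXML extensions are installed. `docxToHtml`, `wx`, `wVal` and `isOn` are hypothetical helper names made up for illustration. The sketch only reads direct run formatting (`w:rPr` with `w:b`, `w:i`, `w:vertAlign`) and ignores anything inherited from styles.xml. It also re-registers the `w` namespace prefix before every XPath query, to be safe, since a registration made on one SimpleXML element is not reliably available on the elements an earlier `xpath()` call returned.

```php
<?php
// Minimal sketch: turn word/document.xml from a .docx into simple HTML,
// keeping bold, italic, superscript and subscript. Assumes the zip and
// SimpleXML extensions. Only direct run formatting (w:rPr) is read; styles
// inherited from word/styles.xml, tabs, breaks and images are ignored.

const W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

// Hypothetical helper: run an XPath query with the "w" prefix registered.
function wx(SimpleXMLElement $el, string $query): array
{
    $el->registerXPathNamespace('w', W_NS);
    return $el->xpath($query) ?: [];
}

// Hypothetical helper: the w:val attribute of an element, or null if absent.
function wVal(SimpleXMLElement $el): ?string
{
    $attrs = $el->attributes(W_NS);
    return isset($attrs['val']) ? (string) $attrs['val'] : null;
}

// A toggle such as <w:b/> is on unless its w:val says "0", "false" or "off".
function isOn(SimpleXMLElement $rPr, string $tag): bool
{
    $el = wx($rPr, "w:$tag")[0] ?? null;
    if ($el === null) {
        return false;
    }
    $val = wVal($el);
    return $val === null || !in_array($val, ['0', 'false', 'off'], true);
}

function docxToHtml(string $path): string
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName('word/document.xml');
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No word/document.xml in $path");
    }

    $doc = new SimpleXMLElement($xml);
    $paragraphs = [];
    foreach (wx($doc, '//w:body//w:p') as $p) {
        $html = '';
        // .//w:r also catches runs nested inside hyperlinks
        foreach (wx($p, './/w:r') as $r) {
            $text = '';
            foreach (wx($r, 'w:t') as $t) {
                $text .= (string) $t;
            }
            if ($text === '') {
                continue;
            }
            $text = htmlspecialchars($text);
            $rPr = wx($r, 'w:rPr')[0] ?? null;
            if ($rPr !== null) {
                if (isOn($rPr, 'b')) {
                    $text = "<b>$text</b>";
                }
                if (isOn($rPr, 'i')) {
                    $text = "<i>$text</i>";
                }
                $align = wx($rPr, 'w:vertAlign')[0] ?? null;
                $pos = $align === null ? null : wVal($align);
                if ($pos === 'superscript') {
                    $text = "<sup>$text</sup>";
                } elseif ($pos === 'subscript') {
                    $text = "<sub>$text</sub>";
                }
            }
            $html .= $text;
        }
        $paragraphs[] = "<p>$html</p>";
    }
    return implode("\n", $paragraphs);
}
```

For example, `echo docxToHtml('/path/to/upload.docx');` (a placeholder path) prints one `<p>` per Word paragraph. Run text is escaped before the tags are wrapped around it, so characters like `&` or `<` in the document cannot break the generated HTML.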
Answer by Scott Evernden
PHPDocX PRO includes a TransformDoc class that can read .docx (zip) files and generate XHTML (or PDF) from them:
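```php
// ...
// Load PHPDocX PRO's TransformDoc class
require_once 'phpdocx_pro/classes/TransformDoc.inc';
$doc = new TransformDoc();
// Path of the .docx to read ($file is defined in the elided code)
$doc->setStrFile($file->filepath);
// Convert the document to XHTML, then fetch the markup as a string
$doc->generateXHTML();
$html = $doc->getStrXHTML();
```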
Answer by sohaibafifi
There is a library that does this, but it works with the Zend Framework; maybe it will help you. It is called phpLiveDocx: http://www.phplivedocx.org/downloads/ The library is licensed under the New BSD license.
Answer by sohaibafifi
I have just found a library that supports both reading and writing; check it out on the CodePlex forge at http://openxmlapi.codeplex.com. It is licensed under GPLv2.
Answer by DrDol
Convert the docx document to ODT using OpenOffice, then use eZ Components to do the parsing and import. They actually use this import in their own CMS, eZ Publish.
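For the conversion step, a minimal sketch that shells out to LibreOffice (a fork of OpenOffice) running headless. It assumes the `soffice` binary is on the PATH and accepts the `--headless`, `--convert-to` and `--outdir` options, which LibreOffice does; Apache OpenOffice's command line may differ. `buildConvertCommand`, `odtPathFor` and `docxToOdt` are hypothetical names, and the parsing itself is left to eZ Components, whose API is not shown here.

```php
<?php
// Sketch: convert .docx to .odt with LibreOffice running headless, so an
// ODT parser such as eZ Components can take over. Assumes `soffice` is on
// the PATH; Apache OpenOffice may not accept the same options.

// Hypothetical helper: build the shell command, quoting every argument.
function buildConvertCommand(string $docx, string $outDir, string $soffice = 'soffice'): string
{
    return sprintf(
        '%s --headless --convert-to odt --outdir %s %s',
        escapeshellarg($soffice),
        escapeshellarg($outDir),
        escapeshellarg($docx)
    );
}

// LibreOffice names the output after the input: report.docx -> report.odt
function odtPathFor(string $docx, string $outDir): string
{
    return rtrim($outDir, '/') . '/' . pathinfo($docx, PATHINFO_FILENAME) . '.odt';
}

// Run the conversion and return the path of the new .odt file.
function docxToOdt(string $docx, string $outDir): string
{
    $output = [];
    exec(buildConvertCommand($docx, $outDir) . ' 2>&1', $output, $status);
    $odt = odtPathFor($docx, $outDir);
    if ($status !== 0 || !is_file($odt)) {
        throw new RuntimeException("Conversion failed:\n" . implode("\n", $output));
    }
    return $odt;
}
```

For example, `$odt = docxToOdt('/path/to/upload.docx', sys_get_temp_dir());` (a placeholder input path) would return the converted file's path. Every argument goes through `escapeshellarg` because the file name may come from a user upload.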
Answer by andrebruton
Here is a simple working solution I found:
http://webcheatsheet.com/php/reading_the_clean_text_from_docx_odt.php
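The linked page is not reproduced here. As a rough idea of this kind of approach, here is a minimal sketch (not the code from that page) that pulls plain text from either format. Both are ZIP archives: the body lives in word/document.xml for docx and in content.xml for odt. It assumes the zip extension is installed, and `readCleanText` is a hypothetical name. All formatting is discarded.

```php
<?php
// Sketch: plain text from a .docx or .odt. Paragraph end tags become
// newlines, then the remaining markup is stripped and entities such as
// &amp; are decoded. Assumes the zip extension; all formatting is lost.

function readCleanText(string $path): string
{
    // Main XML part for each supported container format
    $parts = ['docx' => 'word/document.xml', 'odt' => 'content.xml'];
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    if (!isset($parts[$ext])) {
        throw new InvalidArgumentException("Unsupported file type: .$ext");
    }

    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path as a ZIP archive");
    }
    $xml = $zip->getFromName($parts[$ext]);
    $zip->close();
    if ($xml === false) {
        throw new RuntimeException("No {$parts[$ext]} in $path");
    }

    // </w:p> ends a Word paragraph; </text:p> and </text:h> end ODF paragraphs and headings
    $xml = str_replace(['</w:p>', '</text:p>', '</text:h>'], "\n", $xml);
    $text = html_entity_decode(strip_tags($xml), ENT_QUOTES | ENT_XML1, 'UTF-8');
    return trim($text);
}
```

Replacing paragraph end tags before calling `strip_tags` is what keeps paragraphs on separate lines; without it, the text of consecutive paragraphs would run together.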

