Parse .docx in Python 3
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, link to the original, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/21667719/
Asked by thehoule64
I am currently writing a Python 3 program that parses certain .docx files and extracts the text and images from them. I have been trying to use docx, but it will not import into my program. I have installed lxml, Pillow, and python-docx, yet it does not import. When I try to use python-docx from the terminal, I cannot run example-extracttext.py or example-makedocument.py, which leads me to believe that the installation didn't run properly. Is there a way to check whether it installed correctly, or a way to get it working so I can import it into my project? I am on Ubuntu 13.10.
Accepted answer by scanny
I recommend you try the latest version of python-docx, which is installed like this:
$ pip install python-docx
Documentation is available here: http://python-docx.readthedocs.org/
The installation should finish with a message reporting success. You may need to run it with sudo to temporarily assume root privileges:
$ sudo pip install python-docx
After installation you should be able to do the following in the Python interpreter:
>>> from docx import Document
>>>
If instead you get something like this, the install didn't go properly:
>>> from docx import Document
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named docx
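The same check can be scripted. The following is a minimal sketch, not part of the original answer: the helper names `is_importable` and `describe_install` are hypothetical. It also prints the interpreter path, on the assumption that pip may have installed the package for a different Python than the one running your project.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


def describe_install(module_name="docx"):
    """Report which interpreter is running and whether it can see the module."""
    status = "importable" if is_importable(module_name) else "NOT importable"
    return "{}: {} is {}".format(sys.executable, module_name, status)


if __name__ == "__main__":
    # Run this with the same interpreter your project uses (e.g. python3).
    print(describe_install("docx"))
```

If the report says the module is not importable, compare the printed interpreter path with the Python that pip installed into.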
If you can provide more feedback on your attempts, I can elaborate on the answer.
Note that the python-docx package was rewritten after v0.2.x. In v0.3.x and later, the API, the package name, and the repository location are all different. All further development will happen on the new version. If you're just starting out with the package, going with the latest version is probably a good idea, as the old one will only receive legacy support going forward.
Also, Python 3 support was added in v0.3.0. Earlier versions are not Python 3 compatible.
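To see which side of the v0.3.0 boundary an installation falls on, a sketch along these lines may help. It assumes Python 3.8+ (for `importlib.metadata`) and that the distribution is registered under the name `python-docx`. The helper names are hypothetical, and the version parsing is deliberately simple: it looks only at the leading numeric components.

```python
import re
from importlib import metadata


def installed_version(dist_name="python-docx"):
    """Return the installed version string, or None if pip never installed it here."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version):
    """Turn the leading numeric part into a 3-tuple, e.g. '0.3.0a5' -> (0, 3, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    parts = tuple(int(piece) for piece in match.group(0).split("."))
    # Pad short versions such as '0.3' so tuple comparison behaves as expected.
    return parts + (0,) * (3 - len(parts))


def supports_python3(version):
    """Per the answer above, Python 3 support arrived with v0.3.0."""
    return release_tuple(version) >= (0, 3, 0)


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("python-docx is not installed for this interpreter")
    else:
        print(found, "supports Python 3:", supports_python3(found))
```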
Answer by pallavi
Use this command for the latest version of python-docx:
$ sudo pip install --pre python-docx
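While sorting out the install, the question's goal (text and images) can be approached without python-docx at all. A .docx file is a ZIP archive whose main body lives in word/document.xml, with embedded images typically stored under word/media/. The sketch below uses only the standard library. It is not python-docx's API, the function names are hypothetical, and it ignores headers, footers, and table structure.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def extract_paragraphs(docx_path):
    """Return the text of each <w:p> paragraph in the main document body."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter("{%s}p" % W_NS):
        # A paragraph's text is split across <w:t> nodes inside its runs.
        pieces = [node.text or "" for node in para.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(pieces))
    return paragraphs


def list_images(docx_path):
    """Return archive member names stored under word/media/."""
    with zipfile.ZipFile(docx_path) as archive:
        return [name for name in archive.namelist()
                if name.startswith("word/media/")]
```

Calling `extract_paragraphs("report.docx")` on a file of your own (the filename here is only a placeholder) returns a list of strings, one per paragraph. The names from `list_images` can be passed to `ZipFile.read` to get the raw image bytes.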

