Parse a PDF file and write its content to a Word file using Java
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/514885/
Asked by kedar kamthe
How can I parse a PDF file and write its content to a Word file using Java?
Answered by breakingobstacles
For parsing a PDF file in Java, you can use Apache PDFBox: http://incubator.apache.org/pdfbox/
For reading/writing Word (or other Office) file formats in Java, try POI: http://poi.apache.org/
Both are free.
Answered by gimel
Try the iText Java library:
iText is an ideal library for developers looking to enhance web and other applications with dynamic PDF document generation and/or manipulation.
It can be used for your parsing step.
As for generating Word documents, the OpenOffice Java API might be able to generate Word-compatible docs (I have no personal experience with this API).
Answered by gimel
You might want to try any of these:
Once you have read the contents of the PDF file, you can just as well store them in an ODT file or a text file. For an ODT file, try http://odftoolkit.openoffice.org.
Best!
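For the plain-text route mentioned above, here is a minimal sketch that uses only the Java standard library. It assumes the PDF text has already been extracted into a String by whichever parsing library you picked; the class name `ExtractedTextSaver`, its methods, and the sample file names are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: stores already-extracted PDF text next to the source PDF.
final class ExtractedTextSaver {

    // Derives the output name, e.g. "docs/report.pdf" -> "docs/report.txt".
    static Path textPathFor(Path pdfPath) {
        String name = pdfPath.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return pdfPath.resolveSibling(base + ".txt");
    }

    // Writes the text as UTF-8 and returns the path of the file that was written.
    static Path save(Path pdfPath, String extractedText) throws IOException {
        Path target = textPathFor(pdfPath);
        Files.write(target, extractedText.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this string stands in for the output of the PDF parsing step.
        String extractedText = "First line from the PDF\nSecond line from the PDF\n";
        Path tempDir = Files.createTempDirectory("pdf-text");
        Path written = save(tempDir.resolve("report.pdf"), extractedText);
        System.out.println("Wrote " + Files.size(written) + " bytes to " + written);
    }
}
```

Writing UTF-8 explicitly avoids depending on the platform default encoding, which matters if the PDF contains non-ASCII text.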
Answered by Jes
You could use iText if the source PDF is mostly text. Images and the like are quite hard to handle while parsing. If it's text only, it takes about 10 lines of code. See the iText manual for examples.
For writing Word files there's only Apache POI. It can be a little tricky to figure out, but for such a simple task it shouldn't be any problem.
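Whichever library does the writing, it may help to know what ends up on disk: a .docx file is a ZIP package of XML parts. The sketch below is not POI's API. It is a standard-library-only illustration that writes what is, as far as I know, the smallest set of parts a word processor needs, with one paragraph per line of text. The class name `MinimalDocxWriter` and its methods are hypothetical, and it assumes the text contains no control characters that are invalid in XML.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration of the .docx container format; for real work use a library such as POI.
final class MinimalDocxWriter {

    private static final String XML_DECL =
        "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>";

    // Declares the content type of each part in the package.
    private static final String CONTENT_TYPES = XML_DECL
        + "<Types xmlns=\"http://schemas.openxmlformats.org/package/2006/content-types\">"
        + "<Default Extension=\"rels\" ContentType=\"application/vnd.openxmlformats-package.relationships+xml\"/>"
        + "<Default Extension=\"xml\" ContentType=\"application/xml\"/>"
        + "<Override PartName=\"/word/document.xml\" ContentType=\"application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml\"/>"
        + "</Types>";

    // Points from the package root to the main document part.
    private static final String ROOT_RELS = XML_DECL
        + "<Relationships xmlns=\"http://schemas.openxmlformats.org/package/2006/relationships\">"
        + "<Relationship Id=\"rId1\" Type=\"http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument\" Target=\"word/document.xml\"/>"
        + "</Relationships>";

    // Escapes the characters that are special in XML text content.
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds word/document.xml with one paragraph per line of input text.
    static String buildDocumentXml(String text) {
        StringBuilder sb = new StringBuilder(XML_DECL);
        sb.append("<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\"><w:body>");
        for (String line : text.split("\\R", -1)) {
            sb.append("<w:p><w:r><w:t xml:space=\"preserve\">")
              .append(escapeXml(line))
              .append("</w:t></w:r></w:p>");
        }
        sb.append("</w:body></w:document>");
        return sb.toString();
    }

    private static void addEntry(ZipOutputStream zip, String name, String content) throws IOException {
        zip.putNextEntry(new ZipEntry(name));
        zip.write(content.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
    }

    // Writes the three parts into a ZIP archive at the target path.
    static void write(Path target, String text) throws IOException {
        try (OutputStream out = Files.newOutputStream(target);
             ZipOutputStream zip = new ZipOutputStream(out)) {
            addEntry(zip, "[Content_Types].xml", CONTENT_TYPES);
            addEntry(zip, "_rels/.rels", ROOT_RELS);
            addEntry(zip, "word/document.xml", buildDocumentXml(text));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: this string stands in for text extracted from the PDF.
        Path target = Files.createTempFile("sample", ".docx");
        write(target, "Text extracted from the PDF\nSecond paragraph with <angle brackets> & an ampersand");
        System.out.println("Wrote " + target);
    }
}
```

This only covers unformatted text. Styles, tables, images, and the older binary .doc format are where a library like POI earns its keep.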

