Convert PDF to Word in Java
Notice: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/4090154/
Asked by user121196
Is it possible to convert PDF to Word in Java? I'm not talking about parsing a PDF document and then custom-rendering it to Word again. I want a Java library that can convert it directly.
Answered by Michael Shopsin
Reading PDF documents is a very involved process, and there are no good free libraries for extracting non-text information from PDF documents in Java. Worse yet, PDF documents have a lot of layout information that is hard to reconstruct: for example, a table in a Word document becomes some lines and a bunch of text fragments in the PDF.
Answered by peter.murray.rust
It is almost impossible to recreate semantic information from an arbitrary PDF. If you have the same tool that wrote it, you have a somewhat better chance, but even so there is much uncertainty. The only thing you can be sure of in a (text) PDF is the position of each character on the page. (Note that some PDFs include bitmaps in which textual information occurs, and recovering that has to rely on OCR.)
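To make the point about character positions concrete, here is a minimal sketch of the first step any reconstruction has to take: grouping positioned characters back into text lines. The `Glyph` and `LineGrouper` types, the y-tolerance heuristic, and the assumption that y grows down the page are all illustrative choices, not part of any PDF library. A real extractor would supply the positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical type: one character plus its position on the page.
final class Glyph {
    final char ch;
    final double x;
    final double y; // assumption: y grows from the top of the page downward

    Glyph(char ch, double x, double y) {
        this.ch = ch;
        this.x = x;
        this.y = y;
    }
}

final class LineGrouper {
    // Clusters glyphs into lines: a glyph joins the current line when its y
    // is within yTolerance of the first glyph of that line.
    static List<String> groupIntoLines(List<Glyph> glyphs, double yTolerance) {
        List<Glyph> sorted = new ArrayList<>(glyphs);
        sorted.sort(Comparator.comparingDouble((Glyph g) -> g.y));

        List<String> lines = new ArrayList<>();
        List<Glyph> current = new ArrayList<>();
        for (Glyph g : sorted) {
            if (!current.isEmpty() && Math.abs(g.y - current.get(0).y) > yTolerance) {
                lines.add(toText(current));
                current = new ArrayList<>();
            }
            current.add(g);
        }
        if (!current.isEmpty()) {
            lines.add(toText(current));
        }
        return lines;
    }

    // Orders one line left to right and concatenates its characters.
    // Word gaps, columns, and table cells are NOT inferred here.
    private static String toText(List<Glyph> line) {
        line.sort(Comparator.comparingDouble((Glyph g) -> g.x));
        StringBuilder sb = new StringBuilder();
        for (Glyph g : line) {
            sb.append(g.ch);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Glyph> glyphs = Arrays.asList(
                new Glyph('B', 40, 119.8), new Glyph('i', 16, 100.4),
                new Glyph('A', 10, 120.0), new Glyph('H', 10, 100.0));
        System.out.println(groupIntoLines(glyphs, 2.0)); // prints [Hi, AB]
    }
}
```

Note that "A" and "B" sit 30 units apart and may well be two table cells, yet the sketch glues them into "AB". Deciding what such gaps mean is exactly the semantic information that the PDF does not store.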
There are several groups in computer science departments and elsewhere who are spending very significant effort trying to recover semantic information. We collaborate with Penn State, one of the leaders, and they are working on extracting tables. In good cases they get 90%, in bad ones 50%.
So the answer is formally that you cannot, but you may occasionally be fortunate. (We do a lot of this for chemistry and count ourselves lucky if we get 50% on a regular basis).
Answered by uris
You can try to do it with the iText library: read the PDF and then write it out as RTF. This is not that simple though, as you have to preserve the different styles that the PDF has.
You can use some external tools: install some free program like "Free PDF to Doc" and execute it from your Java program. This works fine in most cases.
You can use the Acrobat Pro SDK from your Java code.
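For the external-tool option, a rough sketch of launching a converter from Java with `ProcessBuilder` might look like the following. The executable name `pdf2doc-cli` and its "input then output" argument order are hypothetical placeholders, since the answer does not say how "Free PDF to Doc" is invoked. Check the actual tool's command-line syntax before relying on this.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class ExternalPdfConverter {
    // Builds the command line. The argument order (input, then output) is an
    // assumption; adapt it to whatever converter is actually installed.
    static List<String> buildCommand(String executable, Path pdf, Path doc) {
        return Arrays.asList(
                executable,
                pdf.toAbsolutePath().toString(),
                doc.toAbsolutePath().toString());
    }

    // Runs the converter and reports whether it exited with status 0
    // before the timeout elapsed.
    static boolean convert(String executable, Path pdf, Path doc, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(executable, pdf, doc))
                .inheritIO() // forward the tool's output so it cannot block on a full pipe
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly(); // give up on a hung conversion
            return false;
        }
        return process.exitValue() == 0;
    }

    public static void main(String[] args) throws Exception {
        // "pdf2doc-cli" is a hypothetical executable name used for illustration.
        boolean ok = convert("pdf2doc-cli", Paths.get("input.pdf"), Paths.get("output.doc"), 60);
        System.out.println(ok ? "converted" : "conversion failed");
    }
}
```

A zero exit code only says the tool finished. Given the layout problems described in the other answers, the resulting document still needs to be checked.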
Best of luck