Java: How to convert a Word document to PDF
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/3022376/
How can I convert a Word document to PDF?
Asked by magh
How can I convert a Word document to PDF when the document contains various elements, such as tables? When I try iText, the converted PDF looks different from the original document. Is there an open-source API or library that I can use, rather than calling out to an executable?
Accepted answer by Michael Lloyd Lee mlk
This is quite a hard task, and even harder if you want perfect results (impossible without using Word). As such, I believe the number of open-source, pure-Java APIs that do it all for you is zero (Update: I am wrong, see below).
Your basic options are as follows:
- Script MS Office using JNI, a C# web service, etc. (the only option for 100% perfect results).
- Script OpenOffice using the available APIs (90+% perfect).
- Use Apache POI and iText (a very large job that will never be perfect).
Update - 2016-02-11: Here is a cut-down copy of my blog post on this subject, which outlines existing products that support Word-to-PDF conversion in Java.
Converting Microsoft Office (Word, Excel) documents to PDFs in Java
Three products that I know of can render Office documents:
yeokm1/docs-to-pdf-converter: Irregularly maintained, pure Java, open source. Ties together a number of libraries to perform the conversion.
xdocreport: Actively developed, pure Java, open source. It is a Java API that merges an XML document created with MS Office (docx), OpenOffice (odt), or LibreOffice (odt) with a Java model to generate a report and, if needed, convert it to another format (PDF, XHTML...).
Snowbound Imaging SDK: Closed source, pure Java. Snowbound appears to be a 100% Java solution and costs over $2,500. The evaluation download contains samples describing how to convert documents.
OpenOffice API: Open source, not pure Java (requires OpenOffice to be installed). OpenOffice is a native office suite that supports a Java API. It supports reading Office documents and writing PDF documents. The SDK contains a document conversion example (examples/java/DocumentHandling/DocumentConverter.java). To write PDFs, you need to pass the "writer_pdf_Export" filter rather than the "MS Word 97" one. Alternatively, you can use the wrapper API JODConverter.
JDocToPdf: Dead as of 2016-02-11. Uses Apache POI to read the Word document and iText to write the PDF. Completely free and 100% Java, but has some limitations.
Answer by Curtis
I haven't tried using it for MS Word, but I've had good success reading MS Excel documents using Apache POI - http://poi.apache.org/
Answer by Thorbjørn Ravn Andersen
Look into scripting OpenOffice.org to do the job for you.
Answer by Paul Jowett
I agree with the posters who list OpenOffice as a high-fidelity import/export facility for Word/PDF documents with a Java API; it also works across platforms. OpenOffice's import/export filters are pretty powerful and preserve most formatting during conversion to various formats, including PDF. Docmosis and JODReports add value by making life easier than learning the OpenOffice API directly, which can be challenging because of the style of the UNO API and its crash-related bugs.
Answer by Nodexpert
You can use JODConverter for this purpose. It can convert documents between different office formats, such as:
- Microsoft Office to OpenDocument, and vice versa
- Any format to PDF
- Many more conversions as well
- MS Office 2007 documents to PDF, as well as almost all other formats
More details about it can be found here: http://www.artofsolving.com/opensource/jodconverter
Answer by leef
unoconv is a Python tool that works on UNIX. I use Java to invoke it through the shell on UNIX, and it works perfectly for me. My source code: UnoconvTool.java. Both JODConverter and unoconv are said to use OpenOffice/LibreOffice.
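The approach described above (invoking unoconv from Java) might look roughly like the following. This is only a minimal sketch, not the UnoconvTool.java referenced in the answer: the class and method names are hypothetical, it assumes `unoconv` is installed and on the PATH, and it assumes unoconv's default behaviour of writing the PDF next to the input file.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper; not part of unoconv or any library.
final class UnoconvSketch {

    // Builds the command line; "-f pdf" asks unoconv for PDF output.
    static List<String> buildCommand(File input) {
        return Arrays.asList("unoconv", "-f", "pdf", input.getAbsolutePath());
    }

    // Derives the expected output file: same directory and base name, ".pdf" extension.
    // Assumption: unoconv writes its output next to the input by default.
    static File pdfPathFor(File input) {
        String name = input.getName();
        int dot = name.lastIndexOf('.');
        String base = dot > 0 ? name.substring(0, dot) : name;
        return new File(input.getAbsoluteFile().getParentFile(), base + ".pdf");
    }

    // Runs unoconv, waits up to the given timeout, and fails on a non-zero exit code.
    static File convert(File input, long timeoutSeconds)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input))
                .inheritIO() // forward unoconv's output so its pipes cannot fill up and block
                .start();
        if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
            process.destroyForcibly();
            throw new IOException("unoconv timed out after " + timeoutSeconds + "s");
        }
        if (process.exitValue() != 0) {
            throw new IOException("unoconv exited with code " + process.exitValue());
        }
        return pdfPathFor(input);
    }
}
```

A call such as `UnoconvSketch.convert(new File("/path/to/input.docx"), 120)` would then return the expected PDF location. Passing the arguments as a list to `ProcessBuilder`, rather than building one shell string, avoids quoting problems with file names that contain spaces.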
docx4j/docxreport, POI, and PDFBox are good, but they are missing some formats in conversion.
Answer by Selvakumar Ponnusamy
I think JODConverter is the easiest way to implement this. Please refer to the link below for more information.
http://mytechbites.blogspot.in/2014/10/convert-documents-to-pdf-in-java.html
Answer by Sudarshan_SMD
Check out docs-to-pdf-converter on GitHub. It's a lightweight solution designed specifically for converting documents to PDF.
Why?
I wanted a simple program that can convert Microsoft Office documents to PDF but without dependencies like LibreOffice or expensive proprietary solutions. Seeing as how code and libraries to convert each individual format is scattered around the web, I decided to combine all those solutions into one single program. Along the way, I decided to add ODT support as well since I encountered the code too.
Answer by Johnny
You can use the Cloudmersive native Java library. It is free for up to 50,000 conversions/month and, in my experience, gives much higher fidelity than iText or Apache POI-based methods. The documents actually look the same as they do in Microsoft Word, which for me is the key. Incidentally, it can also convert XLSX, PPTX, and the legacy DOC, XLS, and PPT formats to PDF.
Here is what the code looks like. First, add your imports:
import java.io.File;

import com.cloudmersive.client.invoker.ApiClient;
import com.cloudmersive.client.invoker.ApiException;
import com.cloudmersive.client.invoker.Configuration;
import com.cloudmersive.client.invoker.auth.*;
import com.cloudmersive.client.ConvertDocumentApi;
Then convert a file:
ApiClient defaultClient = Configuration.getDefaultApiClient();

// Configure API key authorization: Apikey
ApiKeyAuth Apikey = (ApiKeyAuth) defaultClient.getAuthentication("Apikey");
Apikey.setApiKey("YOUR API KEY");

ConvertDocumentApi apiInstance = new ConvertDocumentApi();
File inputFile = new File("/path/to/input.docx"); // File to perform the operation on.
try {
    byte[] result = apiInstance.convertDocumentDocxToPdf(inputFile);
    // Printing the array itself would only show an object reference, so report its size instead.
    System.out.println("Converted PDF size: " + result.length + " bytes");
} catch (ApiException e) {
    System.err.println("Exception when calling ConvertDocumentApi#convertDocumentDocxToPdf");
    e.printStackTrace();
}
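The snippet above leaves the converted document in a `byte[]`. A minimal sketch of saving such bytes to disk with the standard library follows; the class and method names are hypothetical, and it assumes the array holds the complete PDF content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the Cloudmersive client.
final class PdfBytesWriter {

    // Every PDF file starts with the ASCII marker "%PDF-".
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // Cheap sanity check before writing: does the data begin with the PDF marker?
    static boolean looksLikePdf(byte[] bytes) {
        if (bytes == null || bytes.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (bytes[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Writes the bytes to the target path, creating missing parent directories
    // and overwriting any existing file.
    static Path save(byte[] pdfBytes, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, pdfBytes);
    }
}
```

With the variables from the snippet above, this could be called as `PdfBytesWriter.save(result, Paths.get("/path/to/output.pdf"))`, optionally after checking `looksLikePdf(result)` to catch an error payload returned in place of a document.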
You can get a document conversion API key for free from the portal.
Answer by Charles Wang
Using JACOB to call Office Word is a 100% perfect solution. However, it only works on the Windows platform, because it requires Office Word to be installed.
- Download JACOB archive (the latest version is 1.19);
- Add jacob.jar to your project classpath;
- Add jacob-1.19-x86.dll or jacob-1.19-x64.dll (depending on whether your JDK is 32-bit or 64-bit) to ...\Java\jdk1.x.x_xxx\jre\bin
Use the JACOB API to call Office Word to convert doc/docx to pdf:
import java.io.File;

import com.jacob.activeX.ActiveXComponent;
import com.jacob.com.ComThread;
import com.jacob.com.Dispatch;
import com.jacob.com.Variant;

// Assumes a `logger` field is defined in the enclosing class.
public void convertDocx2pdf(String docxFilePath) {
    File docxFile = new File(docxFilePath);
    // Strip the extension at the last dot so that both .doc and .docx inputs work.
    String pdfFile = docxFilePath.substring(0, docxFilePath.lastIndexOf('.')) + ".pdf";
    if (docxFile.exists() && !docxFile.isDirectory()) {
        ActiveXComponent app = null;
        long start = System.currentTimeMillis();
        try {
            ComThread.InitMTA(true);
            app = new ActiveXComponent("Word.Application");
            Dispatch documents = app.getProperty("Documents").toDispatch();
            // Open(FileName, ConfirmConversions = false, ReadOnly = true)
            Dispatch document = Dispatch.call(documents, "Open", docxFilePath, false, true).toDispatch();
            File target = new File(pdfFile);
            if (target.exists()) {
                target.delete();
            }
            // 17 is Word's wdFormatPDF file format constant.
            Dispatch.call(document, "SaveAs", pdfFile, 17);
            Dispatch.call(document, "Close", false);
            long end = System.currentTimeMillis();
            logger.info("============Convert Finished:" + (end - start) + "ms");
        } catch (Exception e) {
            logger.error(e.getLocalizedMessage(), e);
            throw new RuntimeException("pdf convert failed.", e);
        } finally {
            // Always quit Word and release the COM thread, even after a failure.
            if (app != null) {
                app.invoke("Quit", new Variant[] {});
            }
            ComThread.Release();
        }
    }
}