Java: Converting HTML files to PDF
Note: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/633780/
Converting HTML files to PDF
Asked by panschk
I need to automatically generate a PDF file from an existing (X)HTML document. The input files (reports) use a rather simple, table-based layout, so support for really fancy JavaScript/CSS stuff is probably not needed.
As I am used to working in Java, a solution that can easily be used in a Java project is preferable. It only needs to work on Windows systems, though.
One way to do it that is feasible, but does not produce good-quality output (at least out of the box), is to use CSS2XSLFO and Apache FOP to create the PDF files. The problem I encountered was that while CSS attributes are converted nicely, the table layout is pretty messed up, with text flowing out of the table cells.
I also took a quick look at Jrex, a Java-API for using the Gecko rendering engine.
Is there maybe a way to grab the rendered page from the Internet Explorer rendering engine and send it to a PDF printer tool automatically? I have no experience with OLE programming on Windows, so I have no clue what's possible and what is not.
Do you have an idea?
Accepted answer by Mark
The Flying Saucer XHTML renderer project supports outputting XHTML to PDF. Have a look at an example here.
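Since Flying Saucer parses its input as XML, markup that is not well-formed gets rejected before any rendering happens. A minimal pre-check along these lines, using only the JDK, can flag such files early. The class and method names are made up for illustration, and the DTD-skipping feature is specific to the Xerces parser that ships with the JDK.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: checks whether markup is well-formed XML,
// which an XHTML renderer needs before it can lay the page out.
public class XhtmlCheck {

    /** Returns true if the markup parses as well-formed XML. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not fetch the external DTD named in a DOCTYPE
            // (Xerces-specific feature, supported by the JDK's built-in parser).
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed (e.g. an unclosed <br>)
        } catch (Exception e) {
            throw new IllegalStateException("XML parser could not be set up or read input", e);
        }
    }
}
```

Plain HTML that relies on unclosed tags such as `<br>` fails this check, so it would need cleaning up before being handed to an XHTML renderer.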
Answer by Ólafur Waage
If you have the funding, nothing beats Prince XML, as this video shows.
Answer by rojoca
You can use a headless Firefox with an extension. It's pretty annoying to get running, but it does produce good results.
Check out this answer for more info.
Answer by fred-o
Check out iText; it is a pure Java PDF toolkit which has support for reading data from HTML. I used it recently in a project when I needed to pull content from our CMS and export as PDF files, and it was all rather straightforward. The support for CSS and style tags is pretty limited, but it does render tables without any problems (I never managed to set column width though).
Creating a PDF from HTML goes something like this:
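// Package names assume iText 2.x (com.lowagie); iText 5.x uses com.itextpdf.text instead.
import com.lowagie.text.Document;
import com.lowagie.text.PageSize;
import com.lowagie.text.html.simpleparser.HTMLWorker;
import com.lowagie.text.pdf.PdfWriter;
import java.io.StringReader;
Document doc = new Document(PageSize.A4);
PdfWriter.getInstance(doc, out);     // out: the OutputStream that receives the PDF
doc.open();
HTMLWorker hw = new HTMLWorker(doc);
hw.parse(new StringReader(html));    // html: the markup to convert, as a String
doc.close();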
Answer by PhiLho
If you look at the side bar of your question, you will see many related questions...
In your context, a simpler method might be to install a PDF print driver like PDFCreator and just print the page to this output.
Answer by Peter Boughton
"Is there maybe a way to grab the rendered page from the Internet Explorer rendering engine and send it to a PDF printer tool automatically?"
This is how ActivePDF works, which is good in that you know what you'll get, and it actually has reasonable styling support.
It is also one of the few packages I found (when looking a few years back) that actually supports the various page-break CSS properties.
Unfortunately, the ActivePDF software is very frustrating: since it has to launch the IE browser in the background for conversions, it can be quite slow, and it is not particularly stable either.
There is a new version currently in beta which is supposed to be much better, but I've not actually had a chance to try it out, so I don't know how much of an improvement it is.
Answer by Mic
Did you try WKHTMLTOPDF?
It's a simple command-line utility built on WebKit, an open source rendering engine. Both are free.
We've set a small tutorial here
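From a Java project, a command-line tool like this can be driven through ProcessBuilder. Here is a rough sketch, assuming the wkhtmltopdf binary is installed and on the PATH; the class and method names are made up for illustration. It relies only on the tool's basic "input, then output" calling form.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the wkhtmltopdf command-line tool.
public class WkHtmlToPdf {

    /** Builds the command line: wkhtmltopdf <input file or URL> <output file>. */
    public static List<String> command(String input, String output) {
        return Arrays.asList("wkhtmltopdf", input, output);
    }

    /** Runs the tool and returns its exit code (0 means success). */
    public static int convert(String input, String output)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(input, output))
                .inheritIO()   // forward the tool's progress and error output
                .start();
        return process.waitFor();
    }
}
```

Calling WkHtmlToPdf.convert("report.html", "report.pdf") would then block until the conversion finishes; a non-zero return value means the tool reported a failure.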
EDIT (2017):
If I were to build something today, I wouldn't go that route anymore.
I would use http://pdfkit.org/ instead, probably stripping it of all its Node.js dependencies so it can run in the browser.
Answer by yms
Amyuni WebkitPDF could be used with JNI for a Windows-only solution. This is an HTML to PDF/XAML conversion library, free for commercial and non-commercial use.
If the output files are not needed immediately, it may be better for scalability to have a queue and a few background processes that take items from it, convert them, and store the results in the database or on the file system.
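The queue idea could look roughly like this inside a single JVM, with a fixed thread pool standing in for the background processes. This is only a sketch: ConversionQueue is a made-up name, and the converter function is a placeholder for whichever HTML-to-PDF library ends up being used.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical in-process conversion queue: submitted jobs wait in the
// executor's internal queue until one of the worker threads picks them up.
public class ConversionQueue implements AutoCloseable {
    private final ExecutorService workers;
    private final Function<String, byte[]> converter;

    public ConversionQueue(int workerCount, Function<String, byte[]> converter) {
        this.workers = Executors.newFixedThreadPool(workerCount);
        this.converter = converter; // placeholder for the real HTML-to-PDF call
    }

    /** Enqueues one HTML document; the Future completes with the converted bytes. */
    public Future<byte[]> submit(String html) {
        Callable<byte[]> job = () -> converter.apply(html);
        return workers.submit(job);
    }

    /** Stops accepting new jobs; jobs already queued still run to completion. */
    @Override
    public void close() {
        workers.shutdown();
    }
}
```

The caller can hold on to the returned Future and write the bytes to the database or file system once it completes, or the converter function itself can do the storing.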
Usual disclaimer applies.