Linux: How to automate HTML-to-PDF conversion?

Notice: This page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/176476/

Time: 2020-08-03 16:34:23  Source: igfitidea

How can I automate HTML-to-PDF conversions?

Tags: linux, perl, pdf

Asked by lennysan

I've been using htmldoc for a while, but I've run into some fairly serious limitations. I need the end solution to work on a Linux box. I'll be calling this library/utility/application from a Perl app, so any Perl interfaces would be a bonus.

Accepted answer by Orion Edwards

NOTE: This answer is from 2008 and is probably now incorrect; please check the other answers

PrinceXML is the best one I've seen (it parses regular HTML as well as XML/XHTML). How is it the best? Well, it passes the Acid2 test, which I thought was pretty darn impressive.

It is, however, quite expensive.

Answer by Declan Shanaghy

I won't claim this is the "best" solution, but it is "a" solution I have used.

HTML Input --> HTML 2 PS --> PS 2 PDF --> PDF Output
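
The two-stage pipeline above can be sketched as a small shell function. This assumes that "HTML 2 PS" and "PS 2 PDF" refer to the html2ps and ps2pdf (Ghostscript) command-line tools and that both are installed and on PATH; the function name html_to_pdf is hypothetical and not part of the answer.

```shell
#!/bin/sh
# Sketch only: chain html2ps and ps2pdf into a single HTML -> PDF step.
# Assumption: html2ps and ps2pdf (from Ghostscript) are on PATH.
# html_to_pdf is a hypothetical helper name used for illustration.
html_to_pdf() {
    input=$1    # path to the HTML file
    output=$2   # path of the PDF to create
    # html2ps writes PostScript to stdout by default;
    # "ps2pdf - FILE" reads PostScript from stdin and writes FILE.
    html2ps "$input" | ps2pdf - "$output"
}
```

A call such as `html_to_pdf report.html report.pdf` would then be a single command that a Perl app could run via system().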

Answer by Jeremy

This would be total overkill, but you could download and install Mirth. It is a message routing engine, but it can convert HTML to PDF, so you could set it up to pick up an HTML file in a folder, convert it to PDF, and drop the PDF in the same or another folder. Like I said, it's overkill and has a bit of a learning curve, but it's free, and it's Java, so you can run it on Linux if you like. All your Perl app would have to do is write the HTML to a file.

Answer by bmdhacks

I did a bit of googling for you and came up with two options. There may be more; my Google strategy was to try "webkit command-line pdf" and "gecko command-line pdf", basically looking for command-line programs that embed the two popular open-source rendering engines. Here's what I found:

Firefox command-line printer - outputs to PDF and PNG

wkpdf - while this is for Mac, it's probably pretty portable.

Answer by Alexandre

Sorry to unearth this old post, but it came up first in my search for the best HTML/PDF conversion tool. On Linux, wkhtmltopdf is very good (it takes CSS into account, among other things) and is GPL.

Answer by mti2935

You might want to check out 'Document Conversion Service' by Peernet (at http://www.peernet.com/conversion-software/batch-document-converter/). This runs as a service on a Windows Desktop or Windows Server machine. It opens HTML documents in a web browser, then prints them through a print driver to create PDF documents, so that the PDF document produced looks exactly as if you had printed the HTML document from the browser.

Answer by MrTux

You should have a look at http://phantomjs.org/

Conversion can be done with a small script, rasterize.js, by issuing:

phantomjs rasterize.js 'http://en.wikipedia.org/w/index.php?title=Jakarta&printable=yes' jakarta.pdf

Answer by sudoman

WeasyPrint produces nice PDFs with selectable text and hyperlinks.

weasyprint input.html output.pdf

If you use wkhtmltopdf instead, try the following options:

wkhtmltopdf --margin-bottom 20mm --margin-top 20mm --minimum-font-size 16 ...
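
For unattended use, the weasyprint command shown earlier in this answer could be wrapped in a loop over a directory. This is only a sketch: it assumes weasyprint is on PATH, and the function name convert_dir and the directory layout are hypothetical, not something the answer prescribes.

```shell
#!/bin/sh
# Sketch only: convert every *.html file in one directory to a PDF in another.
# Assumption: weasyprint is installed and on PATH.
# convert_dir is a hypothetical helper name used for illustration.
convert_dir() {
    src_dir=$1
    out_dir=$2
    failed=0
    mkdir -p "$out_dir" || return 1
    for html in "$src_dir"/*.html; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$html" ] || continue
        name=$(basename "$html" .html)
        # Same invocation as in the answer: weasyprint INPUT OUTPUT
        weasyprint "$html" "$out_dir/$name.pdf" || {
            echo "conversion failed: $html" >&2
            failed=1
        }
    done
    return "$failed"
}
```

Running `convert_dir ./pages ./pdfs` would leave one PDF per HTML file in ./pdfs and return non-zero if any conversion failed.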

Answer by Roben

Update 2019-05

The whole process has thankfully been packed into a docker image by TheCodingMachine: https://github.com/thecodingmachine/gotenberg

This makes maintenance and usage of Chrome-based PDF generation in production environments really smooth and hassle-free.

There is a new headless mode since Chrome 59. As all the other solutions really struggle with newer (or not so new anymore) CSS features like flexbox, this was in my case the only solution to produce a proper PDF output.

To create a PDF from a local HTML file, just use the following command:

chrome --headless --disable-gpu --print-to-pdf file:///path/to/myfile.html

For Mac OS, substitute chrome with /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome.

The only downside I have noticed so far is that (currently) you cannot pass the HTML via stdin, but creating a temporary file is not that much of an issue.
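
The temporary-file workaround could look roughly like the sketch below. Several things here are assumptions rather than part of the answer: the function name stdin_html_to_pdf, the CHROME_BIN variable for locating the browser binary, and the `--print-to-pdf=PATH` form used to name the output file.

```shell
#!/bin/sh
# Sketch only: accept HTML on stdin, stage it in a temp file, print it to PDF.
# Assumptions: CHROME_BIN (hypothetical variable) names a Chrome/Chromium
# binary, defaulting to "chrome"; --print-to-pdf=PATH selects the output file.
# stdin_html_to_pdf is a hypothetical helper name used for illustration.
stdin_html_to_pdf() {
    output=$1   # use an absolute path for the PDF to be safe
    tmpdir=$(mktemp -d) || return 1
    # Copy stdin into a file that Chrome can open via a file:// URL.
    cat > "$tmpdir/input.html"
    "${CHROME_BIN:-chrome}" --headless --disable-gpu \
        --print-to-pdf="$output" "file://$tmpdir/input.html"
    status=$?
    rm -rf "$tmpdir"   # clean up the staged copy either way
    return "$status"
}
```

Usage would be along the lines of `generate_html | stdin_html_to_pdf /tmp/out.pdf`, where generate_html stands for whatever produces the HTML.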

For more information see https://developers.google.com/web/updates/2017/04/headless-chrome#create_a_pdf_dom

Update: As it turns out, the Chrome developers will most likely provide some kind of Node module for this task, which would eventually deprecate the headless mode (https://bugs.chromium.org/p/chromium/issues/detail?id=719921).

The best bet would be to use the Node-based approach with the puppeteer module, as documented under https://developers.google.com/web/updates/2017/04/headless-chrome#node and print the page via the Page.printToPDF command, which enables some additional configuration, too.

Of course, you can also connect to the debug console websocket from any environment other than Node (e.g. a PHP script).

Answer by Micah Elliott

I have found Electroshot to be supportive of modern CSS features, particularly layout. This was after struggling with wkhtmltopdf, which shows its age by not supporting things like CSS3.

From Electroshot's features description:

Electroshot uses Electron, which offers the most recent stable version of Chrome (rather than one from years ago); this means that pages render as they would in a browser...

I've been able to use Bootstrap 4 to design a page, and then use Electroshot to render a PDF very closely resembling the HTML/CSS.
