Html wkhtmltopdf 的表现
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/11624108/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Performance of wkhtmltopdf
提问by rajadhi
We are intending to use wkhtmltopdf to convert html to pdf but we are concerned about the scalability of wkhtmltopdf. Does anyone have any idea how it scales? Our web app potentially could attempt to convert hundreds of thousands of (reletively complex)html so it's important for us to have some idea. Has anyone got any information on this?
我们打算使用 wkhtmltopdf 将 html 转换为 pdf,但我们担心 wkhtmltopdf 的可扩展性。有谁知道它是如何缩放的?我们的网络应用程序可能会尝试转换成百上千的(相对复杂的)html,因此了解一些想法对我们来说很重要。有没有人有这方面的信息?
回答by Alistair Ronan
First of all, your question is quite general; there are many variables to consider when asking about scalability of any project. Obviously there is a difference between converting "hundreds of thousands" of HTML files over a week and expecting to do that in a day, or an hour. On top of that "relatively complex" HTML can mean different things to other people.
首先,你的问题很笼统;在询问任何项目的可扩展性时,需要考虑许多变量。显然,在一周内转换“数十万”HTML 文件与期望在一天或一小时内完成转换之间存在差异。最重要的是,“相对复杂”的 HTML 对其他人来说可能意味着不同的东西。
That being said, I figured since I have done something similar to this, converting approximately 450,000 html files, utilizing wkhtmltopdf; I'd share my experience.
话虽如此,我想自从我做了类似的事情,利用 wkhtmltopdf 转换了大约 450,000 个 html 文件;我会分享我的经验。
Here was my scenario:
这是我的场景:
- 450,000 HTML files
- 95% of the files were one page in length
- generally containing 2 images (relative path, local system)
- tabular data (sometimes contained nested tables)
- simple markup elsewhere (strong, italic, underline, etc)
- A spare desktop PC
- 8GB RAM
- 2.4GHz Dual Core Processor
- 7200RPM HD
- 450,000 个 HTML 文件
- 95% 的文件长度为一页
- 一般包含2张图片(相对路径,本地系统)
- 表格数据(有时包含嵌套表格)
- 其他地方的简单标记(强、斜体、下划线等)
- 备用台式电脑
- 8GB 内存
- 2.4GHz 双核处理器
- 7200RPM 高清
I used a simple single threaded script written in PHP, to iterate over the folders and pass the html file path to wkhtmltopdf. The process took about 2.5 days to convert all the files, with very minimal errors.
我使用了一个用 PHP 编写的简单单线程脚本来遍历文件夹并将 html 文件路径传递给 wkhtmltopdf。转换所有文件的过程大约需要 2.5 天,错误非常少。
I hope this gives you insight to what you can expect from utilizing wkhtmltopdf in your web application. Some obvious improvements would come from running this on better hardware but mainly from utilizing a multi-threaded application to process files simultaneously.
我希望这能让您深入了解在您的 Web 应用程序中使用 wkhtmltopdf 可以期待什么。一些明显的改进将来自在更好的硬件上运行它,但主要来自利用多线程应用程序同时处理文件。
回答by Nenotlep
In my experience performance depends a lot on your pictures. It there are lots of large pictures it can slow down significantly. If at all possible I would try to stage a test with an estimate of what the load would be for your servers. Some people do use it for intensive operations, but I have never heard of hundrerds of thousands. I guess like everything, it depends on your content and resources.
根据我的经验,性能在很大程度上取决于您的图片。它有很多大图片,它可以显着减慢。如果可能的话,我会尝试进行测试,估计您的服务器的负载。有些人确实将它用于密集型操作,但我从未听说过有成百上千的。我想一切都一样,这取决于您的内容和资源。
The following quote is straight off the wkhtmltopdf mailing list:
以下引用直接来自wkhtmltopdf 邮件列表:
I'm using wkHtmlToPDF to convert about 6000 E-mails a day to PDF. It's all done on a quadcore server with 4GB memory... it's even more then enough for that.
我正在使用 wkHtmlToPDF 每天将大约 6000 封电子邮件转换为 PDF。这一切都是在具有 4GB 内存的四核服务器上完成的……它甚至已经足够了。
There are a few performance tips, but I would suggest trying out what is your bottlenecks before optimizing for performance. For instance I remember some person saying that if possible, loading images directly from disk instead of having a web server inbetween can speed it up conciderably.
有一些性能提示,但我建议在优化性能之前先尝试一下您的瓶颈是什么。例如,我记得有人说过,如果可能的话,直接从磁盘加载图像而不是在两者之间使用 Web 服务器可以显着加快速度。
Edit: Adding to this I just had some fun playing with wkhtmltopdf. Currently on an Intel Centrino 2 with 4Gb memory I generate PDF with 57 pages of content (mixed p,ul,table), ~100 images and a toc takes consistently < 7 seconds. I'm also running visual studio, browser, http server and various other software that might slow it down. I use stdin and stdout directly instead of files.
编辑:除此之外,我在玩 wkhtmltopdf 时玩得很开心。目前在具有 4Gb 内存的 Intel Centrino 2 上,我生成了包含 57 页内容(混合 p、ul、table)、约 100 个图像和 toc 的 PDF,持续时间 < 7 秒。我还在运行 Visual Studio、浏览器、http 服务器和其他各种可能会降低速度的软件。我直接使用 stdin 和 stdout 而不是文件。
Edit: I have not tried this, but if you have linked CSS, try embedding it in the HTML file (remember to do a before and after test to see the effects properly!). The improvement here most likely depends on things like caching and where the CSS is served - if it's read from disk every time or god forbid regenerated from scss, it could be pretty slow, but if the result is cached by the webserver (I dont think wkhtmltopdf caches anything between instances) it might not have a big effects. YMMV.
编辑:我没有试过这个,但是如果你已经链接了 CSS,请尝试将它嵌入到 HTML 文件中(记得做一个前后测试以正确查看效果!)。这里的改进很可能取决于缓存和 CSS 的提供位置之类的东西 - 如果它每次都从磁盘读取或上帝禁止从 scss 重新生成,它可能会非常慢,但如果结果被网络服务器缓存(我不认为wkhtmltopdf 在实例之间缓存任何东西)它可能不会有太大的影响。天啊。
回答by Ish
wkhtmltopdf --print-media-type
is blazing fast. But you loose normal CSS styling with that.
wkhtmltopdf --print-media-type
正在快速燃烧。但是你会失去正常的 CSS 样式。
This may NOT be an ideal solution for complex html pages export. But it worked for me because my html contents are pretty simple and in tabular form.
这可能不是复杂 html 页面导出的理想解决方案。但它对我有用,因为我的 html 内容非常简单并且是表格形式。
Tested on version wkhtmltopdf 0.12.2.1
在版本上测试 wkhtmltopdf 0.12.2.1
回答by badma
We try to use wkhtmltopdf in any implementations. My objects are huge tables for generated coordinate points. Typically volume of my pdf = 500 pages
我们尝试在任何实现中使用 wkhtmltopdf。我的对象是用于生成坐标点的巨大表格。通常我的 pdf 卷 = 500 页
We try to use port of wkhtmltopdf to .net. Results are
我们尝试使用 wkhtmltopdf 的端口到 .net。结果是
- Pechkin - Pro: don't need other app. Contra: slow. 500 pages generated about 5 minutes
- PdfCodaxy - only contra: slow. Slower than pure wkhtmltopdf. Required installed wkhtmltopdf. Problems with non unicode text
- Nreco - only contra: slow. Slower than pure wkhtmltopdf. Required installed wkhtmltopdf. Incorrect unlock libs after use (for me)
We try to use binary wkhtmltopdf invoked from C# code.
我们尝试使用从 C# 代码调用的二进制 wkhtmltopdf。
Pro: easy to use, faster that libs
Contra: need temporary files (cannot use Stream objects). Break with very huge (100MB+)html files as like as other libs
回答by Alexn
You can create own pool of the wkhtmltopdf engines. I did it for a simple use case by invoking API directly instead of start process wkhtmltopdf.exe every time. The wkhtmltopdf API is not thread-safe, so it's not easy to do. Also, you should not forget about sharing a native code between AppDomains.
您可以创建自己的 wkhtmltopdf 引擎池。我通过直接调用 API 而不是每次都启动进程 wkhtmltopdf.exe 来为一个简单的用例做到这一点。wkhtmltopdf API 不是线程安全的,所以不容易做到。此外,您不应忘记在 AppDomains 之间共享本机代码。