How to convert HTML content to PDF using Java without losing the formatting?

Notice: this page reproduces a popular StackOverflow question (originally published here as a Chinese-English side-by-side translation) under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4712641/

Time: 2020-10-30 07:39:35  Source: igfitidea

How to convert an HTML content to PDF without losing the formatting using Java?

Tags: java, pdf-generation, html-parsing, itext

Asked by Veera

I have some HTML content (including formatting tags such as strong, images, etc.). In my Java code, I want to convert this HTML content into a PDF document without losing the HTML formatting.

Is there any way to do it in Java (using iText or any other library)?

Accepted answer by Nate365

I would try DocRaptor.com. It converts HTML to PDF or HTML to XLS in any language, and since it uses Prince XML (without making you pay the expensive license fee), the quality is a lot better than the other options out there. It's also a web app, so there's nothing to download. It's an easy way to get around long, frustrating coding.

Here are some examples: https://docraptor.com/documentation#coding_examples

Answer by Kirby

I used ITextRenderer from the Flying Saucer project.

Here is a short, self-contained, working example. In my case I wanted to later stream the bytes into an email attachment.

So, in the example I write it to a file purely for the sake of demonstration for this question. This is Java 8.

import com.lowagie.text.DocumentException;
import org.apache.commons.io.FileUtils;
import org.xhtmlrenderer.pdf.ITextRenderer;

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;

public class So4712641 {

  public static void main(String... args) throws DocumentException, IOException {
    FileUtils.writeByteArrayToFile(new File("So4712641.pdf"), toPdf("<b>You gotta walk and don't look back</b>"));
  }

  /**
   * Generate a PDF document
   * @param html HTML as a string
   * @return bytes of PDF document
   */
  private static byte[] toPdf(String html) throws DocumentException, IOException {
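    final ITextRenderer renderer = new ITextRenderer();
    // Flying Saucer parses the input as XML, so the markup must be well-formed
    renderer.setDocumentFromString(html);
    renderer.layout();
    // Render into memory so the bytes can be reused later (e.g. as an email attachment)
    try (ByteArrayOutputStream out = new ByteArrayOutputStream(html.length())) {
      renderer.createPDF(out);
      return out.toByteArray();
    }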
  }
}

This gives me:

[Image: screenshot of the generated PDF]

For completeness, here are the relevant pieces of my Maven pom.xml:

<dependencies>
    <dependency>
        <groupId>org.xhtmlrenderer</groupId>
        <artifactId>flying-saucer-pdf</artifactId>
        <version>9.0.8</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
</dependencies>
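
One caveat with the ITextRenderer approach: Flying Saucer parses its input as XML, so HTML with unclosed tags (a bare `<br>`, for example) is likely to be rejected. The sketch below is not part of the original answer. It uses only the JDK's XML parser to pre-check a string for well-formedness. The class and method names (`WellFormedCheck`, `isWellFormed`) are made up for illustration, and the check is only an approximation, since Flying Saucer's own parser setup may differ (for instance in how it resolves DTDs and entities).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Flying Saucer
class WellFormedCheck {

  /** Returns true if the markup parses as XML (all tags closed and properly nested). */
  static boolean isWellFormed(String markup) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors instead of printing them to stderr
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new InputSource(new StringReader(markup)));
      return true;
    } catch (SAXException e) {
      // The parser rejected the markup
      return false;
    } catch (ParserConfigurationException | IOException e) {
      // Not expected for an in-memory string with a default parser configuration
      throw new IllegalStateException(e);
    }
  }

  public static void main(String... args) {
    System.out.println(isWellFormed("<b>You gotta walk and don't look back</b>")); // true
    System.out.println(isWellFormed("<p>line one<br>line two</p>"));               // false
    System.out.println(isWellFormed("<p>line one<br/>line two</p>"));              // true
  }
}
```

If the check fails, the HTML would need to be cleaned up into XHTML (by hand or with an HTML tidying library of your choice) before it is passed to `setDocumentFromString`.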

Answer by biziclop

Converting HTML to PDF isn't exactly straightforward in general, but if you're in control of what goes into the HTML, you can try using an XSL-FO implementation, like Apache FOP.

It's not out-of-the-box as you'll have to write (or find) a stylesheet that defines the conversion rules, but on the upside it gives you much more control over output formatting, which is quite useful as what looks good on screen doesn't necessarily look good on paper.
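
To give a rough idea of what such a stylesheet involves, here is a minimal sketch (not from the original answer) that turns a small, well-formed HTML fragment into XSL-FO using only the JDK's built-in XSLT support. Everything in it is an assumption made for illustration: the class name `HtmlToFoSketch`, the tiny stylesheet (which handles only `html`/`body`, `p`, and `b`/`strong`), the A4 page setup, and the requirement that the input carries no XML namespace. Handing the resulting FO document to Apache FOP to produce the actual PDF is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: HTML (as well-formed XML, no namespace) -> XSL-FO
class HtmlToFoSketch {

  // Deliberately tiny stylesheet; a real one needs rules for every element you use
  static final String STYLESHEET = String.join("\n",
      "<xsl:stylesheet version='1.0'",
      "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'",
      "    xmlns:fo='http://www.w3.org/1999/XSL/Format'>",
      "  <xsl:output method='xml' indent='no'/>",
      "  <xsl:template match='/html'>",
      "    <fo:root>",
      "      <fo:layout-master-set>",
      "        <fo:simple-page-master master-name='A4'",
      "            page-height='29.7cm' page-width='21cm'",
      "            margin-top='2cm' margin-bottom='2cm'",
      "            margin-left='2cm' margin-right='2cm'>",
      "          <fo:region-body/>",
      "        </fo:simple-page-master>",
      "      </fo:layout-master-set>",
      "      <fo:page-sequence master-reference='A4'>",
      "        <fo:flow flow-name='xsl-region-body'>",
      "          <xsl:apply-templates select='body/*'/>",
      "        </fo:flow>",
      "      </fo:page-sequence>",
      "    </fo:root>",
      "  </xsl:template>",
      "  <xsl:template match='p'>",
      "    <fo:block space-after='6pt'><xsl:apply-templates/></fo:block>",
      "  </xsl:template>",
      "  <xsl:template match='b | strong'>",
      "    <fo:inline font-weight='bold'><xsl:apply-templates/></fo:inline>",
      "  </xsl:template>",
      "</xsl:stylesheet>");

  /** Applies the stylesheet to the given markup and returns the XSL-FO document as a string. */
  static String toFo(String xhtml) {
    try {
      Transformer transformer = TransformerFactory.newInstance()
          .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
      StringWriter out = new StringWriter();
      transformer.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      // Malformed input or stylesheet
      throw new IllegalArgumentException("Could not transform markup to XSL-FO", e);
    }
  }

  public static void main(String... args) {
    String html = "<html><body><p>You gotta <b>walk</b> and don't look back</p></body></html>";
    System.out.println(toFo(html));
  }
}
```

The point of the design is that the formatting decisions (page size, margins, how bold text is rendered) live in the stylesheet rather than in Java code, which is where the extra control over paper output comes from.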