Java: generating a PDF from HTML with non-Latin characters using ITextRenderer does not work

Question

Asked by alexandros

This is the second day I have spent investigating with no results. At least now I am able to ask something very specific.

I am trying to write valid HTML code that contains some non-Latin characters to a PDF file using iText, and more specifically using ITextRenderer from Flying Saucer.

My short example/code starts by initializing a string variable doc with this value:

String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
            + "<body>Some greek characters: Καλημέρα Some greek characters"
            + "</body></html>";

Here is the code that I use for debugging. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters:

//write for debugging purposes in an html file
File newTextFile = new File("C:/work/test.html");
FileWriter fw = new FileWriter(newTextFile);
fw.write(doc);
fw.close();
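
A side note on the debugging step above: FileWriter encodes with the JVM's default charset, which on a Windows machine is often not UTF-8, so the debug file may not contain the same bytes that are later passed to the parser. Below is a minimal sketch that writes the file with an explicit charset instead. The class name Utf8DebugWriter and its methods are hypothetical, the file is a temporary one rather than C:/work/test.html, and the code assumes Java 8 or later (on the Java 6 used in the question, new OutputStreamWriter(new FileOutputStream(file), "UTF-8") should do the same job).

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original question.
public class Utf8DebugWriter {

    // Writes the text as UTF-8, independent of the JVM's default charset.
    public static void writeUtf8(Path target, String text) {
        try (Writer writer = new OutputStreamWriter(
                Files.newOutputStream(target), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the whole file back, decoding it as UTF-8.
    public static String readUtf8(Path source) {
        try {
            return new String(Files.readAllBytes(source), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text to a temporary file and returns what is read back.
    public static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("test", ".html");
            try {
                writeUtf8(file, text);
                return readUtf8(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Greek sample text as Unicode escapes, so the source file encoding does not matter.
        String doc = "<html><body>Some greek characters: "
                + "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1</body></html>";
        System.out.println(doc.equals(roundTrip(doc)));
    }
}
```

If the round trip returns the same string, the file on disk really holds the Greek characters as UTF-8, whatever the platform default is.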

The next step is to write this value to the PDF file. This is my code:

ITextRenderer renderer = new ITextRenderer();
//add some fonts - if paths are not right, an exception will be thrown
renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);

final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
        .newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(
        doc.toString().getBytes("UTF-8")));

renderer.setDocument(document, null);
renderer.layout();
renderer.createPDF(os);
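
Before suspecting the fonts, it may help to confirm that the Greek text survives the parsing step, since the renderer only ever sees the DOM. The following is a minimal JDK-only sketch of that check. It leaves out FSEntityResolver (a Flying Saucer class), and the names ParseCheck and bodyText are hypothetical, used only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper, not part of the original question.
public class ParseCheck {

    // Parses the XHTML from UTF-8 bytes and returns the text inside <body>.
    public static String bodyText(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return document.getElementsByTagName("body").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse the document", e);
        }
    }

    public static void main(String[] args) {
        // Greek sample text as Unicode escapes, so the source file encoding does not matter.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<body>Some greek characters: " + greek + "</body></html>";
        // true means the characters reached the DOM intact.
        System.out.println(bodyText(doc).contains(greek));
    }
}
```

If this check passes, the characters are reaching the renderer, and the loss most likely happens later, when a font is chosen for the text.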

The final outcome of my code is:

In the HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)

In the PDF file I get: Some greek characters: Some greek characters (unexpected: the Greek characters are ignored!)

Dependencies:

java version "1.6.0_27"
itext-2.0.8.jar
de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar

I have also experimented with many more fonts, but I guess that my problem has nothing to do with using the wrong fonts. Any help is more than welcome.

Thanks

Answer 1

Answered by ArcanisCz

I am from the Czech Republic and had the same problem with our national characters! After some searching, I managed to solve it with this solution.

Specifically, with this part (which you already have):

renderer
    .getFontResolver()
    .addFont(fonts.get(i).getFile().getPath(), 
             BaseFont.IDENTITY_H, 
             BaseFont.NOT_EMBEDDED);

and then the important part, in the CSS:

* {
  font-family: Verdana;
/*  font-family: Times New Roman; - alternative. Without ""! */
}

It seems to me that without that CSS your fonts are not used. When I remove these lines from the CSS, the encoding is broken again.
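
To make that concrete: as far as I can tell, a registered font is only picked up when a CSS rule names its family, so the XHTML passed to the renderer needs such a rule. Here is a small sketch that builds the document string with the rule included. XhtmlTemplate and wrap are hypothetical names, and "Times New Roman" is an assumption about the family name that TIMES.TTF reports. The family is written without quotes, following the note in the CSS above.

```java
// Hypothetical helper, not part of the original answer.
public class XhtmlTemplate {

    // Wraps body markup in an XHTML document whose CSS selects the given font family.
    public static String wrap(String bodyHtml, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
                + "<head><style type=\"text/css\">"
                + "* { font-family: " + fontFamily + "; }"
                + "</style></head>"
                + "<body>" + bodyHtml + "</body></html>";
    }

    public static void main(String[] args) {
        // Assumed family name; it has to match the font registered with the font resolver.
        System.out.println(wrap("Some greek characters", "Times New Roman"));
    }
}
```

The resulting string can be parsed and handed to the renderer in the same way as the doc variable in the question.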

Hope this will help!

Answer 2

Answered by C2V3N

Add to your HTML something like this:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html>
    <head>
        <meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
        <style type='text/css'> 
            * { font-family: 'Arial Unicode MS'; }
        </style>
    </head>
    <body>
        <span>Some text with ????? characters</span>
    </body>
</html>

and then register the font with the ITextRenderer's font resolver in the Java code:

ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);

This works great for Croatian characters.

The jars used for generating the PDF are:

core-renderer.jar
iText-2.0.8.jar

Answer 3

Answered by Ravinder Reddy

Let iText read header information from your HTML content stating that it contains UTF-8 content.
Add a meta tag for Content-Type to the HTML code with the UTF-8 charset encoding, then run iText to generate the PDF and check the result.

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
 <head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
 </head>
 <body>
  Some greek characters: Καλημέρα Some greek characters
 </body>
</html>

Update:
If the above does not work, refer to the section "ENCODING VERSUS THE DEFAULT CHARSET USED BY THE JVM" in the document published at http://www.manning.com/lowagie2/iText2E_MEAP_CH02.pdf
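
The default-charset pitfall mentioned in the update can be reproduced without iText. The sketch below simulates a JVM whose default charset cannot represent Greek by encoding with ISO-8859-1 explicitly. CharsetPitfall and roundTrip are hypothetical names used only for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the referenced document.
public class CharsetPitfall {

    // Encodes and decodes with the same charset; unmappable characters are replaced on encoding.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Greek sample text as Unicode escapes, so the source file encoding does not matter.
        String greek = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";

        // This is what a bare getBytes() or FileWriter would use on this machine.
        System.out.println("default charset: " + Charset.defaultCharset());

        // prints true: UTF-8 can represent every character
        System.out.println(greek.equals(roundTrip(greek, StandardCharsets.UTF_8)));

        // prints ????????: ISO-8859-1 has no Greek letters, so each one becomes '?'
        System.out.println(roundTrip(greek, StandardCharsets.ISO_8859_1));
    }
}
```

Passing an explicit charset everywhere text is turned into bytes (getBytes, writers, streams) avoids depending on whatever the JVM default happens to be.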
