Generation of PDF from HTML with non-Latin characters using ITextRenderer does not work
Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, credit the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/10250606/
Question by alexandros
This is the second day I have spent investigating this, with no results. At least now I am able to ask something very specific.
I am trying to write valid HTML code containing some non-Latin characters to a PDF file using iText, and more specifically using ITextRenderer from Flying Saucer.
My short example/code starts by initializing a string variable doc with this value:
String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
+ "<body>Some greek characters: Καλημέρα Some greek characters"
+ "</body></html>";
Here is the code I use for debugging purposes. I save this string to an HTML file and then open it in a browser, just to double-check that the HTML content is valid and that I can still read the Greek characters:
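//write for debugging purposes in an html file
//use an explicit UTF-8 writer: FileWriter always uses the JVM default charset
File newTextFile = new File("C:/work/test.html");
Writer fw = new OutputStreamWriter(new FileOutputStream(newTextFile), "UTF-8");
fw.write(doc);
fw.close();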
Next step is to try to write this value in the PDF file. This is my code:
ITextRenderer renderer = new ITextRenderer();
//add some fonts - if paths are not right, an exception will be thrown
renderer.getFontResolver().addFont("c:/work/fonts/TIMES.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBD.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESBI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
renderer.getFontResolver().addFont("c:/work/fonts/TIMESI.TTF", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
final DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory
.newInstance();
documentBuilderFactory.setValidating(false);
DocumentBuilder builder = documentBuilderFactory.newDocumentBuilder();
builder.setEntityResolver(FSEntityResolver.instance());
org.w3c.dom.Document document = builder.parse(new ByteArrayInputStream(
doc.getBytes("UTF-8")));
renderer.setDocument(document, null);
renderer.layout();
//os: the OutputStream the PDF is written to (declared elsewhere)
renderer.createPDF(os);
The final outcome of my code is:
In the HTML file I get: Some greek characters: Καλημέρα Some greek characters (expected)
In the PDF file I get: Some greek characters: Some greek characters (unexpected: the Greek characters are ignored!)
Dependencies:
java version "1.6.0_27"
itext-2.0.8.jar
de.huxhorn.lilith.3rdparty.flyingsaucer.core-renderer-8Pre2.jar
I have also experimented with many more fonts, but I suspect my problem has nothing to do with using the wrong fonts. Any help is more than welcome.
Thanx
Answer by ArcanisCz
I am from the Czech Republic and had the same problem with our national characters! After some searching, I managed to solve it with this solution.
Specifically, with this (which you already have):
//fonts.get(i): a font file from the answerer's own font list (loop not shown)
renderer
.getFontResolver()
.addFont(fonts.get(i).getFile().getPath(),
BaseFont.IDENTITY_H,
BaseFont.NOT_EMBEDDED);
and then the important part, in the CSS:
* {
font-family: Verdana;
/* font-family: Times New Roman; - alternative. Without quotes! */
}
It seems to me that without that CSS your fonts are not used. When I remove these lines from the CSS, the encoding is broken again.
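A minimal sketch of this idea using only the JDK: the hypothetical helper `FontCssPage.buildPage` wraps body content in an XHTML page whose `*` rule applies one font family to every element, and `parse` reads it with the same non-validating `DocumentBuilderFactory` setup as the question. It assumes the family name you pass (here `Verdana`) is the name the registered TTF file actually declares. The resulting `Document` is what would be handed to `renderer.setDocument(document, null)`, which is not called here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper: builds an XHTML page whose CSS selects one font family,
// so the renderer can match it against a font registered with addFont(...).
class FontCssPage {

    static String buildPage(String bodyContent, String fontFamily) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
            + "<head><style type=\"text/css\">"
            // '*' applies the family to every element, as in the CSS above
            + "* { font-family: " + fontFamily + "; }"
            + "</style></head>"
            + "<body>" + bodyContent + "</body></html>";
    }

    // Parses the page like the question does: non-validating DOM parser, UTF-8 bytes.
    static Document parse(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        // Greek "Kalimera" as Unicode escapes, independent of the source file's encoding
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";
        Document document = parse(buildPage("Some greek characters: " + greek, "Verdana"));
        // The Greek text survives the UTF-8 round trip into the DOM body element.
        String bodyText = document.getElementsByTagName("body").item(0).getTextContent();
        System.out.println(bodyText.endsWith(greek));
    }
}
```

The CSS rule is what links the markup to the registered font; registering a font without any `font-family` that names it leaves the renderer on its default font.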
Hope this will help!
Answer by C2V3N
Add to your HTML something like this:
<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE html>
<html>
<head>
<meta http-equiv='Content-Type' content='text/html; charset=UTF-8'/>
<style type='text/css'>
* { font-family: 'Arial Unicode MS'; }
</style>
</head>
<body>
<span>Some text with ????? characters</span>
</body>
</html>
and then add the font to the ITextRenderer's font resolver in the Java code:
ITextRenderer renderer = new ITextRenderer();
renderer.getFontResolver().addFont("fonts/ARIALUNI.TTF", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
This works great for Croatian characters.
The JARs used for generating the PDF are:
core-renderer.jar
iText-2.0.8.jar
Answer by Ravinder Reddy
Let iText read header information from your HTML content stating that it contains UTF-8 content. Add a meta tag for Content-Type with a UTF-8 charset to the HTML code, then run iText to generate the PDF and check the result.
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
</head>
<body>
Some greek characters: Καλημέρα Some greek characters
</body>
</html>
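Because the question's code parses the page with a `DocumentBuilder`, the markup has to be well-formed XML, not just HTML that a browser tolerates: an HTML-style `<meta ...>` without a closing slash makes the parse fail before Flying Saucer ever sees the document. The sketch below (JDK only; `MetaWellFormedness` and its methods are hypothetical names) checks both forms of the `<meta>` tag used in this answer. The default parser may also print a `[Fatal Error]` line to stderr for the malformed variant.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Sketch: an XML-based renderer needs well-formed markup, so test both <meta> forms.
class MetaWellFormedness {

    // Returns true if the string parses as XML with a non-validating DOM parser.
    static boolean isWellFormed(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // SAXException for malformed markup (parser setup or IO errors also land here)
            return false;
        }
    }

    // Wraps a <meta> tag in a minimal XHTML page.
    static String page(String metaTag) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\" lang=\"en\">"
            + "<head>" + metaTag + "</head>"
            + "<body>Some greek characters</body></html>";
    }

    public static void main(String[] args) {
        String unclosed = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">";
        String selfClosed = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"/>";
        System.out.println(isWellFormed(page(unclosed)));   // false
        System.out.println(isWellFormed(page(selfClosed))); // true
    }
}
```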
Update:
If the above does not work, refer to the section "ENCODING VERSUS THE DEFAULT CHARSET USED BY THE JVM" in the document published at http://www.manning.com/lowagie2/iText2E_MEAP_CH02.pdf
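To illustrate the issue that section refers to, here is a small sketch of my own (JDK only, hypothetical class name, not taken from the referenced chapter) that encodes Greek text with a charset that cannot represent it. Calls such as `new FileWriter(file)` or `String.getBytes()` with no charset argument use the JVM default charset, so if that default is, for example, ISO-8859-1 or windows-1252, the Greek letters are silently replaced by `?`. The question's own `getBytes("UTF-8")` call avoids this for the parsing step.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: what happens to Greek text when the chosen charset cannot encode it.
class CharsetRoundTrip {

    // Encodes with the given charset, then decodes with the same one.
    // String.getBytes(Charset) silently replaces unmappable characters
    // with the charset's replacement byte ('?' for ISO-8859-1).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Greek "Kalimera" as Unicode escapes, independent of the source file's encoding
        String greek = "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1";
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println(roundTrip(greek, StandardCharsets.ISO_8859_1));          // ????????
        System.out.println(roundTrip(greek, StandardCharsets.UTF_8).equals(greek)); // true
    }
}
```

Passing the charset explicitly at every byte/char boundary (file writers, `getBytes`, `new String(bytes, ...)`) keeps the result independent of the machine's default.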