Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/5425251/

Date: 2020-10-30 11:04:04  Source: igfitidea

Using PDFBox to write UTF-8 encoded strings to a PDF

Tags: java, pdf, unicode, utf-8, pdfbox

Asked by Lucas Moellers

I am having trouble writing Unicode characters out to a PDF using PDFBox. Here is some sample code that generates garbage characters instead of outputting "š". What can I add to get support for UTF-8 strings?

PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);

PDType1Font font = PDType1Font.HELVETICA;
contentStream.setFont(font, 12);
contentStream.beginText();
contentStream.moveTextPositionByAmount(100, 400);
contentStream.drawString("š");
contentStream.endText();
contentStream.close();
document.save("test.pdf");
document.close();

Answered by gutch

You are using one of the built-in 'Base 14' fonts that are supplied with Adobe Reader. These fonts are not Unicode; they cover essentially a standard Latin alphabet, with a few extra characters. It looks like the character you mention, a lowercase s with a caron (š), is not available in PDF Latin text, though an uppercase Š is available, curiously on Windows only. See Appendix D of the PDF specification at http://www.adobe.com/devnet/pdf/pdf_reference.html for details.
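
To make the "not Unicode" point concrete, here is a rough sketch that lists which characters of a string fall outside a single-byte Latin encoding. It does not use PDFBox at all. The class name LatinCoverageCheck and the method unencodable are made up for illustration, and Java's ISO-8859-1 charset is only a stand-in (an assumption) for a limited Latin character set; it is not the exact encoding table from Appendix D of the PDF specification.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class LatinCoverageCheck {

    /**
     * Returns the characters of text that the given charset cannot encode,
     * in order of first appearance and without duplicates.
     * (Hypothetical helper, not part of PDFBox.)
     */
    static String unencodable(String text, Charset charset) {
        CharsetEncoder encoder = charset.newEncoder();
        StringBuilder missing = new StringBuilder();
        // Iterate by code point so a surrogate pair is treated as one character.
        text.codePoints().forEach(cp -> {
            String ch = new String(Character.toChars(cp));
            if (!encoder.canEncode(ch) && missing.indexOf(ch) < 0) {
                missing.append(ch);
            }
        });
        return missing.toString();
    }

    public static void main(String[] args) {
        // "Luk\u00e1\u0161" is "Lukas" with an a-acute and an s-caron.
        // ISO-8859-1 is an assumed stand-in for a limited Latin encoding.
        String missing = unencodable("Luk\u00e1\u0161 wrote a test",
                StandardCharsets.ISO_8859_1);
        System.out.println(missing.isEmpty()
                ? "Every character fits the Latin encoding"
                : "Characters outside the Latin encoding: " + missing);
    }
}
```

A check like this could run before drawing text, to decide whether one of the built-in fonts is likely to be enough or whether an embedded font is needed.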

Anyway, getting to the point: you need to embed a Unicode font if you want to use Unicode characters. Make sure you are licensed to embed whatever font you decide on. I can recommend the open-source Gentium or Doulos fonts because they're free, high quality and have comprehensive Unicode support.