Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/21254628/

Time: 2020-08-13 07:41:28  Source: igfitidea

Set encoding when converting text file to pdf using itext

Tags: java, itext

Asked by Amira

I'm trying to get iText to output my UTF-8 encoded text correctly. The input file contains symbols like ° and accented Latin characters (é, è, à...).

But I haven't found a solution. This is the code I'm using:

BufferedReader input = null;
Document output = null;
System.out.println("Convert text file to pdf");
System.out.println("input  : " + args[0]);
System.out.println("output : " + args[1]);
try {
  // text file to convert to pdf as args[0]
  input = 
    new BufferedReader (new FileReader(args[0]));
  // letter 8.5x11
  //    see com.lowagie.text.PageSize for a complete list of page-size constants.
  output = new Document(PageSize.LETTER, 40, 40, 40, 40);
  // pdf file as args[1]
  PdfWriter.getInstance(output, new FileOutputStream (args[1]));

  output.open();
  output.addAuthor("RealHowTo");
  output.addSubject(args[0]);
  output.addTitle(args[0]);

  BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
  Font font = new Font(courier, 12, Font.NORMAL);
  Chunk chunk = new Chunk("",font);
  output.add(chunk); 

  String line = "";
  while(null != (line = input.readLine())) {
    System.out.println(line);
    Paragraph p = new Paragraph(line);
    p.setAlignment(Element.ALIGN_JUSTIFIED);
    output.add(p);
  }
  System.out.println("Done.");
  output.close();
  input.close();
  System.exit(0);
}
catch (Exception e) {
  e.printStackTrace();
  System.exit(1);
}
}

Any ideas would be appreciated.

Accepted answer by Bruno Lowagie

When I look at your code, I see a number of things that are odd.

  1. You say you require UTF-8, but you create a BaseFont object using BaseFont.CP1252 instead of BaseFont.IDENTITY_H (which is the "encoding" you need when you work with Unicode).
  2. You use the standard Type 1 font Courier, which is a font that doesn't know how to render é, è, à... and a font that is never embedded. As documented, the BaseFont.EMBEDDED parameter is ignored in this case!
  3. You don't use this font with an object that has actual content. The actual content is put into a Paragraph that is created using the default font "Helvetica", a font that doesn't know how to render é, è, à...

To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard Type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H.

Answer by Ivey

Both the reader and the writer should be set to use UTF-8 character set encoding to read/write UTF-8 characters properly. For example,

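// InputStreamReader wraps an InputStream, so open the file with FileInputStream first
input = new BufferedReader(new InputStreamReader(new FileInputStream(args[0]), "UTF-8"));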
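
As a rough illustration of why the reader's charset matters, here is a minimal sketch using only the JDK (no iText). The class name Utf8ReadDemo, the helper readLines, and the temporary sample file are assumptions made for this example and are not part of the original code. It reads the same UTF-8 file twice: once with an explicit UTF-8 charset, and once with ISO-8859-1 to mimic what a reader falling back to a non-UTF-8 platform default might produce.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class Utf8ReadDemo {

    // Reads all lines of a file, decoding its bytes with the given charset.
    static List<String> readLines(Path file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample file, created here so the sketch is self-contained.
        // The escapes spell: 25 °C, é è à
        Path file = Files.createTempFile("utf8-demo", ".txt");
        Files.write(file, "25 \u00B0C, \u00E9 \u00E8 \u00E0".getBytes(StandardCharsets.UTF_8));

        // Explicit UTF-8: the characters are decoded as written.
        System.out.println(readLines(file, StandardCharsets.UTF_8));

        // Wrong charset: each two-byte UTF-8 sequence is decoded as two
        // separate ISO-8859-1 characters, so the text comes back garbled.
        System.out.println(readLines(file, StandardCharsets.ISO_8859_1));

        Files.delete(file);
    }
}
```

Reading with the right charset only gets correct characters into the Java String; whether they then appear in the PDF still depends on the font and encoding passed to iText, as the accepted answer explains.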

Answer by João Zarate

@AmiraGL,

The solution proposed by Bruno Lowagie fixed my problem (p:dataExporter PDF export does not show the Euro (€) sign). It may solve yours as well.

To solve this, you need to create the Paragraph with the appropriate font. That is NOT a standard type 1 font, but something like courier.ttf. You also need to use the appropriate encoding: BaseFont.IDENTITY_H. -by Bruno Lowagie

BaseFont courier = BaseFont.createFont(BaseFont.COURIER, BaseFont.CP1252, BaseFont.EMBEDDED);
Font cellFont = new Font(courier, 12, Font.NORMAL);

Solution: https://stackoverflow.com/a/21259711/3557631
