javascript: Displaying UTF-8 characters in PDF

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/16040836/

Time: 2020-10-27 03:01:17  Source: igfitidea

Displaying UTF-8 characters in PDF

Tags: javascript, pdf, utf-8, character-encoding, sap

Asked by Themasterhimself

I am trying to display a PDF that the backend converts into a binary string. This is the AJAX call I am making:

    $.ajax({
        type : 'GET',
        url : '<url>',
        data : oParameters,
        contentType : 'application/pdf;charset=UTF-8',
        success : function(odata) {
            window.open("data:application/pdf;charset=utf-8," + escape(odata));
        }
    });

When I try to open the PDF in a new window, the url looks like

data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........

As you can see, it uses "WinAnsiEncoding" to display the PDF. Because of this, some of the characters are not being displayed properly. How do I change this to UTF-8?

EDIT : The backend is in ABAP. I am converting a smartform to OTF and then to a string using the function module "CONVERT_OTF".

    CALL FUNCTION fname
      EXPORTING
        user_settings      = space
        control_parameters = ls_ctropt
        output_options     = ls_output
        gv_lang            = lv_lang
      IMPORTING
        job_output_info    = ls_body_text
      EXCEPTIONS
        formatting_error   = 1
        internal_error     = 2
        send_error         = 3
        user_canceled      = 4
        OTHERS             = 5.

    CALL FUNCTION 'CONVERT_OTF'
      EXPORTING
        format                = 'PDF'
      IMPORTING
        bin_filesize          = ls_pdf_len
        bin_file              = ls_pdf_xstring
      TABLES
        otf                   = ls_body_text-otfdata
        lines                 = lt_lines
      EXCEPTIONS
        err_max_linewidth     = 1
        err_format            = 2
        err_conv_not_possible = 3
        err_bad_otf           = 4
        OTHERS                = 5.

    CALL METHOD server->response->set_header_field( name = 'Content-Type'
      value = 'application/pdf;charset=UTF-8' ).
    CALL METHOD server->response->append_data( data = lv_pdf_string
      length = lv_len ).

Answered by mkl

Concerning your remark that it uses "WinAnsiEncoding" to display the PDF:

After the comma in

data:application/pdf;charset=utf-8,%25PDF-1.3%0D%0A%25%uFFFD%uFFFD%uFFFD%uFFFD%0D%0A2%200%20obj%0D%0A/WinAnsiEncoding%0D........

everything is pure data. Thus, "WinAnsiEncoding" is merely part of the content of the PDF, and if it is the cause of your trouble, the PDF generator must be asked to change its PDF generation process.

In the case at hand, your data is:

%PDF-1.3
%...
2 0 obj
/WinAnsiEncoding
........

which is a completely normal PDF structure. It merely means that PDF object 2 is defined as /WinAnsiEncoding, which may or may not be used for some font definition; even if it is used, it may still be adapted by some /Differences to include the characters you require. Furthermore, it does not make sense to change this to UTF-8 (as you request), because UTF-8 is not a standard encoding for PDF page content. If you somehow put UTF-8 there, you'll break the PDF even more.

I'm afraid, though, that there are other problems, too.

  1. You add a charset parameter to the type application/pdf. This does not make sense: PDF is a binary format, i.e. a sequence of bytes is expected and, therefore, no charset is involved.

  2. Your method call escape(odata) creates %uFFFD%uFFFD%uFFFD%uFFFD. This is invalid according to the RFCs, which define only the following form of percent-encoding:

    A percent-encoding mechanism is used to represent a data octet in a component when that octet's corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character "%" followed by the two hexadecimal digits representing that octet's numeric value.

    (RFC 3986, section 2.1)

    Because the percent ("%") character serves as the indicator for percent-encoded octets, it must be percent-encoded as "%25" for that octet to be used as data within a URI.

    (ibidem, section 2.4)

    Thus, %uFFFD%uFFFD%uFFFD%uFFFD is invalid.

  3. PDF, being a binary format, is better suited to Base64 encoding, i.e.

    data:application/pdf;base64,BASE_64_ENCODED_PDF
    

    Thus, I propose you change your client side process accordingly.
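
Points 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the names pdfBytesToDataUri and openPdf are hypothetical, and the sketch assumes the backend returns the raw PDF bytes unchanged. The first part shows why escape() yields the non-standard %uFFFD form once the bytes have been turned into text (U+FFFD is the Unicode replacement character). The second part keeps the response as bytes and Base64-encodes them instead.

```javascript
// Sketch only: the function names below are illustrative assumptions.

// Part 1: a string that already contains U+FFFD replacement characters,
// as happens when binary PDF bytes are decoded as text.
var damaged = '%\uFFFD\uFFFD';
var escaped = escape(damaged); // "%25%uFFFD%uFFFD", not valid RFC 3986 percent-encoding

// Part 2: build a Base64 data URI from raw PDF bytes (a Uint8Array).
function pdfBytesToDataUri(bytes) {
    var binary = '';
    for (var i = 0; i < bytes.length; i++) {
        // One character per byte, so btoa() sees only code points 0..255.
        binary += String.fromCharCode(bytes[i]);
    }
    // btoa is available in browsers; fall back to Buffer where it is missing.
    var base64 = (typeof btoa === 'function')
        ? btoa(binary)
        : Buffer.from(binary, 'binary').toString('base64');
    return 'data:application/pdf;base64,' + base64;
}

// Browser-only usage sketch (defined here, not called): request the PDF as an
// ArrayBuffer so that no text decoding ever touches the bytes.
function openPdf(url) {
    var xhr = new XMLHttpRequest();
    xhr.open('GET', url);
    xhr.responseType = 'arraybuffer';
    xhr.onload = function () {
        window.open(pdfBytesToDataUri(new Uint8Array(xhr.response)));
    };
    xhr.send();
}
```

Requesting an ArrayBuffer rather than text is the key design choice here, since it avoids any charset conversion on the client. Note that some current browsers restrict opening data: URLs in a new window, so the window.open call may need to be replaced, for example by embedding the URI in an iframe or by using a Blob URL.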
