HTTP headers encoding/decoding in Java
Original question: http://stackoverflow.com/questions/324470/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by ebruchez
A custom HTTP header is being passed to a Servlet application for authentication purposes. The header value must be able to contain accents and other non-ASCII characters, so must be in a certain encoding (ideally UTF-8).
I am provided with this piece of Java code by the developers who control the authentication environment:
String firstName = request.getHeader("my-custom-header");
String decodedFirstName = new String(firstName.getBytes(),"UTF-8");
But this code doesn't look right to me: it presupposes the encoding of the header value, when it seemed to me that there was a proper way of specifying an encoding for header values (from MIME I believe).
Here is my question: what is the right way (tm) of dealing with custom header values that need to support a UTF-8 encoding:
- on the wire (what the header looks like over the wire)
- from the decoding point of view (how to decode it using the Java Servlet API, and whether we can assume that request.getHeader() already does the decoding properly)
Here is an environment-independent code sample that treats headers as UTF-8, in case you can't change your service:
String valueAsISO = request.getHeader("my-custom-header");
String valueAsUTF8 = new String(valueAsISO.getBytes("ISO-8859-1"), "UTF-8");
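To see why that workaround can recover the original text, here is a minimal standalone sketch with no servlet container involved. It assumes the container decoded the header bytes as ISO-8859-1 (which maps every byte to exactly one char, so nothing is lost); the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class HeaderRedecode {
    // Re-interprets a value that was decoded as ISO-8859-1
    // although the bytes on the wire were actually UTF-8.
    static String latin1ToUtf8(String valueAsIso) {
        byte[] rawBytes = valueAsIso.getBytes(StandardCharsets.ISO_8859_1);
        return new String(rawBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the container: take the UTF-8 bytes of "Jos\u00e9"
        // and decode them as ISO-8859-1, as getHeader() is assumed to do.
        byte[] wire = "Jos\u00e9".getBytes(StandardCharsets.UTF_8);
        String asSeenByServlet = new String(wire, StandardCharsets.ISO_8859_1);

        // The round trip through ISO-8859-1 restores the original string.
        System.out.println(latin1ToUtf8(asSeenByServlet).equals("Jos\u00e9")); // prints "true"
    }
}
```

This only works when the sender really put raw UTF-8 into the header and the container really used ISO-8859-1; if either assumption is wrong, the result is garbage.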
Answered by superfell
See the HTTP spec for the rules, which says in section 2.2:
The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
The above code will not correctly decode an RFC 2047 encoded string, which leads me to believe that the service doesn't follow the spec correctly and is just embedding raw UTF-8 data in the header.
Answered by mkoeller
As mentioned already, the first look should always go to the HTTP 1.1 spec (RFC 2616). It says that text in header values must use the MIME encoding as defined in RFC 2047 if it contains characters from character sets other than ISO-8859-1.
So here's a plus for you. If your requirements are covered by the ISO-8859-1 charset then you just put your characters into your request/response messages. Otherwise MIME encoding is the only alternative.
As long as the user agent sends the values of your custom headers according to these rules, you won't have to worry about decoding them. That's what the Servlet API should do.
However, there's a more basic reason why your code snippet doesn't do what it's supposed to. The first line fetches the header value as a Java string. As we know, it's represented as UTF-16 internally, so at this point the HTTP request message parsing is already done and finished.
The next line fetches the byte array of this string. Since no encoding was specified (IMHO this method with no argument should have been deprecated long ago), the current system default encoding is used, which is usually not UTF-8. Then the array is converted again as if it were UTF-8 encoded. Ouch.
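The effect of the platform default can be made visible by passing the charset explicitly. The sketch below is only an illustration: `redecode` is a hypothetical helper that mimics `new String(s.getBytes(), "UTF-8")` with the default charset turned into a parameter, and the header value is simulated under the assumption that the container decoded the wire bytes as ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetPitfall {
    // Same logic as new String(s.getBytes(), "UTF-8"),
    // but with the platform default charset made explicit.
    static String redecode(String headerValue, Charset platformDefault) {
        return new String(headerValue.getBytes(platformDefault), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes of "\u00e9" (0xC3 0xA9), decoded as ISO-8859-1 by the assumed container.
        String headerValue =
                new String(new byte[] {(byte) 0xC3, (byte) 0xA9}, StandardCharsets.ISO_8859_1);

        // Default ISO-8859-1: the original bytes come back and decode to the intended character.
        System.out.println(redecode(headerValue, StandardCharsets.ISO_8859_1).equals("\u00e9")); // true

        // Default UTF-8: encode and decode cancel out, so the mojibake stays as it was.
        System.out.println(redecode(headerValue, StandardCharsets.UTF_8).equals(headerValue)); // true

        // Default US-ASCII: both chars are unmappable and become '?'.
        System.out.println(redecode(headerValue, StandardCharsets.US_ASCII)); // ??
    }
}
```

So the snippet from the question only happens to work on machines whose default charset maps those chars back to the original bytes, which is why naming the charset explicitly matters.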
Answered by ebruchez
Thanks for the answers. It seems that the ideal would be to follow the proper HTTP header encoding as per RFC 2047. Header values in UTF-8 on the wire would look something like this:
=?UTF-8?Q?...?=
Now here is the funny thing: it seems that neither Tomcat 5.5 nor 6 properly decodes HTTP headers as per RFC 2047! The Tomcat code assumes everywhere that header values use ISO-8859-1.
So for Tomcat, specifically, I will work around this by writing a filter which handles the proper decoding of the header values.
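The decoding step of such a filter might look roughly like the sketch below. It is not the filter itself and not a complete RFC 2047 implementation: it handles the `=?charset?Q?...?=` and `=?charset?B?...?=` forms, but it does not drop whitespace between adjacent encoded-words and ignores language suffixes in the charset part. The class name `EncodedWordDecoder` is made up for illustration; JavaMail's `MimeUtility.decodeText` is an existing library alternative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EncodedWordDecoder {
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    // Replaces every encoded-word in the value with its decoded text;
    // everything else is copied through unchanged.
    static String decode(String headerValue) {
        Matcher m = ENCODED_WORD.matcher(headerValue);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            m.appendReplacement(out, Matcher.quoteReplacement(new String(bytes, charset)));
        }
        m.appendTail(out);
        return out.toString();
    }

    // "Q" encoding: '_' stands for a space, "=XX" is one byte in hex,
    // any other character is taken as a literal ASCII byte.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                bytes.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                bytes.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(decode("=?UTF-8?Q?Jos=C3=A9?=").equals("Jos\u00e9")); // true
        System.out.println(decode("=?UTF-8?B?Sm9zw6k=?=").equals("Jos\u00e9"));  // true
    }
}
```

In a filter, this function would be applied to the string returned by the wrapped request's `getHeader()`; an unknown charset name or malformed hex digits would throw, so a real filter would need to decide how to handle those cases.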
Answered by Julian Reschke
The HTTPbis working group is aware of the issue, and the latest drafts get rid of all the language with respect to TEXT and RFC 2047 encoding -- it is not used in practice over HTTP.
See http://trac.tools.ietf.org/wg/httpbis/trac/ticket/74 for the whole story.
Answered by Julian Reschke
Again: RFC 2047 is not implemented in practice. The next revision of HTTP/1.1 is going to remove any mention of it.
So, if you need to transport non-ASCII characters, the safest way is to encode them into a sequence of ASCII, such as the "Slug" header in the Atom Publishing Protocol.
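As an illustration of that approach, here is a rough sketch of Slug-style encoding: the value is converted to UTF-8 and every octet that is not printable ASCII, plus the percent sign itself, is percent-encoded. The class and method names are made up, and the exact set of characters left unencoded is an assumption based on the Slug definition in the Atom Publishing Protocol (RFC 5023); check the spec before relying on it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SlugStyleCodec {
    // Sender side: UTF-8 octets outside printable ASCII, and '%' itself, become %XX.
    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            boolean literal = octet >= 0x20 && octet <= 0x7E && octet != '%';
            if (literal) {
                out.append((char) octet);
            } else {
                out.append(String.format("%%%02X", octet));
            }
        }
        return out.toString();
    }

    // Receiver side: turn %XX back into bytes, then decode the bytes as UTF-8.
    // The input is assumed to be pure ASCII, as produced by encode().
    static String decode(String headerValue) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < headerValue.length(); i++) {
            char c = headerValue.charAt(i);
            if (c == '%' && i + 2 < headerValue.length()) {
                bytes.write(Integer.parseInt(headerValue.substring(i + 1, i + 3), 16));
                i += 2; // skip the two hex digits just consumed
            } else {
                bytes.write(c);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String encoded = encode("Jos\u00e9 %");
        System.out.println(encoded);                              // Jos%C3%A9 %25
        System.out.println(decode(encoded).equals("Jos\u00e9 %")); // true
    }
}
```

Because the header then contains only ASCII, it survives whatever charset the container assumes when it builds the string returned by `getHeader()`.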