Set response encoding with HttpClient 3.1 (Java)
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and author information, and attribute the content to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/5142794/
Asked by michal.kreuzman
I'm using org.apache.commons.httpclient.HttpClient and need to set the response encoding (for some reason the server returns an incorrect encoding in Content-Type). My approach is to get the response as raw bytes and convert it to a String with the desired encoding. I'm wondering whether there is a better way to do this (e.g. by configuring HttpClient). Thanks for any suggestions.
Accepted answer by Stephen C
I don't think there's a better answer using the HttpClient 3.x APIs.
The HTTP 1.1 spec says clearly that a client "must" respect the character set specified in the response header, and use ISO-8859-1 if no character set is specified. The HttpClient APIs are designed on the assumption that the programmer wants to conform to the HTTP specs. Obviously, you need to break the rules in the spec so that you can talk to the non-compliant server. Nevertheless, this is not a use case that the API designers saw a need to support explicitly.
If you were using HttpClient 4.x, you could write your own ResponseHandler to convert the body into an HttpEntity, ignoring the response message's notional character set.
Answer by Peter Knego
A few notes:
1. The server serves the data, so it is up to the server to serve it in an appropriate format. The response encoding is therefore set by the server, not the client. However, the client can suggest which format it would like via the Accept and Accept-Charset headers:
   Accept: text/plain
   Accept-Charset: utf-8
   However, HTTP servers usually do not convert between formats.
2. If option 1 does not work, you should look at the configuration of the server.
3. When a String is sent as raw bytes (and it always is, because this is what networks transmit), an encoding is always involved. Since the server produces these raw bytes, the server defines the encoding. So you cannot take the raw bytes and create a String with an encoding of your choice. You must use the encoding that was used when the String was converted to bytes.
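To illustrate the last note, here is a minimal sketch using only the Java standard library. It assumes, purely for the example, that the server encoded its text as UTF-8; the class name `CharsetRoundTrip` and the method `decodeUtf8BytesAs` are hypothetical. Decoding the same bytes with the charset that produced them recovers the text, while decoding them with ISO-8859-1 yields garbled characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HttpClient.
final class CharsetRoundTrip {

    /** Encodes the text as UTF-8 (assumed server behaviour), then decodes the bytes with the given charset. */
    static String decodeUtf8BytesAs(String text, Charset charset) {
        byte[] wire = text.getBytes(StandardCharsets.UTF_8); // the raw bytes that travel over the network
        return new String(wire, charset);                     // the charset the client decides to apply
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "cafe" with an acute accent on the last letter

        // Same charset on both sides: the original text comes back unchanged.
        String matching = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);
        System.out.println(matching.equals(original));

        // Different charset: the two UTF-8 bytes of the accented letter become two separate characters.
        String mismatched = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        System.out.println(mismatched.equals(original));
        System.out.println(mismatched.length());
    }
}
```

This is also why a wrong charset in Content-Type matters: a client that trusts the header decodes correct bytes with the wrong charset.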
Answer by Paŭlo Ebermann
Disclaimer: I don't really know HttpClient; I have only read the API.
I would use the execute method that returns an HttpResponse, then call .getEntity().getContent(). This is a pure byte stream, so if you want to ignore the encoding reported by the server, you can simply wrap your own InputStreamReader around it.
Okay, it looks like I had the wrong version (obviously, there are too many HttpClient classes out there).
But it is the same as before, just located on other classes: HttpMethod has a getResponseBodyAsStream() method, around which you can wrap your own InputStreamReader. (Or get the whole array at once, if it is not too big, and convert it to a String, as you wrote.)
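A minimal sketch of that idea, using only the Java standard library so that it runs without HttpClient on the classpath. The class `ForcedCharsetReader` and its `readAll` method are hypothetical names; the `ByteArrayInputStream` in `main` merely stands in for the stream that `getResponseBodyAsStream()` would return, and UTF-8 is an assumed "real" encoding of the body.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient.
final class ForcedCharsetReader {

    /** Reads the whole stream and decodes it with the caller's charset, ignoring any charset the server declared. */
    static String readAll(InputStream body, Charset charset) {
        StringBuilder text = new StringBuilder();
        // try-with-resources closes the reader, which also closes the wrapped stream.
        try (Reader reader = new InputStreamReader(body, charset)) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a response body; a real caller would pass the stream obtained from the HTTP method.
        byte[] fakeBody = "Gr\u00fc\u00dfe".getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(new ByteArrayInputStream(fakeBody), StandardCharsets.UTF_8);
        System.out.println(decoded.length()); // number of decoded characters, not bytes
    }
}
```

With HttpClient 3.1 the call would presumably look like `readAll(method.getResponseBodyAsStream(), StandardCharsets.UTF_8)`, where `method` is the executed HttpMethod and the charset is whatever the server actually used.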
I think trying to change the response and letting HttpClient analyze it is not the right approach here.
I do suggest sending a message to the server administrator/webmaster about the wrong charset, though.
Answer by HommeDeJava
Greetings folks,
Just in case someone finds this post while googling how to set HttpClient to write in UTF-8, this line of code should be handy:
response.setContentType("text/html; charset=UTF-8");
Best