Wrong encoding with Java HttpURLConnection

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/7865132/

Time: 2020-10-30 21:45:46  Source: igfitidea

Wrong encoding with Java HttpURLConnection

Tags: java, web-services, encoding, httpurlconnection, urlconnection

Asked by Shereef Marzouk

I am trying to read generated XML from an MS web service:

URL page = new URL(address);
StringBuffer text = new StringBuffer();
HttpURLConnection conn = (HttpURLConnection) page.openConnection();
conn.connect();
InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());
BufferedReader buff = new BufferedReader(in);
box.setText("Getting data ...");
String line;
// Test for null before appending, so "null" is not added at end of stream.
while ((line = buff.readLine()) != null) {
  text.append(line).append("\n");
}
box.setText(text.toString());

or

URL u = new URL(address);
URLConnection uc = u.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(uc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
  inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");
  System.out.println(inputLine);
}
in.close();

Any page reads fine except the web service output, where the greater-than and less-than signs come out strangely.

It reads < as "&lt;" and > as "&gt;".

Please help, thanks.

Answered by Martin Algesten

First, there seems to be some confusion on this line:

inputLine = java.net.URLDecoder.decode(inputLine, "UTF-8");

This effectively says that you expect every line in the document your server provides to be URL encoded. URL encoding is not the same as document encoding.

http://en.wikipedia.org/wiki/Percent-encoding

http://en.wikipedia.org/wiki/Character_encoding

Looking at your code snippet, I think URL encoding (percent encoding) is not what you're after.
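
To make the difference concrete, here is a minimal sketch of what percent-decoding does to ordinary document text. The class name UrlDecodeDemo and the sample strings are made up for illustration; only URLDecoder.decode is the real JDK call.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class UrlDecodeDemo {
    // Percent-decoding: turns %XX escapes into characters and '+' into a space.
    static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A percent-encoded value decodes as intended.
        System.out.println(percentDecode("a%3Cb%3E+c"));     // a<b> c

        // Ordinary document text is silently altered: '+' becomes a space.
        System.out.println(percentDecode("<sum>1+1</sum>")); // <sum>1 1</sum>

        // Entity references pass through untouched, so this cannot fix "&lt;".
        System.out.println(percentDecode("&lt;tag&gt;"));    // &lt;tag&gt;
    }
}
```

A literal % in the text that is not followed by two hex digits would make URLDecoder.decode throw an IllegalArgumentException, which is another reason not to run it over document lines.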

In terms of document character encoding, you are making a conversion on this line:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent());

conn.getContent() returns an InputStream that operates on bytes, whilst the reader operates on chars; the character encoding conversion is done here. Check out the other constructors of InputStreamReader, which take the encoding as a second argument. Without the second argument you fall back on whatever the platform default charset is in Java.

InputStreamReader(InputStream in, String charsetName)

for instance lets you change your code to:

InputStreamReader in = new InputStreamReader((InputStream) conn.getContent(), "utf-8");
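
As a rough illustration of why the second argument matters, the sketch below decodes the same bytes with two different charsets. The class CharsetReadDemo and its helper are hypothetical, and a ByteArrayInputStream stands in for the connection's stream so the example runs without a network.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetReadDemo {
    // Reads the first line of the stream, decoding its bytes with the given charset.
    static String firstLine(InputStream in, Charset charset) {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is encoded as two bytes, 0xC3 0xA9, in UTF-8.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Decoded with the charset the bytes were written in: the original text.
        System.out.println(firstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.UTF_8));

        // Decoded with the wrong charset: each byte becomes its own character.
        System.out.println(firstLine(new ByteArrayInputStream(utf8Bytes), StandardCharsets.ISO_8859_1));
    }
}
```

Passing a Charset constant rather than a charset name also avoids the checked UnsupportedEncodingException that the String-based constructor declares.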

But the real question is "what encoding is your server providing the content in?" If you own the server code too, you may just hard-code it to something reasonable such as utf-8. But if it can vary, you need to look at the HTTP header Content-Type to figure it out.

String contentType = conn.getHeaderField("Content-Type");

The contents of contentType will look like:

text/plain; charset=utf-8

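A shorthand way of getting this field is:

String contentType = conn.getContentType();

Note that conn.getContentEncoding() is not a substitute: it returns the Content-Encoding header (for example gzip), not the charset.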

Notice that it's entirely possible that no charset is provided, or that there is no Content-Type header at all, in which case you must fall back on a reasonable default.
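
One possible way to pull the charset out of the header value with a fallback is sketched below. ContentTypeCharset and charsetOf are hypothetical names, the parsing is deliberately simple (it splits on ';' and looks for a charset parameter), and the choice of fallback is left to the caller.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ContentTypeCharset {
    // Returns the charset named in a Content-Type value, or the fallback when the
    // header is absent, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim();
                // The value may be quoted, e.g. charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback; // illegal or unsupported charset name
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain; charset=utf-8", StandardCharsets.ISO_8859_1));
        System.out.println(charsetOf("text/xml", StandardCharsets.UTF_8));
        System.out.println(charsetOf(null, StandardCharsets.UTF_8));
    }
}
```

The result could then be passed as the second argument to InputStreamReader, for example with conn.getHeaderField("Content-Type") as the input.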

Answered by Shereef Marzouk

Mark Rotteveel is correct: the web service is the culprit here. For some reason it is sending the greater-than and less-than signs in the &lt; and &gt; format.
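
If the service is wrapping an escaped XML document inside an outer element, one way to recover the inner markup is to let an XML parser resolve the entity references. This is only a sketch under that assumption: the payload below is invented, and EscapedXmlDemo and rootText are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EscapedXmlDemo {
    // Parses the outer document and returns the text of its root element.
    // The parser resolves entity references such as &lt; and &gt; in that text.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical payload: an inner XML document escaped inside a <string> element.
        String response = "<string>&lt;root&gt;&lt;item&gt;1&lt;/item&gt;&lt;/root&gt;</string>";
        System.out.println(rootText(response)); // <root><item>1</item></root>
    }
}
```

The returned string is itself XML and could be parsed a second time to get at the inner elements.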

Thanks, Martin Algesten, but as I already stated, I worked around it; I was just trying to find out why it behaved this way.