Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3934251/

Time: 2020-10-30 04:01:19  Source: igfitidea

URLConnection does not get the charset

Tags: java, http, content-type, httpurlconnection, urlconnection

Asked by Bart van Heukelom

I'm using URL.openConnection() to download something from a server. The server says:

Content-Type: text/plain; charset=utf-8

But connection.getContentEncoding() returns null. What's up?

Accepted answer by Waldheinz

This is documented behaviour: the getContentEncoding() method is specified to return the contents of the Content-Encoding HTTP header, which is not set in your example. You could use the getContentType() method and parse the resulting String on your own, or possibly go for a more advanced HTTP client library like the one from Apache.
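
To make the distinction concrete: Content-Encoding names a content coding applied to the body (typically compression such as gzip), not the character set. The following is a minimal sketch of what the value from getContentEncoding() is normally used for. The class and method names (ContentEncodingSketch, unwrap, gzip, readUtf8) are made up for illustration, and only the "gzip" coding is handled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of java.net.
final class ContentEncodingSketch {

    // Undo the content coding named by the Content-Encoding header.
    // Only "gzip" is handled; null (header absent) means the body is used as-is.
    static InputStream unwrap(InputStream body, String contentEncoding) {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            try {
                return new GZIPInputStream(body);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return body;
    }

    // Demo helper: gzip a byte array in memory.
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Demo helper: drain a stream and decode it as UTF-8.
    static String readUtf8(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a real connection, the call would look like unwrap(connection.getInputStream(), connection.getContentEncoding()); a null return value simply leaves the stream untouched, which matches the situation in the question.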

Answer by Buhake Sindi

URLConnection.getContentEncoding() returns the value of the Content-Encoding header.

Code from URLConnection.getContentEncoding():

/**
 * Returns the value of the <code>content-encoding</code> header field.
 *
 * @return  the content encoding of the resource that the URL references,
 *          or <code>null</code> if not known.
 * @see     java.net.URLConnection#getHeaderField(java.lang.String)
 */
public String getContentEncoding() {
    return getHeaderField("content-encoding");
}

Instead, call connection.getContentType() to retrieve the Content-Type and extract the charset from it. I've included sample code showing how to do this:

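String contentType = connection.getContentType(); // null if the header is missing
String charset = "";

if (contentType != null) {
    // Parameters follow the media type, separated by ';'
    for (String value : contentType.split(";")) {
        value = value.trim();

        if (value.toLowerCase(java.util.Locale.ROOT).startsWith("charset=")) {
            // Drop the "charset=" prefix and any quotes around the value
            charset = value.substring("charset=".length()).replace("\"", "").trim();
        }
    }
}

if ("".equals(charset)) {
    charset = "UTF-8"; // Assumption: fall back to UTF-8 when no charset is given
}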
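
The parsing above can also be wrapped in a reusable method and then used to decode the response body. This is only a sketch under assumptions: the class and method names (CharsetFromContentType, charsetOf, decode) are invented here, and falling back to UTF-8 for a missing or unsupported charset is a choice, not something the HTTP header guarantees.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper class, not part of java.net.
final class CharsetFromContentType {

    // Extract the charset parameter from a Content-Type value.
    // Assumption: fall back to UTF-8 when it is missing, malformed, or unsupported.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: use the fallback below
                        break;
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Read the whole body and decode it with the charset from the Content-Type.
    static String decode(InputStream body, String contentType) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        try {
            int n;
            while ((n = body.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), charsetOf(contentType));
    }
}
```

With a real connection, this would be called as decode(connection.getInputStream(), connection.getContentType()).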

Answer by Juan M. Rivero

Just as an addition to the answer from @Buhake Sindi: if you are using Guava, you can do the following instead of the manual parsing:

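// MediaType is com.google.common.net.MediaType; Optional is Guava's com.google.common.base.Optional
MediaType mediaType = MediaType.parse(httpConnection.getContentType());
Optional<Charset> typeCharset = mediaType.charset();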