Android Java UTF-8 HttpClient Problem

Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4480363/

Tags: java, android, httpclient

Asked by Michael Taggart

I am having weird character encoding issues with a JSON array that is grabbed from a web page. The server is sending back this header:

Content-Type: text/javascript; charset=UTF-8

Also, I can view the JSON output in Firefox or any other browser, and the Unicode characters display properly. The response sometimes contains words from other languages with accented characters and the like. However, I get strange question marks when I download it and put it into a String in Java. Here is my code:

HttpParams params = new BasicHttpParams();
HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
HttpProtocolParams.setContentCharset(params, "utf-8");
params.setBooleanParameter("http.protocol.expect-continue", false);

HttpClient httpclient = new DefaultHttpClient(params);

HttpGet httpget = new HttpGet("http://www.example.com/json_array.php");
HttpResponse response;
try {
    response = httpclient.execute(httpget);

    if (response.getStatusLine().getStatusCode() == 200) {
        // Connection was established. Get the content.
        HttpEntity entity = response.getEntity();
        // If the response does not enclose an entity, there is no need
        // to worry about connection release

        if (entity != null) {
            // A Simple JSON Response Read
            InputStream instream = entity.getContent();
            String jsonText = convertStreamToString(instream);

            Toast.makeText(getApplicationContext(), "Response: " + jsonText, Toast.LENGTH_LONG).show();
        }
    }
} catch (MalformedURLException e) {
    Toast.makeText(getApplicationContext(), "ERROR: Malformed URL - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (IOException e) {
    Toast.makeText(getApplicationContext(), "ERROR: IO Exception - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
} catch (JSONException e) {
    Toast.makeText(getApplicationContext(), "ERROR: JSON - " + e.getMessage(), Toast.LENGTH_LONG).show();
    e.printStackTrace();
}

private static String convertStreamToString(InputStream is) {
    /*
     * To convert the InputStream to a String we use the BufferedReader.readLine()
     * method. We iterate until the BufferedReader returns null, which means
     * there is no more data to read. Each line is appended to a StringBuilder
     * and the result is returned as a String.
     */
    StringBuilder sb = new StringBuilder();
    try {
        // The reader is created inside the same try block that uses it, so it
        // is always initialized before readLine() is called.
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException, a subclass of IOException.
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}

As you can see, I am specifying UTF-8 on the InputStreamReader, but every time I view the returned JSON text in a Toast it contains strange question marks. Do I need to read the InputStream into a byte[] instead?

Thanks in advance for any help.

Accepted answer by Vit Khudenko

Try this:

if (entity != null) {
    // A Simple JSON Response Read
    // InputStream instream = entity.getContent();
    // String jsonText = convertStreamToString(instream);

    String jsonText = EntityUtils.toString(entity, HTTP.UTF_8);

    // ... toast code here
}

Answer by Stephen C

@Arhimed's answer is the solution. But I cannot see anything obviously wrong with your convertStreamToString code.

My guesses are:

  1. The server is putting a UTF Byte Order Mark (BOM) at the start of the stream. The standard Java UTF-8 character decoder does not remove the BOM, so chances are that it ends up in the resulting String. (However, the code for EntityUtils doesn't seem to do anything with BOMs either.)
  2. Your convertStreamToString reads the character stream a line at a time and reassembles it using a hard-wired '\n' as the end-of-line marker. If you are going to write the result to an external file or application, you should probably use a platform-specific end-of-line marker.
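
To illustrate the first guess, here is a minimal sketch in plain Java (no Android or HttpClient classes) showing that decoding UTF-8 bytes keeps a leading BOM as the character U+FEFF, and one way to drop it afterwards. The class name BomExample, the helper stripBom, and the sample body are hypothetical and only serve the illustration; whether the server in the question actually sends a BOM is not known.

```java
import java.nio.charset.StandardCharsets;

final class BomExample {
    // The UTF-8 BOM bytes (EF BB BF) decode to the single character U+FEFF.
    private static final char BOM = '\uFEFF';

    /** Returns the text without a leading BOM character, if one is present. */
    static String stripBom(String text) {
        if (!text.isEmpty() && text.charAt(0) == BOM) {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Hypothetical response body: a UTF-8 BOM followed by a small JSON array.
        byte[] body = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '[', '1', ']'};
        String decoded = new String(body, StandardCharsets.UTF_8);

        // The decoder keeps the BOM, so the string is one char longer than "[1]".
        System.out.println(decoded.length());
        System.out.println(stripBom(decoded));
    }
}
```

A stray U+FEFF at the start of the text can confuse a JSON parser, so stripping it before parsing is a reasonable precaution if a BOM turns out to be present.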

Answer by Win Myo Htet

The problem is simply that your convertStreamToString does not honor the encoding set in the HttpResponse. If you look inside EntityUtils.toString(entity, HTTP.UTF_8), you will see that EntityUtils first checks whether an encoding is set in the HttpResponse and, if there is one, uses that encoding. It only falls back to the encoding passed as the parameter (in this case HTTP.UTF_8) if no encoding is set in the entity.

So you could say that HTTP.UTF_8 is passed as the parameter but never gets used, because the encoding declared in the response takes precedence. Here is an update to your code, using a helper method adapted from EntityUtils:

HttpEntity entity = response.getEntity();
String charset = getContentCharSet(entity);
InputStream instream = entity.getContent();
String jsonText = convertStreamToString(instream, charset);

private static String getContentCharSet(final HttpEntity entity) throws ParseException {
    if (entity == null) {
        throw new IllegalArgumentException("HTTP entity may not be null");
    }
    String charset = null;
    if (entity.getContentType() != null) {
        HeaderElement[] values = entity.getContentType().getElements();
        if (values.length > 0) {
            NameValuePair param = values[0].getParameterByName("charset");
            if (param != null) {
                charset = param.getValue();
            }
        }
    }
    // Fall back to UTF-8 when the response does not declare a charset.
    return TextUtils.isEmpty(charset) ? HTTP.UTF_8 : charset;
}



private static String convertStreamToString(InputStream is, String encoding) {
    /*
     * To convert the InputStream to a String we use the
     * BufferedReader.readLine() method. We iterate until the BufferedReader
     * returns null, which means there is no more data to read. Each line is
     * appended to a StringBuilder and the result is returned as a String.
     */
    StringBuilder sb = new StringBuilder();
    try {
        // The reader is created inside the same try block that uses it, so it
        // is always initialized before readLine() is called.
        BufferedReader reader = new BufferedReader(new InputStreamReader(is, encoding));
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line).append("\n");
        }
    } catch (IOException e) {
        // Also covers UnsupportedEncodingException, a subclass of IOException.
        e.printStackTrace();
    } finally {
        try {
            is.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return sb.toString();
}
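
The idea of honoring the declared encoding and falling back to UTF-8 can also be sketched without the Apache classes. The following is only an illustration in plain Java: the class CharsetFallbackExample, its methods resolve and decode, and the sample bytes are hypothetical, and it is not how EntityUtils is implemented internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

final class CharsetFallbackExample {
    /** Resolves a declared charset name, falling back to UTF-8 when it is missing or unknown. */
    static Charset resolve(String declaredName) {
        if (declaredName == null || declaredName.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        try {
            return Charset.forName(declaredName.trim());
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            // The name is malformed or not supported on this JVM.
            return StandardCharsets.UTF_8;
        }
    }

    /** Decodes the body with the declared charset, or UTF-8 if none is usable. */
    static String decode(byte[] body, String declaredName) {
        return new String(body, resolve(declaredName));
    }

    public static void main(String[] args) {
        // Hypothetical body: "café" as sent by a server that declares ISO-8859-1.
        byte[] latin1Body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Decoded with the declared charset, the accented character survives.
        System.out.println(decode(latin1Body, "ISO-8859-1"));
        // Decoded with the UTF-8 fallback, the accented character is not recovered.
        System.out.println(decode(latin1Body, null));
    }
}
```

Decoding bytes with a charset other than the one they were written in is one common source of question marks or replacement characters, though it is not necessarily what happened in the question, where the server declares UTF-8.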

Answer by Alan Deep

Arhimed's answer is correct. However, the same result can be achieved simply by providing an additional header in the HTTP request:

Accept-Charset: utf-8

No need to remove anything or use any other library.

For example,

GET / HTTP/1.1
Host: www.website.com
Connection: close
Accept: text/html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.10 Safari/537.36
DNT: 1
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: utf-8

Most probably your request doesn't have any Accept-Charset header.
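
As a rough sketch of how such a header can be set from Java, the example below uses the standard HttpURLConnection rather than the Apache HttpClient from the question (with the question's HttpGet, the equivalent would presumably be httpget.setHeader("Accept-Charset", "utf-8")). The class AcceptCharsetExample and its prepare method are hypothetical names, and whether a given server actually honors the header is up to that server.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;

final class AcceptCharsetExample {
    /** Prepares (but does not send) a GET request that advertises UTF-8 via Accept-Charset. */
    static HttpURLConnection prepare(String url) {
        try {
            // openConnection() only creates the connection object; nothing is sent yet.
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Accept-Charset", "utf-8");
            return conn;
        } catch (IOException e) {
            // Rethrown unchecked only to keep this sketch short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = prepare("http://www.example.com/json_array.php");
        // Shows the header value that would be sent with the request.
        System.out.println(conn.getRequestProperty("Accept-Charset"));
    }
}
```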

Answer by Alex Goncalves

Extract the charset from the response content type field. You can use the following method to do this:

private static String extractCharsetFromContentType(String contentType) {
    if (TextUtils.isEmpty(contentType)) return null;

    // Capture everything after "charset=" up to whitespace, ';' or ','.
    // The backslash must be doubled inside a Java string literal.
    Pattern p = Pattern.compile(".*charset=([^\\s;,]+)");
    Matcher m = p.matcher(contentType);

    if (m.find()) {
        return m.group(1);
    }

    return null;
}

Then use the extracted charset to create the InputStreamReader:

String charsetName = extractCharsetFromContentType(connection.getContentType());

InputStreamReader inReader = TextUtils.isEmpty(charsetName)
        ? new InputStreamReader(inputStream)
        : new InputStreamReader(inputStream, charsetName);
BufferedReader reader = new BufferedReader(inReader);
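
For experimenting with this approach outside Android, here is a self-contained sketch in plain Java. It is an assumption-laden variant, not the answer's exact code: the class ContentTypeCharsetExample and the methods extractCharset and readAll are hypothetical names, a null check replaces TextUtils.isEmpty, the match is case-insensitive, and a missing charset falls back to UTF-8 instead of the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentTypeCharsetExample {
    // Captures the value after "charset=" up to whitespace, ';' or ','.
    private static final Pattern CHARSET =
            Pattern.compile("charset=([^\\s;,]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset name declared in a Content-Type value, or null if there is none. */
    static String extractCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    /** Reads the whole stream as text; a null charset name falls back to UTF-8 (an assumption). */
    static String readAll(InputStream in, String charsetName) {
        String name = (charsetName == null) ? "UTF-8" : charsetName;
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        // Reading by chunks instead of by lines leaves line endings untouched.
        try (Reader reader = new InputStreamReader(in, name)) {
            int n;
            while ((n = reader.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The header value from the question, and a hypothetical UTF-8 encoded body.
        String charsetName = extractCharset("text/javascript; charset=UTF-8");
        byte[] body = "[\"caf\u00e9\"]".getBytes(StandardCharsets.UTF_8);
        System.out.println(charsetName);
        System.out.println(readAll(new ByteArrayInputStream(body), charsetName));
    }
}
```

Note that this simple pattern does not handle a quoted value such as charset="utf-8"; the quotes would end up in the extracted name.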