java windows-1252 to UTF-8

Note: This page reproduces a popular StackOverflow question and its answer under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/28484064/

Date: 2020-11-02 13:40:27  Source: igfitidea


java, character-encoding

Question by Sharique

Below is the code I am trying to use, and the output it's giving me is:


RetValue: á, é, í, ó, ú, ü, ?, ? Value: á, é, í, ó, ú, ü, ?, ? ConvertValue: ?, ?, ?, ?, ?, ?, ?, ?

This is not the desired output. I think the output for every character here should be something like %C3%.


import java.nio.charset.Charset;

public class EncodingTest {
    public static void main(String[] args) {
        String value = "á, é, í, ó, ú, ü, ?, ?";
        String retValue = "";
        String convertValue = "";
        try {
            // Encode with the platform's default charset, then decode those bytes as Windows-1252
            retValue = new String(value.getBytes(),
                    Charset.forName("Windows-1252"));
            // Encode the result as Windows-1252, then decode those bytes as UTF-8
            convertValue = new String(retValue.getBytes("Windows-1252"),
                    Charset.forName("UTF-8"));
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println("RetValue: " + retValue + " Value: " + value
                + " ConvertValue: " + convertValue);
    }
}
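
The %C3% form expected in the question looks like percent-encoded UTF-8 bytes: in UTF-8, á is the two bytes 0xC3 0xA1. A minimal sketch, assuming a hex view of the UTF-8 bytes is what is wanted; showUtf8Bytes is a hypothetical helper, not a library method, and it leaves ASCII bytes unescaped (unlike java.net.URLEncoder, which also escapes commas and turns spaces into +):

```java
import java.nio.charset.StandardCharsets;

class PercentBytes {
    // Hypothetical helper: writes each UTF-8 byte >= 0x80 as %XX and keeps ASCII bytes as-is
    static String showUtf8Bytes(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 0x80) {
                sb.append((char) v);
            } else {
                sb.append('%').append(String.format("%02X", v));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file's encoding
        System.out.println(showUtf8Bytes("\u00e1, \u00e9")); // prints %C3%A1, %C3%A9
    }
}
```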

Answer by alainlompo

I understand that you are trying to encode your text from the default encoding to Windows-1252, and then to UTF-8.

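The "default encoding" here is the platform default charset, which String.getBytes() uses when no charset is given. A small sketch to check what it is on a given machine; the class and helper names are illustrative. The printed charset is platform-dependent (for example, windows-1252 on Western-locale Windows before JDK 18; since JDK 18 the default is UTF-8 unless overridden, per JEP 400):

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class DefaultCharsetDemo {
    // Hypothetical helper: true if the no-argument getBytes() matches an
    // explicit encode with Charset.defaultCharset()
    static boolean noArgUsesDefault(String s) {
        return Arrays.equals(s.getBytes(), s.getBytes(Charset.defaultCharset()));
    }

    public static void main(String[] args) {
        // Output varies by platform and JDK version
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(noArgUsesDefault("\u00e1, \u00e9")); // true
    }
}
```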

According to the Javadoc for the String class:


String(byte[] bytes, Charset charset)

Constructs a new String by decoding the specified array of bytes using the specified charset.

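To see that this constructor decodes bytes, and that the resulting text depends entirely on the charset passed in, here is a sketch that decodes the same two bytes (0xC3 0xA1, the UTF-8 encoding of á) with two different charsets. The class and method names are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // 0xC3 0xA1: the UTF-8 encoding of "á" (U+00E1)
    static final byte[] BYTES = { (byte) 0xC3, (byte) 0xA1 };

    // Interprets the same bytes with whichever charset is passed in
    static String decodeAs(Charset charset) {
        return new String(BYTES, charset);
    }

    public static void main(String[] args) {
        // UTF-8 reads the pair as a single character: "á"
        System.out.println(decodeAs(StandardCharsets.UTF_8).equals("\u00e1"));               // true
        // Windows-1252 maps each byte to its own character: "Ã¡"
        System.out.println(decodeAs(Charset.forName("windows-1252")).equals("\u00c3\u00a1")); // true
    }
}
```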

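So your code first encodes the text with the platform's default charset and decodes those bytes as Windows-1252, then encodes the result as Windows-1252 and decodes those bytes as UTF-8. Decoding bytes with a charset other than the one used to encode them is why the output looks abnormal.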

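The question's two steps can be replayed with every charset written out explicitly. The sketch below assumes a Windows-1252 default charset, an inference from the question's RetValue matching Value rather than something the question states; run is a hypothetical helper. With that default, step 1 round-trips, but the single byte 0xE1 that Windows-1252 uses for á is not valid UTF-8, so step 2 produces the replacement character U+FFFD, which a Windows-1252 console would typically print as ?:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionPipeline {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: the question's two steps, with the default charset passed explicitly
    static String[] run(String value, Charset assumedDefault) {
        // Step 1: encode with the assumed default, decode those bytes as Windows-1252
        String retValue = new String(value.getBytes(assumedDefault), CP1252);
        // Step 2: encode as Windows-1252, decode those bytes as UTF-8
        String convertValue = new String(retValue.getBytes(CP1252), StandardCharsets.UTF_8);
        return new String[] { retValue, convertValue };
    }

    public static void main(String[] args) {
        String[] r = run("\u00e1", CP1252);
        System.out.println(r[0].equals("\u00e1")); // true: step 1 round-trips
        System.out.println(r[1].equals("\uFFFD")); // true: 0xE1 alone is malformed UTF-8
    }
}
```

With a UTF-8 default instead, step 1 turns á into the two characters Ã¡, and for this particular character step 2 happens to restore á.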

If your purpose is to convert from Windows-1252 to UTF-8, I would suggest the following approach, using CharsetEncoder from the java.nio.charset package:


import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncoderExample {
    public static void main(String[] args) {
        String value = "á, é, í, ó, ú, ü, ?, ?";
        String retValue = "";
        String convertValue2 = "";
        ByteBuffer convertedBytes = null;
        try {
            CharsetEncoder encoder2 = Charset.forName("Windows-1252").newEncoder();
            CharsetEncoder encoder3 = Charset.forName("UTF-8").newEncoder();
            System.out.println("value = " + value);

            // Checked only when assertions are enabled (java -ea)
            assert encoder2.canEncode(value);
            assert encoder3.canEncode(value);

            // Encode to Windows-1252 bytes, then decode them back with the same charset
            ByteBuffer conv1Bytes = encoder2.encode(CharBuffer.wrap(value));
            // decode() reads only position..limit; array() may expose unused trailing bytes
            retValue = Charset.forName("Windows-1252").decode(conv1Bytes).toString();

            System.out.println("retValue = " + retValue);

            // Encode to UTF-8 bytes, then decode them back with the same charset
            convertedBytes = encoder3.encode(CharBuffer.wrap(retValue));
            convertValue2 = Charset.forName("UTF-8").decode(convertedBytes).toString();
            System.out.println("convertedValue =" + convertValue2);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

I obtained the following output:


value = á, é, í, ó, ú, ü, ?, ?

retValue = á, é, í, ó, ú, ü, ?, ?

convertedValue =á, é, í, ó, ú, ü, ?, ?
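
One detail of the CharsetEncoder approach: encode() returns a ByteBuffer whose backing array can be longer than the encoded data, so ByteBuffer.array() may include unused bytes past limit(), and passing the whole array to new String(...) can append NUL characters. A sketch that copies only the bytes between position() and limit(); encode and encodedBytes are hypothetical helper names:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class EncoderBufferDemo {
    // Encodes with a fresh UTF-8 CharsetEncoder (default action: REPORT errors)
    static ByteBuffer encode(String s) {
        try {
            return StandardCharsets.UTF_8.newEncoder().encode(CharBuffer.wrap(s));
        } catch (CharacterCodingException e) {
            // Reached only for unencodable input, such as an unpaired surrogate
            throw new IllegalArgumentException(e);
        }
    }

    // Copies only the encoded bytes (position..limit), not the whole backing array
    static byte[] encodedBytes(String s) {
        ByteBuffer buf = encode(s);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) {
        String s = "\u00e1, \u00e9, \u00ed";
        ByteBuffer buf = encode(s);
        // array() returns the whole backing array, which may be longer than limit()
        System.out.println("array length = " + buf.array().length + ", limit = " + buf.limit());
        System.out.println(Arrays.equals(encodedBytes(s), s.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```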

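Because a Java String is a sequence of UTF-16 code units rather than bytes in some charset, "converting from Windows-1252 to UTF-8" in practice usually means converting bytes: decode the Windows-1252 bytes into a String, then encode that String as UTF-8. A minimal sketch, assuming the input really is raw Windows-1252 data (for example, read from a file); cp1252ToUtf8 is a hypothetical helper name:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Cp1252ToUtf8 {
    // Hypothetical helper: re-encodes raw Windows-1252 bytes as UTF-8 bytes
    static byte[] cp1252ToUtf8(byte[] cp1252Bytes) {
        // Step 1: decode the source bytes into a String
        String text = new String(cp1252Bytes, Charset.forName("windows-1252"));
        // Step 2: encode that String with the target charset
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] in = { (byte) 0xE1 };               // "á" in Windows-1252
        byte[] out = cp1252ToUtf8(in);
        System.out.println(Arrays.toString(out));  // [-61, -95], i.e. 0xC3 0xA1
    }
}
```

Unlike ISO-8859-1, Windows-1252 maps byte 0x80 to the euro sign, which this helper turns into the three UTF-8 bytes 0xE2 0x82 0xAC, so the source charset name matters.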