Java: UTF-8 encoding issue, special characters like 'é' not replicated properly
Source: http://stackoverflow.com/questions/17804808/
License: this content comes from Stack Overflow and is provided under CC BY-SA 4.0. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep the same license.
Asked by shilpa
I am trying to encode a string containing special characters such as 'é' using the code below, but the string is not reproduced correctly:
String Cdata="MARIE-HéLèNE";
byte sByte[]=Cdata.getBytes();
Cdata= new String(sByte,"UTF-8");
System.out.println(Cdata);
Expected output: MARIE-HéLèNE, but the actual output is: MARIE-HE
Answered by Andreas Fester
First, make sure that your source file is actually stored as UTF-8; see @Ankur's answer for a good explanation.
Then, you also need to specify an encoding when calling getBytes() on the String to retrieve the byte array:
byte sByte[] = Cdata.getBytes("UTF-8");
If you call String.getBytes() with no encoding, the platform's default encoding is used, which can be (almost) anything. See also the documentation of java.lang.String.getBytes():
Encodes this String into a sequence of bytes using the platform's default charset
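To make the effect of a charset mismatch concrete, here is a minimal sketch. It assumes, purely for illustration, that the platform default is ISO-8859-1; the class name CharsetMismatch and the helper roundTrip are hypothetical and not part of the original answer. The string literal uses Unicode escapes so the result does not depend on how the source file is saved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetMismatch {
    // Hypothetical helper: encodes with one charset and decodes with another.
    // The question's code does this implicitly whenever the platform default
    // charset is not UTF-8.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // "HéLèNE", written with escapes so the source file encoding is irrelevant
        String text = "H\u00e9L\u00e8NE";

        // Shows what getBytes() without arguments would use on this machine
        System.out.println("Default charset: " + Charset.defaultCharset());

        // The same characters produce different byte sequences per charset
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.ISO_8859_1)));
        System.out.println(Arrays.toString(text.getBytes(StandardCharsets.UTF_8)));

        // Mismatch (assumed ISO-8859-1 default): the accented letters are lost
        String broken = roundTrip(text, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(broken.equals(text));

        // Matching charsets on both sides: the text survives unchanged
        String intact = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(intact.equals(text));
    }
}
```

In the mismatched case each accented letter is a single byte that is not valid UTF-8 on its own, so the decoder substitutes the replacement character U+FFFD for it.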
With that, the following SSCCE prints the expected output for me (note: identifiers are taken from the question and not adjusted to coding conventions):
import java.io.UnsupportedEncodingException;

public class Encoding {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String Cdata = "MARIE-HéLèNE";
        // Encode explicitly as UTF-8 instead of relying on the platform default
        byte sByte[] = Cdata.getBytes("UTF-8");
        // Decode with the same charset that was used for encoding
        Cdata = new String(sByte, "UTF-8");
        System.out.println(Cdata);
    }
}
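As a possible variation on the SSCCE above (not part of the original answer), the charset can be passed as a java.nio.charset.StandardCharsets constant, available since Java 7. These overloads do not declare UnsupportedEncodingException, so the throws clause goes away. The class name EncodingWithConstants and the helper encodeAndDecode are made up for this sketch.

```java
import java.nio.charset.StandardCharsets;

public class EncodingWithConstants {
    // Hypothetical helper: encodes to UTF-8 bytes and decodes them again,
    // naming the charset explicitly on both sides.
    static String encodeAndDecode(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "MARIE-HéLèNE", written with escapes so the source file encoding is irrelevant
        String cdata = "MARIE-H\u00e9L\u00e8NE";
        // Whether the accents display correctly still depends on the console encoding
        System.out.println(encodeAndDecode(cdata));
    }
}
```

Using the constant also avoids typos in the charset name, which the string-based overloads would only report at run time.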
Answered by Ankur Lathi
You need to tell Eclipse to use UTF-8 for its stdout console. You can set that via Window > Preferences > General > Workspace > Text File Encoding.
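The IDE setting controls how the console interprets bytes. On the program side, one can also choose explicitly how text is encoded when written to stdout. The following is a rough sketch of that idea, not something the original answer proposes; the class name Utf8Console and the helper writeAsUtf8 are hypothetical. The console must still be configured to read UTF-8 for the output to display correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Console {
    // Hypothetical helper: writes text through a UTF-8 writer into memory
    // and returns the raw bytes, so the encoding can be inspected.
    static byte[] writeAsUtf8(String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, StandardCharsets.UTF_8), true);
        out.print(text);
        out.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Wrap System.out so characters are always encoded as UTF-8,
        // regardless of the platform default charset
        PrintWriter utf8Out = new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8), true);
        utf8Out.println("MARIE-H\u00e9L\u00e8NE");

        // 'é' becomes two bytes in UTF-8
        System.out.println(writeAsUtf8("\u00e9").length);
    }
}
```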