Java: UTF-8 encoding issue, special characters like 'é' not replicated properly

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/17804808/

Date: 2020-11-01 15:03:00  Source: igfitidea

Tags: java, string

Asked by shilpa

I am trying to encode a string containing special characters like 'é' using the code below, but the characters are not reproduced properly:

String Cdata="MARIE-HéLèNE";
byte sByte[]=Cdata.getBytes(); 
Cdata= new String(sByte,"UTF-8");
System.out.println(Cdata);

Expected output: MARIE-HéLèNE, but the actual output is: MARIE-HE

Answered by Andreas Fester

The first thing is to make sure that your source file is actually stored as UTF-8; see @Ankur's answer for a good explanation.
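
One way to rule out the source-file encoding as the culprit is to write the non-ASCII characters as Unicode escapes, which the compiler reads the same way whatever encoding the file is saved in. The sketch below assumes this approach; the class name EscapedLiteral and the constant NAME are hypothetical and not part of the original answer. Passing `-encoding UTF-8` to javac is the alternative when the literal characters should stay in the file.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not taken from the original answer.
class EscapedLiteral {
    // Same text as in the question. The two accented letters are written as
    // Unicode escapes, so the compiler sees them correctly regardless of how
    // the source file itself is encoded.
    static final String NAME = "MARIE-H\u00e9L\u00e8NE";

    public static void main(String[] args) {
        // Print the code points instead of the characters, so the check does
        // not depend on the console encoding either.
        for (int i = 0; i < NAME.length(); i++) {
            System.out.printf("U+%04X ", (int) NAME.charAt(i));
        }
        System.out.println();
        // Each accented letter takes two bytes in UTF-8: 12 chars, 14 bytes.
        System.out.println(NAME.getBytes(StandardCharsets.UTF_8).length + " bytes in UTF-8");
    }
}
```

If the code points printed for the accented letters are U+00E9 and U+00E8, the string in memory is correct and any remaining garbling happens at encoding or output time.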

Then, you also need to provide an encoding when calling getBytes() on the String to retrieve the byte array:

byte sByte[] = Cdata.getBytes("UTF-8"); 

If you call String.getBytes() with no encoding, the platform's default encoding is used, which can be (almost) anything. See also java.lang.String.getBytes():

Encodes this String into a sequence of bytes using the platform's default charset
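
To see why a mismatch between the encoding and decoding charsets damages the text, the following sketch encodes with one charset and decodes with another. It is an illustration under assumptions: ISO-8859-1 stands in for a non-UTF-8 platform default (the asker's actual default is not known), and the class CharsetMismatch and its roundTrip helper are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical example class, not taken from the original answer.
class CharsetMismatch {
    // Encodes the text with one charset, then decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the literal independent of the file encoding.
        String original = "MARIE-H\u00e9L\u00e8NE";

        // What getBytes() without arguments would use on this machine.
        System.out.println("Platform default: " + Charset.defaultCharset());

        // Same charset on both sides: the text survives.
        String same = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(original.equals(same)); // true

        // Assumed mismatch: ISO-8859-1 bytes decoded as UTF-8. The bytes for
        // the accented letters are not valid UTF-8, so they cannot be decoded
        // back to the original characters.
        String mixed = roundTrip(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(original.equals(mixed)); // false
    }
}
```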

With that, the following SSCCE prints the expected output for me (note: the identifiers are taken from the question and not adjusted to coding conventions):

import java.io.UnsupportedEncodingException;

public class Encoding {
   public static void main(String[] args) throws UnsupportedEncodingException {
      String Cdata = "MARIE-HéLèNE";
      byte sByte[] = Cdata.getBytes("UTF-8"); 
      Cdata = new String(sByte,"UTF-8");
      System.out.println(Cdata);
   }
}
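
A possible variant of the program above, assuming Java 7 or later: the constants in java.nio.charset.StandardCharsets can replace the "UTF-8" string, which removes the need to declare UnsupportedEncodingException. The class name EncodingWithConstants and the helper encodeAndDecode are hypothetical, chosen only for this sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical variant of the SSCCE above, not taken from the original answer.
class EncodingWithConstants {
    // Encodes to UTF-8 bytes and decodes them again with the same charset.
    // The Charset overloads do not throw a checked exception.
    static String encodeAndDecode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the literal independent of the file encoding.
        System.out.println(encodeAndDecode("MARIE-H\u00e9L\u00e8NE"));
    }
}
```

Whether the accented letters then display correctly still depends on the console encoding, which this variant does not change.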

Answered by Ankur Lathi

You need to tell Eclipse to use UTF-8 for its stdout console. You can set that via Window > Preferences > General > Workspace > Text File Encoding.
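
If changing the IDE setting is not an option, the program can also encode its own output explicitly. The following is a sketch under assumptions, not part of the original answer: the class Utf8Console and its utf8 helper are hypothetical names, and the approach only helps if the console or terminal interprets the incoming bytes as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical example class, not taken from the original answer.
class Utf8Console {
    // Wraps a stream in a PrintStream that encodes text as UTF-8 instead of
    // using the platform default charset. Auto-flush is enabled.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Unicode escapes keep the literal independent of the file encoding.
        out.println("MARIE-H\u00e9L\u00e8NE");
    }
}
```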
