Java - é becomes ?? - How to fix it

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/16208517/

Time: 2020-10-31 22:13:23  Source: igfitidea

Tags: java, unicode, character-encoding

Asked by user2172625

I have a folder tree in French. While I'm reading its folders/files, it returns ?? instead of é. I replace the character, but that is not a good solution. How can I fix this? I found some answers on Google, but they don't help me.

Thanks!

Answered by Afriza N. Arief

When starting the application, set the encoding to UTF-8:

java -Dfile.encoding="UTF-8" YourMainClass

Note that, as mentioned in the link above, many Java classes cache the encoding; therefore, if you change the encoding at run time, it may not affect all of the classes we are concerned with.
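
To check which default the JVM actually picked up after passing the flag, a minimal sketch like the one below may help. The class name DefaultEncodingCheck and its helper method are made up for illustration; Charset.defaultCharset() and the file.encoding system property are standard JDK features.

```java
import java.nio.charset.Charset;

public class DefaultEncodingCheck {
    // Name of the charset the JVM falls back to when no charset is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property as the running JVM sees it.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running it once with and once without -Dfile.encoding="UTF-8" shows whether the flag took effect. On JDK 18 and later the default charset is UTF-8 unless it is overridden.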

Copying the explanation from tchrist in his answer to another question:

A \N{LATIN SMALL LETTER E WITH ACUTE} character is code point U+00E9. In UTF-8, that is \xC3\xA9.

But if you turn around and treat those two bytes as distinct code points U+00C3 and U+00A9, those are \N{LATIN CAPITAL LETTER A WITH TILDE} and \N{COPYRIGHT SIGN}, respectively.
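
The byte-level explanation above can be reproduced in a few lines of Java. This is only an illustrative sketch: the class name MojibakeDemo and the method misreadUtf8AsLatin1 are hypothetical, and ISO-8859-1 stands in for "treating each byte as its own code point".

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encode text as UTF-8, then (wrongly) decode those bytes as ISO-8859-1.
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // é written as an escape, so the encoding of this source file does not matter.
        String e = "\u00e9";
        byte[] utf8 = e.getBytes(StandardCharsets.UTF_8);
        // Mask with 0xFF to show each byte as an unsigned value.
        System.out.printf("UTF-8 bytes: %02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF);

        String wrong = misreadUtf8AsLatin1(e);
        System.out.printf("Misread as: U+%04X U+%04X%n",
                (int) wrong.charAt(0), (int) wrong.charAt(1));
    }
}
```

The one-character string turns into a two-character string, which is why a single é shows up as two symbols on screen.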

Answered by Walter Macambira

You are facing an encoding problem.

Any string is actually a sequence of bits. To make them readable, we map groups of bits to characters we can read. Those 'maps' are what is called an encoding.

The problem you are having is because you are reading bits encoded using one 'map' and displaying them using another 'map'.

Be sure to use the same encoding everywhere, and always check that your string manipulation functions work with the encoding being used. This is fundamental to your application working properly.
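
As a rough illustration of "use the same encoding on both sides", the sketch below writes a line to a temporary file and reads it back, naming the charset explicitly each time instead of relying on the platform default. The class SameEncodingRoundTrip, the method roundTrip, and the temp-file prefix are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class SameEncodingRoundTrip {
    // Write one line with the given charset and read it back with the same charset.
    static String roundTrip(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("encoding-demo", ".txt"); // scratch file for the demo
            try {
                Files.write(tmp, Collections.singletonList(text), charset);
                List<String> lines = Files.readAllLines(tmp, charset);
                return lines.get(0);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"
        String back = roundTrip(original, StandardCharsets.UTF_8);
        System.out.println("Round trip preserved the text: " + original.equals(back));
    }
}
```

Passing the charset explicitly to both the write and the read keeps the result independent of whatever default the machine happens to have.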

Answered by Padrus

This typically happens when you're not decoding the text with the right encoding (probably UTF-8).

If you want a more precise answer, post us your code so we can try to correct it.

Answered by tchrist

The code is displaying the right bits — what is wrong is that the thing you are using to look at those bits has been told that the bits are in a different encoding than they actually are.

This is not a Java problem. This is a problem with whatever software you are using to look at the Java output. For example, your Terminal encoding might be set to ISO-8859-15 rather than the UTF-8 that Java is emitting.

It really helps to have an all–UTF-8 workflow for the external world, and an internal world of abstract Unicode code points.

I suppose it is possible that you are misreading some input: input that is in UTF-8, but which you are misreading as being in some legacy 8-bit encoding. But my best guess is the one already given, that your display device/program's encoding is mis-set.
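
One way a single é can surface as exactly two question marks is sketched below, assuming an ASCII-only charset is involved somewhere between the bytes and the screen. That assumption is not confirmed by the question, and the class QuestionMarkDemo and method throughAscii are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Decode UTF-8 bytes as US-ASCII, then encode the result for an ASCII-only output.
    static String throughAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Bytes above 0x7F are invalid in US-ASCII; each one decodes to U+FFFD.
        String misread = new String(utf8, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in US-ASCII either; each one is written as '?'.
        byte[] shown = misread.getBytes(StandardCharsets.US_ASCII);
        return new String(shown, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(throughAscii("\u00e9")); // prints ??
    }
}
```

Two bytes in, two question marks out. If the output were Ã© instead, an 8-bit encoding such as ISO-8859-1 would be the more likely culprit.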

Answered by karthikeyan paneerselvam

I used the code below to write é (Java Unicode) to a file, and it works:

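// Needs java.io.FileWriter and java.io.BufferedWriter; outputFile and stringBuffer come from the surrounding program.
writer1 = new FileWriter(outputFile, true); // true = append; FileWriter uses the platform default charset
writer2 = new BufferedWriter(writer1);
// getBytes() encodes with the platform default charset; the bytes are then re-decoded as ISO-8859-1.
String str = new String(stringBuffer.toString().getBytes(), "ISO-8859-1");
writer2.write(str);
writer1.flush();
writer2.flush(); // pushes the buffered text through to writer1 and flushes it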