Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/1832304/

Time: 2020-10-29 18:09:59  Source: igfitidea

Avoid printing unicode replacement character in Java

java unicode character replace

Asked by akula1001

In Java, why does Character.toString((char) 65533) print out this symbol: �?

I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?

Answered by Paul Wagland

One of the most likely scenarios is that you are trying to read ISO-8859 data using the UTF-8 character set. If you come across a sequence of bytes that is not valid UTF-8, then it will be replaced with the � symbol.

Check your input streams, and ensure that you read them using the correct character set.
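
To make the scenario above concrete, here is a minimal sketch of what happens when bytes written as ISO-8859-1 are decoded as UTF-8. The class and method names are made up for illustration, and the sample bytes are just one example of input that is valid in one charset and malformed in the other.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original question or answers.
class CharsetMismatchDemo {
    // Decodes bytes with the given charset. The String constructor silently
    // substitutes U+FFFD for any malformed byte sequence.
    static String decode(byte[] data, Charset charset) {
        return new String(data, charset);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1: the é is the single byte 0xE9.
        byte[] latin1 = {0x63, 0x61, 0x66, (byte) 0xE9};

        String wrong = decode(latin1, StandardCharsets.UTF_8);
        String right = decode(latin1, StandardCharsets.ISO_8859_1);

        // Print booleans rather than the text itself, so the console
        // encoding cannot distort the result.
        System.out.println("UTF-8 decode contains U+FFFD: " + wrong.contains("\uFFFD"));
        System.out.println("ISO-8859-1 decode is intact:  " + right.equals("caf\u00E9"));
    }
}
```

If the first line prints true for your own data, the reader's charset is a likely suspect.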

Answered by BalusC

> In Java, why does Character.toString((char) 65533) print out this symbol: �?

Because exactly this particular character IS associated with that particular code point. It does not display a random character, as you seem to think.
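
A small sketch to check this for yourself: 65533 is 0xFFFD in hex, and the JDK can report the Unicode name of that code point. The class and helper names are hypothetical.

```java
// Hypothetical demo class; the helper name is an illustration only.
class ReplacementCharDemo {
    // Formats a char as "U+XXXX <Unicode name>" using Character.getName.
    static String describe(char c) {
        return String.format("U+%04X %s", (int) c, Character.getName(c));
    }

    public static void main(String[] args) {
        char c = (char) 65533;
        System.out.println(c == '\uFFFD'); // true: 65533 == 0xFFFD
        System.out.println(describe(c));   // U+FFFD REPLACEMENT CHARACTER
    }
}
```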

> I have a Java program which prints these characters all over the place. It's a big program. Any ideas on what I can do to avoid this?

Your problem lies somewhere else. At the very least, it boils down to this: every step which involves byte-char conversions (storing text in file/db, reading text from file/db, manipulating text, transferring text, displaying text, etcetera) should be set to use UTF-8.
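
As a rough illustration of "set every step to use UTF-8", the sketch below writes text to a file and reads it back, naming the charset explicitly on both sides instead of relying on the platform default. The class name, temp-file prefix, and sample text are assumptions made for the example; database and network steps would need their own equivalent settings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; not from the original answer.
class Utf8RoundTripDemo {
    // Writes text to a temp file as UTF-8 and reads it back as UTF-8.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                // chars -> bytes: charset named explicitly
                try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    out.write(text);
                }
                // bytes -> chars: same charset named explicitly
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "na\u00EFve \u20AC"; // contains ï and the euro sign
        System.out.println("round trip intact: " + text.equals(roundTrip(text)));
    }
}
```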

What catches my eye is the fact that Java does absolutely nothing special with 0xFFFD; it just replaces uncovered chars with a question mark ?, and yet you keep insisting that 0xFFFD comes from Java. I know that Firefox does exactly what you said, so are you maybe confusing "Firefox" with "Java"?
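
For reference, here is a minimal sketch of the question-mark substitution: when a string is encoded to a charset that cannot represent one of its chars, String.getBytes substitutes that charset's replacement byte, which for ISO-8859-1 is '?'. This covers the chars-to-bytes direction only; going the other way, the JDK's decoders substitute U+FFFD for malformed bytes by default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not from the original answer.
class UnmappableCharDemo {
    // Encodes to ISO-8859-1 and decodes again; chars that ISO-8859-1 cannot
    // represent are replaced with '?' during the encoding step.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The euro sign (U+20AC) does not exist in ISO-8859-1.
        System.out.println(throughLatin1("\u20AC5")); // ?5
    }
}
```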

If this is true and you're actually talking about a Java web application, then you need to set at least the HTTP response encoding to UTF-8. You can do that by putting <%@ page pageEncoding="UTF-8" %> at the top of the JSP page in question. You may find this article useful to get more background information and a detailed overview of all steps and solutions you need to apply to solve this "Unicode problem".

Answered by MSalters

There is no Unicode character U+FFFD. Hence, the code is logically incorrect. The intended use of the Unicode Replacement Symbol is to be substituted for bad input (such as (char)65533).

How to fix it: don't put junk in strings. Strings are for text. Bytes are for random binary data.
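
A short sketch of why this matters: arbitrary bytes pushed through a String do not survive the trip, because invalid sequences are replaced during decoding. Keeping the data as byte[] avoids the problem. Base64 is shown only as one possible option when binary data has to travel as text; it is an assumption of this example, not something the answer prescribes. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; not from the original answer.
class BinaryInStringDemo {
    // Lossy: invalid UTF-8 sequences become U+FFFD when decoded.
    static byte[] viaString(byte[] data) {
        return new String(data, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    // Lossless: Base64 maps any bytes to plain ASCII text and back.
    static byte[] viaBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x10}; // not valid UTF-8
        System.out.println("via String intact: " + Arrays.equals(binary, viaString(binary)));
        System.out.println("via Base64 intact: " + Arrays.equals(binary, viaBase64(binary)));
    }
}
```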

Answered by Jon Skeet

Well, what do you want it to do? If you're getting these characters "all over the place" I suspect you have bad data... it should be pretty rare that you receive data which can't be represented in Unicode.

How are you getting the data to start with?
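
One way to find out where bad data enters, sketched under the assumption that the input is supposed to be UTF-8: decode with a CharsetDecoder configured to report errors, so malformed input throws an exception at the point of reading instead of being silently replaced. The class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not from the original answer.
class StrictDecodeDemo {
    // Decodes as UTF-8 and throws on malformed or unmappable input,
    // rather than substituting U+FFFD.
    static String decodeStrict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    // Convenience wrapper: true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        try {
            decodeStrict(data);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = {0x61, 0x62, 0x63};         // "abc"
        byte[] bad = {0x63, (byte) 0xE9};         // ISO-8859-1 "cé", malformed as UTF-8
        System.out.println(isValidUtf8(good));    // true
        System.out.println(isValidUtf8(bad));     // false
    }
}
```

Running a check like this at each input boundary can help narrow down which source is delivering data in an unexpected charset.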

Answered by kem