Original URL: http://stackoverflow.com/questions/19957431/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Characters "æ", "ø", "å" in Java Strings (Windows)
Asked by Sing Sandibar
For some reason, a String that is assigned the letter ø by using the Scanner class does not equal a String that is assigned ø the "normal" way: String a = "ø". Why is that?
import java.util.*;

public class UTF8Test {
    public static void main(String [] args) {
        String [] Norge = {"løk", "hår", "vår", "sær", "søt"};
        Scanner input = new Scanner(System.in);
        String test = input.nextLine(); //I enter løk here
        System.out.println(test);
        System.out.println(Norge[0]);
        for(int i = 0; i < Norge.length; i++) {
            if(Norge[i].equals(test) ) {
                System.out.println("YES!!");
            }
        }
    }
}
The console shows this:
løk
løk
l├©k
Accepted answer by BalusC
Provided that your sole requirement is being able to use UTF-8 everywhere, as indicated by the UTF8Test classname, your main mistake is that you're using the Windows command console to compile and run your Java program. The ├© as the mojibaked form of ø strongly suggests that you were using CP850 encoding to compile your Java source code file. As evidence, run this in a UTF-8 capable environment:
System.out.println(new String("ø".getBytes("UTF-8"), "CP850"));
This prints ├©. This in turn strongly suggests that you were using the Windows command console to compile your Java source code file, as that's currently the only commonly used environment which uses CP850 by default. However, the Windows command console is not UTF-8 capable.
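For reference, here is a self-contained version of that one-liner, written with Unicode escapes so the demo file itself stays ASCII-only and cannot be affected by the very problem it reproduces. A minimal sketch: the class `MojibakeDemo` is hypothetical, and it assumes the runtime ships the `IBM850` charset (it lives in the `jdk.charsets` module, which trimmed runtimes may omit).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encode text as UTF-8 (what the editor saved), then decode those bytes
    // as CP850 (what javac did when it read the file with CP850).
    static String utf8ReadAsCp850(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("IBM850"));
    }

    public static void main(String[] args) {
        // The escape stands for o-slash and keeps this source file ASCII-only.
        // Expected result: l, a box-drawing character, a copyright sign, k.
        System.out.println(utf8ReadAsCp850("l\u00F8k"));
    }
}
```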
When you save (convert from chars to bytes) the source code file using UTF-8 encoding in your text editor, the ø character is turned into the bytes 0xC3 and 0xB8 (as evidence, see the "UTF-8 (hex)" entry in the U+00F8 character info). When you run javac UTF8Test.java, the UTF-8 saved source code file is basically read (converted from bytes to characters) using CP850 encoding. In this encoding, the bytes 0xC3 and 0xB8 represent the characters ├ and © (as evidence, see the CP850 codepage layout). This totally explains your initial problem.
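The byte values quoted above can be checked directly. A minimal sketch (the class `Utf8Hex` is hypothetical) that prints the UTF-8 bytes of the three Norwegian letters in hex:

```java
import java.nio.charset.StandardCharsets;

class Utf8Hex {

    // Return the UTF-8 encoding of s as space-separated uppercase hex bytes.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes are not sign-extended.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // o-slash, a-ring, ae ligature, written as escapes to keep the file ASCII-only
        for (String letter : new String[] {"\u00F8", "\u00E5", "\u00E6"}) {
            String codePoint = String.format("U+%04X", (int) letter.charAt(0));
            System.out.println(codePoint + " -> " + utf8Hex(letter));
        }
    }
}
```

It prints `C3 B8` for ø, `C3 A5` for å and `C3 A6` for æ. All three start with 0xC3, which is why each of them turns into `├` followed by a second character when the file is read as CP850.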
True, you can instruct javac to read the source code file using UTF-8 with the -encoding UTF-8 argument. However, the Windows command console on its own does not support UTF-8 flavored input and output at all. When you recompile using -encoding UTF-8, you would still get mojibaked output because the command console can't properly represent UTF-8 output. I tried it here and got a degree symbol instead:
løk
l°k
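The degree symbol is consistent with one plausible setup, stated here as an assumption rather than something the answer confirms: the JVM encodes `System.out` with windows-1252, where ø is the single byte 0xF8, while the console displays bytes as CP850, where 0xF8 is `°`. A sketch of that round trip (the class `DegreeSymbolDemo` is hypothetical; both charsets must be available in the runtime):

```java
import java.nio.charset.Charset;

class DegreeSymbolDemo {

    // Encode with the charset Java writes in, then decode with the console's
    // code page, mimicking a console that interprets the bytes differently.
    static String roundTrip(String text, String javaCharset, String consoleCharset) {
        byte[] bytes = text.getBytes(Charset.forName(javaCharset));
        return new String(bytes, Charset.forName(consoleCharset));
    }

    public static void main(String[] args) {
        // o-slash becomes byte 0xF8 in windows-1252, which CP850 shows as a degree sign
        System.out.println(roundTrip("l\u00F8k", "windows-1252", "IBM850"));
    }
}
```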
This problem is not solvable if you intend to use UTF-8 everywhere and want to stick to the Windows command console as your input/output environment. Basically, you need a UTF-8 capable input/output environment. Decent IDEs like Eclipse and NetBeans are such ones. Or, if you intend to run it as a UTF-8 capable standalone program, a Swing UI should be preferred over a GUI-less console program.
Answer by LordOfThePigs
By default on Windows, the Java compiler interprets all of its source files using the "platform default encoding". Depending on the environment in which you run the compiler, this may be ISO-8859-1, CP1252, UTF-8 or really any other encoding.
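To see which encoding a given environment treats as the "platform default", a sketch like the following prints it (the class `DefaultEncoding` is hypothetical). It reports the JVM's default charset, which `javac` used for source files when no `-encoding` was given; since JDK 18 (JEP 400) that default is UTF-8 on all platforms, so the mismatch described here mainly affects older JDKs.

```java
import java.nio.charset.Charset;

class DefaultEncoding {

    // Name of the JVM's default charset, used whenever no charset is passed.
    static String platformDefault() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Charset.defaultCharset(): " + platformDefault());
        // Before JDK 18 this property was derived from the OS locale;
        // since JDK 18 (JEP 400) it defaults to UTF-8.
        System.out.println("file.encoding: " + System.getProperty("file.encoding"));
    }
}
```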
If the editor you are using actually encodes your Java source files using UTF-8, but the compiler reads those source files using another encoding, then the contents of all your hardcoded strings may be garbled (as you have experienced). To fix this problem, either make sure you save your Java source files in the "platform default encoding", or set up your Java compiler to interpret the source files as UTF-8.
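To see why the question's `equals` check fails, here is a sketch of the mismatch: the editor writes the literal as UTF-8 bytes, and the compiler decodes them with some other charset. ISO-8859-1 is used below purely as a stand-in for "another encoding" (every JVM is guaranteed to support it); the class `LiteralMismatch` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LiteralMismatch {

    // Simulate the compiler reading a UTF-8 saved literal with the given charset.
    static String readLiteral(String intended, Charset sourceReadAs) {
        byte[] savedByEditor = intended.getBytes(StandardCharsets.UTF_8);
        return new String(savedByEditor, sourceReadAs);
    }

    public static void main(String[] args) {
        String typed = "l\u00F8k"; // what the user actually types
        String wrong = readLiteral(typed, StandardCharsets.ISO_8859_1);
        String right = readLiteral(typed, StandardCharsets.UTF_8);
        System.out.println(wrong.equals(typed) + " (length " + wrong.length() + ")"); // false (length 4)
        System.out.println(right.equals(typed) + " (length " + right.length() + ")"); // true (length 3)
    }
}
```

The wrongly decoded literal has four characters instead of three, so no amount of correct console input can ever make `equals` return true.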
Try calling your compiler with javac -encoding UTF-8 UTF8Test.java. If necessary, replace UTF-8 with whatever encoding your editor uses to save your source file.
Answer by AJMansfield
If you want to have a string literal with a special character, you can try using a Unicode escape:
String [] Norge = {"l\u00F8k", "h\u00E5r", "v\u00E5r", "s\u00E6r", "s\u00F8t"};
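Typing these escapes by hand is error-prone, so a small helper can generate them. A minimal sketch (the class `UnicodeEscaper` is hypothetical) that replaces every non-ASCII `char` with its `\uXXXX` escape and leaves ASCII untouched:

```java
class UnicodeEscaper {

    // Replace every non-ASCII char with its Java Unicode escape (a backslash,
    // the letter u, four hex digits) so the result compiles the same under any -encoding.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] norge = {"l\u00F8k", "h\u00E5r", "v\u00E5r", "s\u00E6r", "s\u00F8t"};
        for (String word : norge) {
            System.out.println(word + " -> " + escape(word));
        }
    }
}
```

Running it reproduces the escaped literals in the array above, for example `l\u00F8k` for løk. Characters outside the Basic Multilingual Plane come out as two escapes (one per UTF-16 surrogate), which is also how Java source represents them.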
While it is not wrong to include special characters in source code (at least in Java), it can in some cases cause problems with poorly configured editors, compilers, or terminals; personally, I steer clear of special characters altogether if I can.
Incidentally, you can also use Unicode escapes elsewhere in Java source code, including javadoc comments and class, method, and variable names.
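A tiny sketch of the identifier case: the compiler translates Unicode escapes before it tokenizes the source, so `s\u00F8t` in the code below is the same identifier as `søt`. The class and method names are hypothetical. Because this translation also applies inside comments, a malformed `\u` sequence in a comment is a compile error.

```java
class EscapedIdentifiers {

    // Unicode escapes are translated before tokenizing, so this method's
    // name is "s", o-slash, "t" even though the file itself is pure ASCII.
    static String s\u00F8t() {
        return "called";
    }

    public static void main(String[] args) {
        // Call it through the same escape; the literal character would also
        // work if the file were compiled with the matching -encoding.
        System.out.println(s\u00F8t()); // prints called
    }
}
```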
If you are compiling from the command line, you can configure the compiler to accept UTF-8 by using the -encoding option with UTF-8 as its parameter, like so:
javac -encoding UTF-8 ...
You may also find this question useful: Special Character in Java
You might consider externalizing the strings as an alternative way to solve the problem. Eclipse provides a way to do this automatically: it basically takes all the literal strings, puts them in a separate file, and reads from that file to get the appropriate string. This also allows you to create a translation of the program, by making a different file with translations of all the strings, or to reconfigure application messages without having to recompile.
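The sketch below shows the underlying idea with plain `java.util.Properties` (it is not the code Eclipse generates): the strings live in a separate file, which is read through a `Reader` with an explicit UTF-8 charset so the platform default encoding never comes into play. The class `ExternalizedStrings` and the keys are hypothetical. Note that `Properties.load(InputStream)` always assumes ISO-8859-1, and `ResourceBundle` only reads `.properties` files as UTF-8 since Java 9 (JEP 226).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class ExternalizedStrings {

    // Load key=value pairs from a stream, decoding it explicitly as UTF-8
    // rather than relying on the platform default encoding.
    static Properties load(InputStream in) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a hypothetical messages.properties saved as UTF-8.
        byte[] file = "word.0=l\u00F8k\nword.1=h\u00E5r\n".getBytes(StandardCharsets.UTF_8);
        Properties messages = load(new ByteArrayInputStream(file));
        System.out.println(messages.getProperty("word.0"));
    }
}
```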
EDIT: I just tried compiling and running it myself (in Eclipse), and I did not have the problem you mention. It is therefore likely an issue with your particular setup.
When I reconfigured it to compile the code as US-ASCII, it output l?k both times.
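Assuming the source file was saved as UTF-8, here is a sketch of what compiling it as US-ASCII does to the literal: both bytes of ø are 0x80 or above, which is malformed in US-ASCII, so Java's decoder substitutes U+FFFD for each one. When `System.out` then uses an encoding that has no U+FFFD, each replacement character is printed as `?`. The class `AsciiDecodeDemo` is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class AsciiDecodeDemo {

    // Decode UTF-8 bytes as US-ASCII; every byte >= 0x80 is malformed
    // and is replaced with U+FFFD, the Unicode replacement character.
    static String utf8ReadAsAscii(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String garbled = utf8ReadAsAscii("l\u00F8k");
        // Two replacement characters stand in for the two bytes of o-slash.
        System.out.println(garbled.length()); // 4
        System.out.println(garbled.equals("l\uFFFD\uFFFDk")); // true
    }
}
```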
When I reconfigured it to compile the code as UTF-8, the output was l??k and l?k.
When I compiled it as UTF-16, the output was t? l ? k and l?k; however, I could not copy the blank spaces in t? l ? k from the terminal: it would let me copy the first two, but leave off the rest. This is probably related to the issue you were having: they could be control characters that are messing it up in your case.