Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/32063929/

Date: 2020-08-11 12:00:24  Source: igfitidea

Java Charset.forName("ASCII") or Charset.forName("US-ASCII")

java character-encoding

Asked by

I was going through some code and came across the following line.

    Charset.forName("ASCII")

But when I looked at the Java documentation, it only lists:

US-ASCII    ISO-8859-1    UTF-8   UTF-16BE   UTF-16LE   UTF-16  

But the code works. Are 'ASCII' and 'US-ASCII' synonyms in this context? Or is the code falling back to some default value because 'ASCII' is not recognized? And how many bytes does 'ASCII' use to represent a character in this scenario?

Accepted answer by Mathias Begert

The documentation points out:

Every charset has a canonical name and may also have one or more aliases. The canonical name is returned by the name method of this class. Canonical names are, by convention, usually in upper case. The aliases of a charset are returned by the aliases method.

Further, the javadoc of Charset.forName(String charsetName) tells you:

charsetName - The name of the requested charset; may be either a canonical name or an alias
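
To check this directly, here is a minimal sketch that looks the charset up under both names and compares the results. The class name AsciiAliasCheck and its canonicalName helper are hypothetical, written only for illustration. Charset.equals compares canonical names, so the alias and the canonical name should give equal charsets.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: shows that "ASCII" is only an alias that
// Charset.forName resolves to the charset whose canonical name is "US-ASCII".
class AsciiAliasCheck {

    // Returns the canonical name of whatever charset the given name resolves to.
    static String canonicalName(String charsetName) {
        return Charset.forName(charsetName).name();
    }

    public static void main(String[] args) {
        Charset viaAlias = Charset.forName("ASCII");
        Charset viaCanonical = Charset.forName("US-ASCII");

        System.out.println(viaAlias.name());                // US-ASCII
        System.out.println(viaAlias.equals(viaCanonical));  // true: same canonical name
        System.out.println(Charset.isSupported("ascii"));   // true: names are case-insensitive
    }
}
```

Because charset names are not case-sensitive, "ascii" and "us-ascii" resolve the same way.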

With this code you can find out more about the charsets:

    import java.nio.charset.Charset;

    Charset ascii = Charset.forName("US-ASCII");
    System.out.println(ascii.aliases());
    // [ANSI_X3.4-1968, cp367, csASCII, iso-ir-6, ASCII, iso_646.irv:1983, ANSI_X3.4-1986, ascii7, default, ISO_646.irv:1991, ISO646-US, IBM367, 646, us]
    // (aliases() returns a Set, so the order may differ between JDK versions)

    System.out.println(ascii.newEncoder().maxBytesPerChar());
    // 1.0

    Charset utf8 = Charset.forName("UTF-8");
    System.out.println(utf8.newEncoder().maxBytesPerChar());
    // 3.0

Answer by Peter Lawrey

ASCII is an alias for US-ASCII. It encodes each character in a single byte, using only the low 7 bits (values 0 to 127).
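
One way to see the one-byte-per-character encoding is to encode a string with StandardCharsets.US_ASCII. This is a sketch using a hypothetical AsciiBytes class; it relies on String.getBytes(Charset), which replaces characters the charset cannot map with that charset's default replacement, '?' for US-ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: encodes text as US-ASCII to show the byte count
// per character and what happens to a character outside the 7-bit range.
class AsciiBytes {

    // Encodes the text as US-ASCII; unmappable characters become '?'
    // because String.getBytes(Charset) uses the charset's default replacement.
    static byte[] encodeAscii(String text) {
        return text.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] hello = encodeAscii("Hello");
        System.out.println(hello.length);               // 5: one byte per character
        System.out.println(Arrays.toString(hello));     // [72, 101, 108, 108, 111]

        byte[] accented = encodeAscii("\u00E9");        // 'é' (U+00E9) is above U+007F
        System.out.println(Arrays.toString(accented));  // [63], the byte for '?'
    }
}
```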

Note: if you want compactness and simplicity, I suggest using ISO-8859-1. It also uses 1 byte per character but covers a wider range: it supports \u0000 to \u00FF, whereas US-ASCII supports only \u0000 to \u007F.
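
The difference in range can be checked with CharsetEncoder.canEncode(char). The sketch below tests a few characters against both charsets; the CharsetRange class and the sample characters are illustrative choices, not part of the answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: compares which characters US-ASCII and
// ISO-8859-1 can encode.
class CharsetRange {

    // True if a fresh encoder for the charset can represent this single char.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        // 'A', the last ASCII code point, two Latin-1 letters, and one char beyond Latin-1.
        char[] samples = {'A', (char) 0x7F, (char) 0xE9, (char) 0xFF, (char) 0x100};
        for (char c : samples) {
            System.out.printf("U+%04X  US-ASCII=%b  ISO-8859-1=%b%n",
                    (int) c,
                    canEncode(StandardCharsets.US_ASCII, c),
                    canEncode(StandardCharsets.ISO_8859_1, c));
        }
    }
}
```

Both encoders accept U+0041 and U+007F, only ISO-8859-1 accepts U+00E9 and U+00FF, and neither accepts U+0100.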

Answer by Dakshinamurthy Karra

Running the following snippet prints all available character sets:

    import java.nio.charset.Charset;
    import java.util.Set;
    import java.util.SortedMap;

    SortedMap<String, Charset> availableCharsets = Charset.availableCharsets();
    Set<String> keySet = availableCharsets.keySet();
    for (String key : keySet) {
        System.out.println(key);
    }
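
The keys of the map returned by availableCharsets() are canonical names only, which would explain why ASCII is missing from that list even though forName accepts it. A minimal sketch, with hypothetical class and helper names, that checks the key set and the aliases of the US-ASCII entry:

```java
import java.nio.charset.Charset;

// Hypothetical demo class: availableCharsets() is keyed by canonical names,
// so "ASCII" is not a key; it appears among the aliases of "US-ASCII" instead.
class CanonicalVsAlias {

    // True if the name is the canonical name of an installed charset.
    static boolean isCanonicalName(String name) {
        return Charset.availableCharsets().containsKey(name);
    }

    // True if the name is listed among the aliases of US-ASCII.
    static boolean isAliasOfUsAscii(String name) {
        Charset usAscii = Charset.availableCharsets().get("US-ASCII");
        // Charset names are case-insensitive, so compare ignoring case.
        return usAscii.aliases().stream().anyMatch(a -> a.equalsIgnoreCase(name));
    }

    public static void main(String[] args) {
        System.out.println(isCanonicalName("US-ASCII"));  // true
        System.out.println(isCanonicalName("ASCII"));     // false
        System.out.println(isAliasOfUsAscii("ASCII"));    // true
    }
}
```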

I do not see ASCII in the list. Looking at the code for defaultCharset() in the Charset class shows that if file.encoding is invalid, it defaults to UTF-8.

Running the following snippet prints 'UTF-8' as the default charset:

    import java.nio.charset.Charset;

    System.setProperty("file.encoding", "ASCII");
    System.out.println(Charset.defaultCharset());
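
Note that the UTF-8 fallback described in this answer concerns Charset.defaultCharset(). Charset.forName does not substitute a default: a recognized alias resolves to its canonical charset, and an unrecognized name throws UnsupportedCharsetException. Below is a sketch with a hypothetical resolve helper; "NO-SUCH-CHARSET" is a made-up name used only to trigger the exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class: Charset.forName never silently falls back to a
// default charset; unknown names raise UnsupportedCharsetException.
class ForNameNoFallback {

    // Returns the canonical name, or null if no installed provider knows the name.
    static String resolve(String name) {
        try {
            return Charset.forName(name).name();
        } catch (UnsupportedCharsetException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("ASCII"));            // US-ASCII
        System.out.println(resolve("NO-SUCH-CHARSET"));  // null (made-up name)
    }
}
```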