Java Charset.forName("ASCII") or Charset.forName("US-ASCII")

Question

Asked by

I was going through some code and came across the following line.

Charset.forName("ASCII")

But when I looked at the Java documentation, it only lists:

US-ASCII, ISO-8859-1, UTF-8, UTF-16BE, UTF-16LE, UTF-16

But the code works. Are 'ASCII' and 'US-ASCII' synonyms in this context, or is the code falling back to some default value because 'ASCII' is not recognized? And how many bytes does 'ASCII' use to represent a character in this scenario?

Answer 1

Accepted answer by Mathias Begert

The documentation points out:

Every charset has a canonical name and may also have one or more aliases. The canonical name is returned by the name method of this class. Canonical names are, by convention, usually in upper case. The aliases of a charset are returned by the aliases method.

Further, the javadoc of Charset.forName(String charsetName) tells you:

charsetName - The name of the requested charset; may be either a canonical name or an alias
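
To check this directly, here is a small sketch (the class and helper names are made up for illustration, and StandardCharsets assumes Java 7 or later) that resolves both names and compares the results. It also tries an unknown name: according to the javadoc, forName throws UnsupportedCharsetException rather than falling back to a default, so "ASCII" working means it was recognized as an alias.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class; the name is illustrative only.
class AliasLookup {
    // Resolves a canonical name or alias and returns the charset's canonical name.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    // Returns true if forName throws UnsupportedCharsetException for the given name.
    static boolean rejects(String name) {
        try {
            Charset.forName(name);
            return false;
        } catch (UnsupportedCharsetException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Charset viaAlias = Charset.forName("ASCII");
        Charset viaCanonical = Charset.forName("US-ASCII");
        System.out.println(viaAlias.name());                            // US-ASCII
        System.out.println(viaAlias.equals(viaCanonical));              // true
        System.out.println(viaAlias.equals(StandardCharsets.US_ASCII)); // true
        System.out.println(rejects("NO-SUCH-CHARSET"));                 // true
    }
}
```

Charset.equals compares canonical names, so the two lookups are interchangeable; StandardCharsets.US_ASCII skips the string lookup altogether.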

With this code you can find out more about the charsets:

    import java.nio.charset.Charset;

    Charset ascii = Charset.forName("US-ASCII");
    // The alias set of US-ASCII includes "ASCII"
    System.out.println(ascii.aliases());
    // [ANSI_X3.4-1968, cp367, csASCII, iso-ir-6, ASCII, iso_646.irv:1983, ANSI_X3.4-1986, ascii7, default, ISO_646.irv:1991, ISO646-US, IBM367, 646, us]

    // Worst-case number of bytes the encoder produces per char
    System.out.println(ascii.newEncoder().maxBytesPerChar());
    // 1.0

    Charset utf8 = Charset.forName("UTF-8");
    System.out.println(utf8.newEncoder().maxBytesPerChar());
    // 3.0
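
maxBytesPerChar() is a worst case per Java char, not the size of every character. A small sketch (class and helper names are illustrative, Java 7+ assumed for StandardCharsets) that counts the bytes actually produced:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the name is illustrative only.
class ByteCounts {
    // Number of bytes produced when s is encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // US-ASCII: one byte per char.
        System.out.println("A".getBytes(StandardCharsets.US_ASCII).length); // 1
        // U+00E9 is not in US-ASCII; String.getBytes substitutes the replacement byte '?' (63).
        System.out.println(Arrays.toString("\u00E9".getBytes(StandardCharsets.US_ASCII))); // [63]
        // UTF-8 is variable-length: 1 to 3 bytes for a single char...
        System.out.println(utf8Length("A"));      // 1
        System.out.println(utf8Length("\u00E9")); // 2
        System.out.println(utf8Length("\u20AC")); // 3 (euro sign)
        // ...and 4 bytes for a supplementary character, which is a surrogate pair of 2 chars.
        System.out.println(utf8Length("\uD83D\uDE00")); // 4
    }
}
```

The 4-byte sequences cover two chars (2 bytes per char), which is why the UTF-8 encoder reports 3.0 rather than 4.0.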

Answer 2

Answer by Peter Lawrey

ASCII is an alias for US-ASCII. It is a 7-bit code, so each character is stored in a single byte whose high bit is always 0.

Note: if you want compactness and simplicity, I suggest using ISO-8859-1. It also uses 1 byte per character but has a wider range: it supports \u0000 to \u00FF, whereas US-ASCII supports only \u0000 to \u007F.
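
The difference in range can be checked with CharsetEncoder.canEncode. A sketch with illustrative names, assuming Java 7+:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is illustrative only.
class RangeCheck {
    // True if the charset's encoder can map every char of s (a fresh encoder per call).
    static boolean canEncode(Charset cs, String s) {
        return cs.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        // Boundary characters of the two ranges.
        String[] samples = {"\u007F", "\u0080", "\u00FF", "\u0100"};
        for (String s : samples) {
            System.out.printf("U+%04X  US-ASCII=%b  ISO-8859-1=%b%n",
                    (int) s.charAt(0),
                    canEncode(StandardCharsets.US_ASCII, s),
                    canEncode(StandardCharsets.ISO_8859_1, s));
        }
        // ISO-8859-1 stores U+00E9 as the single byte 0xE9.
        byte[] latin1 = "\u00E9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin1.length + " byte, value 0x" + Integer.toHexString(latin1[0] & 0xFF));
    }
}
```

Of the four samples, only U+007F should be encodable in US-ASCII, while ISO-8859-1 should accept everything up to U+00FF and reject U+0100.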

Answer 3

Answer by Dakshinamurthy Karra

Running the following snippet prints all the available character sets:

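    import java.nio.charset.Charset;
    import java.util.Set;
    import java.util.SortedMap;

    // Print the key of every charset supported by this JVM
    SortedMap<String,Charset> availableCharsets = Charset.availableCharsets();
    Set<String> keySet = availableCharsets.keySet();
    for (String key : keySet) {
        System.out.println(key);
    }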
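
The javadoc of Charset.availableCharsets() describes the returned map as keyed by canonical charset names, so an alias such as ASCII would not be expected to appear as a key. A sketch (illustrative names) that finds the alias through aliases() instead:

```java
import java.nio.charset.Charset;
import java.util.SortedMap;

// Hypothetical demo class; the name is illustrative only.
class CanonicalKeys {
    // Returns the canonical name of the first available charset listing the alias, or null.
    static String canonicalFor(String alias) {
        for (Charset cs : Charset.availableCharsets().values()) {
            if (cs.aliases().contains(alias)) {
                return cs.name();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SortedMap<String, Charset> all = Charset.availableCharsets();
        System.out.println(all.containsKey("ASCII"));    // false: only canonical names are keys
        System.out.println(all.containsKey("US-ASCII")); // true
        System.out.println(canonicalFor("ASCII"));       // US-ASCII
    }
}
```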

I do not see ASCII in the list. Looking at the code for defaultCharset() in the Charset class shows that if file.encoding is invalid, it defaults to 'utf-8'.

Running the following snippet prints 'UTF-8' as the default charset.

    import java.nio.charset.Charset;

    System.setProperty("file.encoding", "ASCII");
    System.out.println(Charset.defaultCharset());
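
One caveat worth checking: the javadoc of Charset.defaultCharset() says the default charset is determined during virtual-machine startup, so setting file.encoding from code that is already running is unlikely to change it, and the printed UTF-8 may simply be the JVM's startup default. A sketch (illustrative names, Java 7+) that checks this and sidesteps the default by passing the charset explicitly:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is illustrative only.
class DefaultCharsetDemo {
    // Encodes and decodes with an explicit charset, independent of the platform default.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        Charset before = Charset.defaultCharset();
        // The default was already determined at startup; changing the property now should not affect it.
        System.setProperty("file.encoding", "ASCII");
        Charset after = Charset.defaultCharset();
        System.out.println(before.equals(after));                                // true
        System.out.println(roundTrip("plain text", StandardCharsets.US_ASCII)); // plain text
    }
}
```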
