Java Charset.forName("ASCII") or Charset.forName("US-ASCII")
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors on Stack Overflow (not to this page).
Original question: http://stackoverflow.com/questions/32063929/
Question
I was going through some code and came across the following line.
Charset.forName("ASCII")
But when I looked at the Java documentation, it only lists:
US-ASCII ISO-8859-1 UTF-8 UTF-16BE UTF-16LE UTF-16
But the code works. Are 'ASCII' and 'US-ASCII' synonyms in this context, or is the code falling back to some default value because 'ASCII' is not recognized? And how many bytes does 'ASCII' use to represent a character in this scenario?
Accepted answer by Mathias Begert
The documentation points out:
Every charset has a canonical name and may also have one or more aliases. The canonical name is returned by the name method of this class. Canonical names are, by convention, usually in upper case. The aliases of a charset are returned by the aliases method.
Further, the Javadoc of Charset.forName(String charsetName) tells you:
charsetName - The name of the requested charset; may be either a canonical name or an alias
With this code you can find out more about the charsets:
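import java.nio.charset.Charset;

// Look up the charset by its canonical name and print its aliases.
Charset ascii = Charset.forName("US-ASCII");
System.out.println(ascii.aliases());
// [ANSI_X3.4-1968, cp367, csASCII, iso-ir-6, ASCII, iso_646.irv:1983, ANSI_X3.4-1986, ascii7, default, ISO_646.irv:1991, ISO646-US, IBM367, 646, us]
// (the exact set and order of aliases may differ between JDK versions)
// US-ASCII never needs more than one byte per char.
System.out.println(ascii.newEncoder().maxBytesPerChar());
// 1.0
// UTF-8 needs up to three bytes for a single Java char.
Charset utf8 = Charset.forName("UTF-8");
System.out.println(utf8.newEncoder().maxBytesPerChar());
// 3.0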
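To address the question's worry about a silent fallback more directly, here is a minimal sketch showing that an alias resolves to the same charset as the canonical name, and that an unknown name is rejected rather than replaced by a default. The class and method names (AliasLookup, canonicalName, isRejected) and the name "no-such-charset" are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical demo class; not part of the original answers.
class AliasLookup {
    // Resolve a charset name (canonical or alias) to its canonical name.
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    // Report whether forName rejects a syntactically legal but unknown name.
    static boolean isRejected(String name) {
        try {
            Charset.forName(name);
            return false;
        } catch (UnsupportedCharsetException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The alias resolves to the canonical charset.
        System.out.println(canonicalName("ASCII"));
        // Both lookups yield equal Charset objects.
        System.out.println(Charset.forName("ASCII").equals(Charset.forName("US-ASCII")));
        // An unknown name throws instead of falling back to a default.
        System.out.println(isRejected("no-such-charset"));
    }
}
```

So if Charset.forName("ASCII") returns without throwing, the name was recognized as an alias; nothing was defaulted.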
Answer by Peter Lawrey
ASCII is an alias for US-ASCII. It is a 7-bit encoding, stored as one byte per character.
Note: if you want compactness and simplicity, I suggest using ISO-8859-1. This also uses 1 byte per character but has a wider range: it supports \u0000 to \u00FF, whereas US-ASCII supports only \u0000 to \u007F.
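The difference in range can be seen by encoding a character above \u007F with both charsets. This is a small sketch, assuming the standard behavior of String.getBytes(Charset), which substitutes the charset's replacement byte for unmappable characters; the class name RangeDemo and its helper methods are illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answers.
class RangeDemo {
    // Encode with US-ASCII; characters above \u007F cannot be mapped.
    static byte[] asAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Encode with ISO-8859-1; every character up to \u00FF maps to one byte.
    static byte[] asLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE

        // US-ASCII has no mapping, so the replacement byte '?' is written.
        System.out.println((char) asAscii(eAcute)[0]);

        // ISO-8859-1 stores the code point value itself in a single byte.
        System.out.println(Integer.toHexString(asLatin1(eAcute)[0] & 0xFF));
    }
}
```

Both encodings produce exactly one byte per character here; only ISO-8859-1 preserves the original character.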
Answer by Dakshinamurthy Karra
Running the following snippet prints all character sets that are available:
import java.nio.charset.Charset;
import java.util.Set;
import java.util.SortedMap;

SortedMap<String,Charset> availableCharsets = Charset.availableCharsets();
Set<String> keySet = availableCharsets.keySet();
for (String key : keySet) {
    System.out.println(key);
}
I do not see ASCII in the list. Looking at the code for defaultCharset() in the Charset class shows that if file.encoding is invalid, it defaults to 'utf-8'.
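One caveat about reading that list: the Javadoc describes Charset.availableCharsets() as a map from canonical charset names to charset objects, so an alias would not be expected to appear as a key even when it is supported. A minimal sketch of the distinction (the class name CanonicalKeys and its methods are hypothetical):

```java
import java.nio.charset.Charset;

// Hypothetical demo class; not part of the original answers.
class CanonicalKeys {
    // True if the name is a key of availableCharsets(), i.e. a canonical name.
    static boolean listedAsKey(String name) {
        return Charset.availableCharsets().containsKey(name);
    }

    // True if the name, canonical or alias, can be resolved by Charset.forName.
    static boolean supported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        System.out.println(listedAsKey("US-ASCII")); // canonical name
        System.out.println(listedAsKey("ASCII"));    // alias only
        System.out.println(supported("ASCII"));      // still resolvable
    }
}
```

An alias that is missing from the printed list can therefore still be looked up directly, without any default charset being involved.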
Running the following snippet prints 'UTF-8' as the default charset.
System.setProperty("file.encoding", "ASCII");
System.out.println(Charset.defaultCharset());
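Since the value of Charset.defaultCharset() depends on the JVM and how it was started, code that specifically needs ASCII can avoid the default altogether by passing the charset explicitly. The sketch below uses the StandardCharsets constants from the JDK; the class name ExplicitCharset and its methods are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the original answers.
class ExplicitCharset {
    // Encode and decode with an explicit charset, independent of the default.
    static String roundTrip(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // The constant and the alias lookup refer to the same charset.
    static boolean constantMatchesAlias() {
        return StandardCharsets.US_ASCII.equals(Charset.forName("ASCII"));
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
        System.out.println(constantMatchesAlias());
    }
}
```

Using the constant also removes the string lookup, so there is no name to mistype and no UnsupportedCharsetException to handle.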