Java: Why are charset names not constants?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/1684040/
Why charset names are not constants?
Asked by serg
Charset issues are confusing and complicated by themselves, but on top of that you have to remember the exact names of your charsets. Is it "utf8"? Or "utf-8"? Or maybe "UTF-8"? When searching the internet for code samples you will see all of the above. Why not just make them named constants and use Charset.UTF8?
Accepted answer by Kevin Bourrillion
The simple answer to the question asked is that the available charset strings vary from platform to platform.
However, there are six that are required to be present, so constants could have been made for those long ago. I don't know why they weren't.
JDK 1.4 did a great thing by introducing the Charset type. At this point, they wouldn't have wanted to provide String constants anymore, since the goal is to get everyone using Charset instances. So why not provide the six standard Charset constants, then? I asked Martin Buchholz since he happens to be sitting right next to me, and he said there wasn't a really particularly great reason, except that at the time, things were still half-baked -- too few JDK APIs had been retrofitted to accept Charset, and of the ones that were, the Charset overloads usually performed slightly worse.
It's sad that it's only in JDK 1.6 that they finally finished outfitting everything with Charset overloads. And that this backwards performance situation still exists (the reason why is incredibly weird and I can't explain it, but is related to security!).
Long story short -- just define your own constants, or use Guava's Charsets class which Tony the Pony linked to (though that library is not really actually released yet).
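For anyone on a pre-Java 7 JDK, "define your own constants" could look roughly like the sketch below. The class name CharsetConstants is made up for illustration; the only library call used is Charset.forName, and the six names are the ones every JRE is required to support.

```java
import java.nio.charset.Charset;

// Hypothetical in-house holder class (the name is an assumption, not a JDK type).
// Charset.forName cannot fail for these names because every JRE must support them.
final class CharsetConstants {
    static final Charset US_ASCII = Charset.forName("US-ASCII");
    static final Charset ISO_8859_1 = Charset.forName("ISO-8859-1");
    static final Charset UTF_8 = Charset.forName("UTF-8");
    static final Charset UTF_16BE = Charset.forName("UTF-16BE");
    static final Charset UTF_16LE = Charset.forName("UTF-16LE");
    static final Charset UTF_16 = Charset.forName("UTF-16");

    private CharsetConstants() {
        // constants only, no instances
    }

    public static void main(String[] args) {
        // Callers never spell the charset name themselves.
        byte[] encoded = "caf\u00e9".getBytes(UTF_8);
        System.out.println(UTF_8.name() + " uses " + encoded.length + " bytes");
    }
}
```

Call sites then pass CharsetConstants.UTF_8 to any overload that accepts a Charset, so the name is spelled in exactly one place.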
Update: a StandardCharsets class is in JDK 7.
Answer by Jon Skeet
I'd argue that we can do much better than that... why aren't the guaranteed-to-be-available charsets accessible directly? Charset.UTF8 should be a reference to the Charset, not the name as a string. That way we wouldn't have to handle UnsupportedEncodingException all over the place.
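To make the complaint concrete, here is a small sketch comparing the two String.getBytes overloads. The class and method names are invented for the example, and it uses StandardCharsets, so it assumes Java 7 or later.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Illustrative class; EncodingComparison and its method names are assumptions.
final class EncodingComparison {

    // Name-based overload: the compiler forces a try/catch even though
    // UTF-8 is guaranteed to exist on every JRE.
    static byte[] encodeByName(String text) {
        try {
            return text.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 must be supported", e);
        }
    }

    // Charset-based overload: no checked exception, no string to mistype.
    static byte[] encodeByCharset(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve";
        System.out.println(encodeByName(text).length == encodeByCharset(text).length);
    }
}
```

Both methods produce the same bytes; the difference is only in how much ceremony the caller has to write.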
Mind you, I also think that .NET chose a better strategy by defaulting to UTF-8 everywhere. It then screwed up by naming the "operating system default" encoding property simply Encoding.Default - which isn't the default within .NET itself :(
Back to ranting about Java's charset support - why isn't there a constructor for FileWriter/FileReader which takes a Charset? Basically those are almost useless classes due to that restriction - you almost always need an InputStreamReader around a FileInputStream or the equivalent for output :(
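The wrapping described above might look like the following sketch. The class and helper names are hypothetical, the temp-file round trip exists only to make the example runnable, and it assumes Java 8 or later (try-with-resources, UncheckedIOException). Later JDKs (Java 11 onward) did add Charset constructors to FileReader and FileWriter, so this is mainly relevant to older code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative helper; ExplicitCharsetFiles and its methods are assumptions.
final class ExplicitCharsetFiles {

    // The "equivalent for output": OutputStreamWriter around a FileOutputStream.
    static void writeAll(File file, String text, Charset charset) throws IOException {
        try (Writer writer = new OutputStreamWriter(new FileOutputStream(file), charset)) {
            writer.write(text);
        }
    }

    // InputStreamReader around a FileInputStream, with the charset stated explicitly.
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            char[] buffer = new char[4096];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                text.append(buffer, 0, count);
            }
        }
        return text.toString();
    }

    // Writes text to a temp file and reads it back with the same charset.
    static String roundTrip(String text, Charset charset) {
        try {
            File file = File.createTempFile("charset-demo", ".txt");
            try {
                writeAll(file, text, charset);
                return readAll(file, charset);
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("na\u00efve caf\u00e9", StandardCharsets.UTF_8));
    }
}
```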
Nurse, nurse - where's my medicine?
EDIT: It occurs to me that this hasn't really answered the question. The real answer is presumably either "nobody involved thought of it" or "somebody involved thought it was a bad idea." I would strongly suggest that in-house utility classes providing the names or charsets avoid duplication around the codebase... Or you could just use the one that we used at Google when this answer was first written. (Note that as of Java 7, you'd just use StandardCharsets instead.)
Answer by McDowell
The current state of the encoding API leaves something to be desired. Some parts of the Java 6 API don't accept Charset in place of a string (in logging, dom.ls, PrintStream; there may be others). It doesn't help that encodings are supposed to have different canonical names for different parts of the standard library.
I can understand how things got to where they are; not sure I have any brilliant ideas about how to fix them.
As an aside...
You can look up the names for Sun's Java 6 implementation here.
For UTF-8, the canonical values are "UTF-8" for java.nio and "UTF8" for java.lang and java.io. The only encodings the spec requires a JRE to support are: US-ASCII; ISO-8859-1; UTF-8; UTF-16BE; UTF-16LE; UTF-16.
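If you want to see the naming split on your own JRE, a sketch like the one below should do it. The class and method names are made up; it relies on Charset.forName resolving aliases case-insensitively and on InputStreamReader.getEncoding reporting the java.io-style name. I have deliberately not written down the printed output, since the java.io name is best checked on the runtime you actually use.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative class; CharsetNames and its members are assumptions.
final class CharsetNames {

    // The six encodings the spec requires every JRE to support.
    static final String[] REQUIRED = {
        "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
    };

    // java.nio view: resolve any alias to the charset's canonical name.
    static String nioName(String anyName) {
        return Charset.forName(anyName).name();
    }

    // java.io view: the name a reader reports for the same charset.
    // The reader wraps an empty in-memory stream, so there is nothing to close.
    static String ioName(Charset charset) {
        return new InputStreamReader(new ByteArrayInputStream(new byte[0]), charset)
                .getEncoding();
    }

    public static void main(String[] args) {
        for (String alias : new String[] {"utf8", "utf-8", "UTF-8"}) {
            System.out.println(alias + " -> " + nioName(alias));
        }
        System.out.println("java.io name: " + ioName(StandardCharsets.UTF_8));
        for (String name : REQUIRED) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```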
Answer by Alexander Pogrebnyak
I have long ago defined a utility class with UTF_8, ISO_8859_1 and US_ASCII Charset constants.
Also, some long time ago (2+ years) I did a simple performance test between new String( byte[], Charset ) and new String( byte[], String charset_name ) and discovered that the latter implementation is CONSIDERABLY faster. If you take a look under the hood at the source code you will see that they indeed follow quite a different path.
For that reason I included a utility in the same class
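    import java.io.UnsupportedEncodingException;
    import java.nio.charset.Charset;

    public static String stringFromByteArray (
        final byte[] array,
        final Charset charset
    )
    {
        try
        {
            // Delegate to the name-based constructor, the faster path in my test.
            return new String( array, charset.name( ) );
        }
        catch ( UnsupportedEncodingException ex )
        {
            // Cannot happen: the name comes from an existing Charset instance.
            throw new AssertionError( ex );
        }
    }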
Why the String( byte[], Charset ) constructor does not do the same, beats me.
Answer by Etienne Neveu
Two years later, and Java 7's StandardCharsets now defines constants for the 6 standard charsets.
If you are stuck on Java 5/6, you can use Guava's Charsets constants, as suggested by Kevin Bourrillion and Jon Skeet.
Answer by Roger
In Java 1.7
import java.nio.charset.StandardCharsets;
ex:
StandardCharsets.UTF_8
StandardCharsets.US_ASCII