What does the term "canonical form" or "canonical representation" in Java mean?
Source: http://stackoverflow.com/questions/280107/
Notice: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): Stack Overflow
Asked by Shivasubramanian A
I have often heard this term being used, but I have never really understood it.
What does it mean, and can anyone give some examples/point me to some links?
EDIT: Thanks to everyone for the replies. Can you also tell me how the canonical representation is useful in equals() performance, as stated in Effective Java?
Accepted answer by Brian Gianforcaro
Wikipedia points to the term Canonicalization.
A process for converting data that has more than one possible representation into a "standard" canonical representation. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.
The Unicode example made the most sense to me:
Variable-length encodings in the Unicode standard, in particular UTF-8, have more than one possible encoding for most common characters. This makes string validation more complicated, since every possible encoding of each string character must be considered. A software implementation which does not consider all character encodings runs the risk of accepting strings considered invalid in the application design, which could cause bugs or allow attacks. The solution is to allow a single encoding for each character. Canonicalization is then the process of translating every string character to its single allowed encoding. An alternative is for software to determine whether a string is canonicalized, and then reject it if it is not. In this case, in a client/server context, the canonicalization would be the responsibility of the client.
In summary, a standard form of representation for data. From this form you can then convert to any representation you may need.
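As a related illustration in Java: the sketch below uses Unicode normalization, which is a different mechanism from the UTF-8 encoding issue quoted above, so treat it as an analogous example rather than the same thing. The letter é can be written as a single code point or as e followed by a combining accent, and `java.text.Normalizer` converts both to one canonical form. The class name `NormalizeDemo` and the helper `canonical` are made up for this example.

```java
import java.text.Normalizer;

class NormalizeDemo {
    // Convert a string to Unicode Normalization Form C (canonical composition).
    static String canonical(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String composed = "\u00E9";     // é as a single code point
        String decomposed = "e\u0301";  // e followed by COMBINING ACUTE ACCENT

        // The raw strings differ, but their canonical forms are equal.
        System.out.println(composed.equals(decomposed));                       // false
        System.out.println(canonical(composed).equals(canonical(decomposed))); // true
    }
}
```

Comparing only canonicalized strings is what makes the two spellings of é count as the same text.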
Answer by Dónal
The word "canonical" is just a synonym for "standard" or "usual". It doesn't have any Java-specific meaning.
Answer by Dov Wasserman
I believe there are two related uses of canonical: forms and instances.
A canonical form means that values of a particular type of resource can be described or represented in multiple ways, and one of those ways is chosen as the favored canonical form. (That form is canonized, like books that made it into the Bible, and the other forms are not.) A classic example of a canonical form is paths in a hierarchical file system, where a single file can be referenced in a number of ways:
myFile.txt # in current working dir
../conf/myFile.txt # relative to the CWD
/apps/tomcat/conf/myFile.txt # absolute path using symbolic links
/u1/local/apps/tomcat-5.5.1/conf/myFile.txt # absolute path with no symlinks
The classic definition of the canonical representation of that file would be the last path. With local or relative paths you cannot globally identify the resource without contextual information. With absolute paths you can identify the resource, but cannot tell if two paths refer to the same entity. With two or more paths converted to their canonical forms, you can do all the above, plus determine if two resources are the same or not, if that is important to your application (solve the aliasing problem).
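In Java, one way to get such a canonical path is `Path.toRealPath()` (or the older `File.getCanonicalPath()`). The following is a minimal sketch, not a definitive recipe: the class `CanonicalPathDemo` and the helpers `canonical` and `sameFile` are names invented for this example, and it assumes the files exist, since `toRealPath()` fails otherwise.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class CanonicalPathDemo {
    // Resolve a path to its canonical ("real") form: absolute, with "." and
    // ".." removed and symbolic links followed. The file must exist.
    static Path canonical(Path p) {
        try {
            return p.toRealPath();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Two paths are aliases of the same file if their canonical forms are equal.
    static boolean sameFile(Path a, Path b) {
        return canonical(a).equals(canonical(b));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("canon");
        Path direct = Files.createFile(dir.resolve("myFile.txt"));
        Path roundabout = dir.resolve(".").resolve("myFile.txt"); // dir/./myFile.txt

        System.out.println(direct + " vs " + roundabout);
        System.out.println("same file: " + sameFile(direct, roundabout));
    }
}
```

Comparing the canonical forms rather than the raw paths is what solves the aliasing problem described above.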
Note that the canonical form of a resource is not a quality of that particular form itself; there can be multiple possible canonical forms for a given type like file paths (say, lexicographically first of all possible absolute paths). One form is just selected as the canonical form for a particular application reason, or maybe arbitrarily so that everyone speaks the same language.
Forcing objects into their canonical instances is the same basic idea, but instead of determining one "best" representation of a resource, it arbitrarily chooses one instance of a class of instances with the same "content" as the canonical reference, then converts all references to equivalent objects to use the one canonical instance.
This can be used as a technique for optimizing both time and space. If there are multiple instances of equivalent objects in an application, then by forcing them all to be resolved as the single canonical instance of a particular value, you can eliminate all but one of each value, saving space and possibly time since you can now compare those values with reference identity (==) as opposed to object equivalence (equals() method).
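One way to do this for your own class is a factory method backed by a pool of instances. The sketch below is only an illustration under assumptions: `PooledColor` and its `of` method are hypothetical names, and the pool never evicts entries, which a real application might need to handle.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class PooledColor {
    // One canonical instance per distinct name, shared by all callers.
    private static final Map<String, PooledColor> POOL = new ConcurrentHashMap<>();

    private final String name;

    // Private constructor: instances can only be obtained through of().
    private PooledColor(String name) {
        this.name = name;
    }

    // Return the canonical instance for this name, creating it on first use.
    static PooledColor of(String name) {
        return POOL.computeIfAbsent(name, PooledColor::new);
    }

    String name() {
        return name;
    }

    public static void main(String[] args) {
        PooledColor a = PooledColor.of("red");
        PooledColor b = PooledColor.of(new String("red")); // equal content, different String object
        System.out.println(a == b); // true: both refer to the one canonical instance
    }
}
```

Because the constructor is private, every `PooledColor` in the program comes from the pool, so `==` is a safe substitute for `equals()` on this class.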
A classic example of optimizing performance with canonical instances is collapsing strings with the same content. Calling String.intern() on two strings with the same character sequence is guaranteed to return the same canonical String object for that text. If you pass all your strings through that canonicalizer, you know equivalent strings are actually identical object references, i.e., aliases.
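A small sketch of that guarantee (the class name `InternDemo` is just for this example):

```java
class InternDemo {
    public static void main(String[] args) {
        // new String(...) forces two distinct objects with the same characters.
        String a = new String("canonical");
        String b = new String("canonical");

        System.out.println(a == b);                   // false: different objects
        System.out.println(a.equals(b));              // true: same content
        System.out.println(a.intern() == b.intern()); // true: same canonical instance
    }
}
```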
The enum types in Java 5.0+ force all instances of a particular enum value to use the same canonical instance within a VM, even if the value is serialized and deserialized. That is why you can use if (day == Days.SUNDAY) with impunity in Java if Days is an enum type. Doing this for your own classes is certainly possible, but takes care. Read Effective Java by Josh Bloch for details and advice.
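To see the serialization claim in action, here is a minimal sketch. The `Days` enum follows the name used above, but its constants, the class `EnumIdentityDemo`, and the helper `roundTrip` are assumptions made for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

enum Days { SATURDAY, SUNDAY }

class EnumIdentityDemo {
    // Serialize a value to a byte array and read it back.
    static Object roundTrip(Object value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object day = roundTrip(Days.SUNDAY);
        // Deserialization resolves to the existing canonical constant.
        System.out.println(day == Days.SUNDAY); // true
    }
}
```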
Answer by Jaime
Reduced to the simplest and most significant form without losing generality.
Answer by SASIKALA
Canonical representation concerns viewing the same character in different styles. For example, if I write the letter A, another person may write the letter A in a different style :)
This is according to the field of optical character recognition.
Answer by Kimberley Coburn
Another good example might be: you have a class that supports the use of Cartesian (x, y, z), spherical (r, theta, phi) and cylindrical coordinates (r, phi, z). For purposes of establishing equality (equals method), you would probably want to convert all representations to one "canonical" representation of your choosing, e.g. spherical coordinates. (Or maybe you would want to do this in general, i.e. use one internal representation.) I am not an expert, but this did occur to me as maybe a good concrete example.
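A rough sketch of the "one internal representation" variant follows. It picks Cartesian coordinates as the canonical form (an arbitrary choice for this example, where the answer mentions spherical), and `Point3D` with its factory methods is a hypothetical class. Note that floating-point rounding in the conversion means exact equality will only hold for some inputs, so a real class might compare with a tolerance.

```java
import java.util.Objects;

final class Point3D {
    // Canonical internal representation: Cartesian coordinates.
    private final double x, y, z;

    private Point3D(double x, double y, double z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    static Point3D fromCartesian(double x, double y, double z) {
        return new Point3D(x, y, z);
    }

    // Cylindrical (r, phi, z) is converted to the canonical form on creation.
    static Point3D fromCylindrical(double r, double phi, double z) {
        return new Point3D(r * Math.cos(phi), r * Math.sin(phi), z);
    }

    // equals only ever compares canonical (Cartesian) values.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Point3D)) return false;
        Point3D p = (Point3D) o;
        return Double.compare(x, p.x) == 0
                && Double.compare(y, p.y) == 0
                && Double.compare(z, p.z) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    public static void main(String[] args) {
        Point3D a = Point3D.fromCartesian(2.0, 0.0, 3.0);
        Point3D b = Point3D.fromCylindrical(2.0, 0.0, 3.0); // phi = 0 lies on the x axis
        System.out.println(a.equals(b));
    }
}
```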
Answer by Chris Mawata
An easy way to remember it is the way "canonical" is used in theological circles: canonical truth is the real truth, so if two people find it they have found the same truth. The same goes for a canonical instance. If you think you have found two of them (i.e. a.equals(b)), you really only have one (i.e. a == b). So equality implies identity in the case of canonical objects.
Now for the comparison. You now have the choice of using a==b or a.equals(b), since they will produce the same answer in the case of canonical instances. However, a==b is a comparison of references (the JVM can compare two references extremely rapidly, as they are just two fixed-width bit patterns, 32 or 64 bits depending on the JVM), whereas a.equals(b) is a method call and involves more overhead.
Answer by Michael Marton
A good example for understanding "canonical form/representation" is to look at the XML Schema datatype definition of "boolean":
- the "lexical representation" of boolean can be one of: {true, false, 1, 0}
- whereas the "canonical representation" can only be one of: {true, false}
This, in essence, means that "true" and "1" get mapped to the canonical representation "true", and "false" and "0" get mapped to the canonical representation "false".
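That mapping can be sketched in a few lines of Java. The class `XsdBoolean` and its `canonical` method are hypothetical names for this illustration, and the sketch assumes the input has already been trimmed of whitespace.

```java
class XsdBoolean {
    // Map any of the four lexical forms to its canonical form.
    static String canonical(String lexical) {
        switch (lexical) {
            case "true":
            case "1":
                return "true";
            case "false":
            case "0":
                return "false";
            default:
                throw new IllegalArgumentException("not a boolean literal: " + lexical);
        }
    }

    public static void main(String[] args) {
        System.out.println(canonical("1"));     // true
        System.out.println(canonical("false")); // false
    }
}
```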
Answer by Maksym Ovsianikov
A canonical form means a naturally unique representation of the element.
Answer by The Gilbert Arenas Dagger
The OP's questions about canonical form and how it can improve performance of the equals method can both be answered by extending the example provided in Effective Java.
Consider the following class:
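import java.util.Objects;

public final class CaseInsensitiveString {
    private final String s;

    public CaseInsensitiveString(String s) {
        this.s = Objects.requireNonNull(s);
    }

    // Case-insensitive comparison on every call: correct, but costlier
    // than an exact String comparison. (A matching hashCode override is
    // omitted here, as in the original snippet.)
    @Override
    public boolean equals(Object o) {
        return o instanceof CaseInsensitiveString
                && ((CaseInsensitiveString) o).s.equalsIgnoreCase(s);
    }
}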
The equals method in this example has added cost by using String's equalsIgnoreCase method. As mentioned in the text:
you may want to store a canonical form of the field so the equals method can do a cheap exact comparison on canonical forms rather than a more costly nonstandard comparison.
What does Joshua Bloch mean when he says canonical form? Well, I think Dónal's concise answer is very appropriate. We can store the underlying String field in the CaseInsensitiveString example in a standard way, perhaps the uppercase form of the String. Now, you can reference this canonical form of the CaseInsensitiveString, its uppercase variant, and perform cheap evaluations in your equals and hashCode methods.
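A minimal sketch of what that could look like is below. It is my own illustration, not code from Effective Java: the class name `CanonicalCaseInsensitiveString` and both field names are assumptions, and uppercasing with `Locale.ROOT` does not match `equalsIgnoreCase` for every Unicode character (for example, `"ß".toUpperCase()` becomes `"SS"`), so treat it as an approximation.

```java
import java.util.Locale;
import java.util.Objects;

final class CanonicalCaseInsensitiveString {
    private final String original;  // kept so the text can be shown as entered
    private final String canonical; // uppercase form, computed once

    CanonicalCaseInsensitiveString(String s) {
        this.original = Objects.requireNonNull(s);
        this.canonical = s.toUpperCase(Locale.ROOT);
    }

    // Cheap exact comparison on the stored canonical forms.
    @Override
    public boolean equals(Object o) {
        return o instanceof CanonicalCaseInsensitiveString
                && ((CanonicalCaseInsensitiveString) o).canonical.equals(canonical);
    }

    // Hash the canonical form so that equal objects have equal hash codes.
    @Override
    public int hashCode() {
        return canonical.hashCode();
    }

    @Override
    public String toString() {
        return original;
    }

    public static void main(String[] args) {
        CanonicalCaseInsensitiveString a = new CanonicalCaseInsensitiveString("Polish");
        CanonicalCaseInsensitiveString b = new CanonicalCaseInsensitiveString("polish");
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

The case conversion happens once in the constructor, so each later equals or hashCode call works on an already canonical string.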