Java: Testing string equality using hashCode()

Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/1465621/

Time: 2020-08-12 12:34:41  Source: igfitidea

Testing string equality using hashCode()

Tags: java, string, hashcode

Question

Is there any reason why a Java string cannot be tested for equality using its hashCode method? So basically, instead of...

"hello".equals("hello")

You could use...

"hello".hashCode() == "hello".hashCode()

This would be useful because once a string has calculated its hash code, comparing two strings would be as efficient as comparing two ints: the string caches its hash code, and it is quite likely that the string is in the string pool anyway, if you designed it that way.

Accepted answer by dstibbe

Because the hashCodes of two objects must be equal if the objects are equal, but if two objects are unequal, their hashCodes can still be equal.
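
A minimal sketch of both halves of that rule follows. The class name HashContractDemo is made up for illustration, and "Aa" / "BB" are simply one well-known colliding pair under Java's String.hashCode().

```java
class HashContractDemo {
    public static void main(String[] args) {
        // Equal strings: their hash codes are guaranteed to match.
        String a = "hello";
        String b = new String("hello"); // distinct object, same contents
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true

        // Unequal strings: their hash codes may still match.
        String c = "Aa";
        String d = "BB";
        System.out.println(c.equals(d));                  // false
        System.out.println(c.hashCode() == d.hashCode()); // true (both are 2112)
    }
}
```

So a hash mismatch proves inequality, but a hash match proves nothing on its own.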

(modified after comment)

Answer by Jim Rush

The hashCode value isn't unique, which means that two Strings with the same hashCode may not actually match. To improve performance, implementations of equals often perform a hashCode check before performing more laborious checks.

Answer by Will

There is no reason not to use hashCode as you describe.

However, you must be aware of collisions. There is a chance, admittedly a small one, that two different strings hash to the same value. Consider comparing the hashCodes first and, if they are equal, also doing the full comparison using equals().
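
The two-step check could be sketched like this. HashThenEquals and sameString are hypothetical names, not JDK methods, and the null handling is an assumption added to keep the helper safe.

```java
class HashThenEquals {
    // Hypothetical helper: cheap hash comparison first, full comparison second.
    static boolean sameString(String a, String b) {
        if (a == b) {
            return true;  // same object, or both null
        }
        if (a == null || b == null) {
            return false;
        }
        // Different hash codes prove the strings differ.
        // Equal hash codes prove nothing, so equals() has the final word.
        return a.hashCode() == b.hashCode() && a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(sameString("hello", new String("hello"))); // true
        System.out.println(sameString("Aa", "BB"));                   // false, despite equal hash codes
    }
}
```

This only saves work when the hash codes are already cached; computing a hash for the first time walks the whole string, just like a full comparison.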

Answer by Wim ten Brink

Very simple reason: risk of collisions... A hash code has far fewer possible values than a string. It depends a bit on the kind of hash you generate, but let's take a very simple example, where you add up the ordinal values of the letters, each multiplied by its position: a=1, b=2, etc. Thus, 'hello' would translate to: h: 8x1=8, e: 5x2=10, l: 12x3=36, l: 12x4=48, o: 15x5=75. 8+10+36+48+75=177.

Are there other string values that could also hash to 177? Of course! Plenty of options. Feel free to calculate a few.
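
As a sketch of that toy hash (the names ToyHash and toyHash are invented here, and the code assumes lowercase a-z input only), one such collision is "jdllo":

```java
class ToyHash {
    // Toy hash from the example: sum of (letter ordinal * 1-based position), with a=1 ... z=26.
    // Assumes the input contains only lowercase letters a-z.
    static int toyHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            int ordinal = s.charAt(i) - 'a' + 1;
            sum += ordinal * (i + 1);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(toyHash("hello")); // 177
        System.out.println(toyHash("jdllo")); // 177: j is 10x1=10, d is 4x2=8, same total as h and e
    }
}
```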

Still, this example uses a very simple hashing method. Java and .NET use more complex hashing algorithms with a much smaller chance of such collisions. Even so, there is a chance that two different strings will produce the same hash value, so this approach is less reliable than a full comparison.
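
For Java specifically, the String.hashCode() documentation defines the value as s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], using int arithmetic. A hand-rolled version, with the made-up names ManualStringHash and manualHash, might look like this:

```java
class ManualStringHash {
    // Re-implements the documented String.hashCode() formula for illustration.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around, which is intended
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(manualHash("hello") == "hello".hashCode()); // true
        System.out.println(manualHash("0-42L") == manualHash("0-43-")); // true: a collision
    }
}
```

Because the result is squeezed into a 32-bit int, collisions are rarer than in the toy example but still unavoidable.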

Answer by ZZ Coder

Let me give you a counterexample. Try this:

public class HashCollision {
    public static void main(String[] args) {
        // Two different strings chosen so that their hash codes collide.
        String str1 = "0-42L";
        String str2 = "0-43-";

        System.out.println("String equality: " + str1.equals(str2));
        System.out.println("HashCode equality: " + (str1.hashCode() == str2.hashCode()));
    }
}

The result on my Java:

String equality: false
HashCode equality: true

Answer by finnw

You can get the effect you want using String.intern() (which is implemented using a hash table).

You can compare the return values of intern() using the == operator. If they refer to the same string, then the original strings were equivalent (i.e. equals() would have returned true), and it requires only a pointer comparison (which has the same cost as an int comparison). Keep in mind that intern() itself has to hash the string and look it up, so this pays off only when the interned references are compared repeatedly.

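public class InternDemo {
    public static void main(String[] args) {
        String a = "Hello";
        String lo = "lo";
        String b = "Hel" + lo; // built at run time, so b is a distinct object from a

        System.out.println(a.equals(b));
        System.out.println(a == b);

        // intern() returns the canonical pooled instance for each string's contents.
        String a2 = a.intern();
        String b2 = b.intern();

        System.out.println(a2.equals(b2));
        System.out.println(a2 == b2);
    }
}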

Output:

true
false
true
true

Answer by Omry Yadan

As many have said, hashCode does not guarantee uniqueness. In fact, it cannot, for a very simple reason.

hashCode returns an int, which means there are 2^32 possible values (around 4,000,000,000), but there are surely more than 2^32 possible strings, so at least two different strings must share the same hash code.

This is called the pigeonhole principle.
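
The full 2^32 argument cannot be brute-forced in a quick example, but the same counting argument can be sketched on a small scale. The class below (TwoLetterCollisions, a name made up here) hashes all 52 x 52 = 2704 two-letter strings over A-Z and a-z. Their hash codes are 31*c1 + c2, which can only fall between 2080 and 3904, so at most 1825 distinct values are available and some strings must collide.

```java
import java.util.HashSet;
import java.util.Set;

class TwoLetterCollisions {
    static final String LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Counts how many distinct hash codes the two-letter strings over LETTERS produce.
    static int distinctHashes() {
        Set<Integer> seen = new HashSet<>();
        for (char c1 : LETTERS.toCharArray()) {
            for (char c2 : LETTERS.toCharArray()) {
                seen.add(("" + c1 + c2).hashCode());
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int total = LETTERS.length() * LETTERS.length();
        System.out.println("Strings: " + total);
        System.out.println("Distinct hash codes: " + distinctHashes());
    }
}
```

The second number printed is necessarily smaller than the first: more pigeons than holes.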

Answer by Jay

Others have pointed out why it won't work. So I'll just add that the gain would be minimal anyway.

When you compare two strings in Java, String's equals method first checks whether they are two references to the same object. If so, it immediately returns true. Then it checks whether the lengths are equal. If not, it returns false. Only then does it start comparing character by character.
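
The order of checks described above can be sketched as follows. This is a simplified illustration, not the actual JDK source; EqualsSketch and sketchEquals are invented names, and the null handling is an added assumption.

```java
class EqualsSketch {
    // Simplified illustration of the check order: identity, then length, then characters.
    static boolean sketchEquals(String a, String b) {
        if (a == b) {
            return true;   // same object: one reference comparison
        }
        if (a == null || b == null) {
            return false;
        }
        if (a.length() != b.length()) {
            return false;  // different lengths: one int comparison
        }
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return false; // stops at the first mismatch
            }
        }
        return true;       // only reached after scanning the whole string
    }

    public static void main(String[] args) {
        System.out.println(sketchEquals("Paris", "Paris"));             // true, identity check
        System.out.println(sketchEquals("Paris", "London"));            // false, length check
        System.out.println(sketchEquals("Paris", new String("Paris"))); // true, full scan
    }
}
```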

If you're manipulating data in memory, the same-object compare may quickly handle the "same" case, and that's a quick, umm, 4-byte integer compare I think. (Someone correct me if I have the length of an object handle wrong.)

For most unequal strings, I'd bet the length compare quickly finds them not equal. If you're comparing two names of things -- customers, cities, products, whatever -- they'll usually have unequal length. So a simple int compare quickly disposes of them.

The worst case for performance is going to be two long strings that are equal but are not the same object. Then it has to do the object handle compare: false, keep checking. Then the length compare: true, keep checking. Then it goes character by character through the entire length of the string to verify that yes, indeed, they are equal all the way to the end.

Answer by Shegufa Taranjum

Two different Strings can easily produce the same hash code, or different ones. If you want an equality test, hashCode() will not give a unique result. Note that String computes its hash code from its contents, so equal strings always share a hash code, whereas StringBuffer does not override hashCode() and falls back to the identity-based Object.hashCode().