java 数字比较比字符串比较快吗?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/16503735/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Is number comparison faster than string comparison?
提问by FirstName LastName
I heard that hashing (ie converting a string or object to a number) is used for strings and such because it is easier to compare numbers than strings. If true, what is the reason for this ?
我听说散列(即将字符串或对象转换为数字)用于字符串等,因为比较数字比比较字符串更容易。如果是真的,这是什么原因?
回答by Barney Govan
This is not necessarily the case, but probably the case most of the time.
不一定是这种情况,但大多数时候可能是这种情况。
Consider the following situation:
考虑以下情况:
I want to compare the string "apples" vs. "oranges". If I only want to determine "apples" == "oranges", I need only compare the first character of each string: 'a' != 'o' => "apples" != "oranges". If I hash the string and then do the comparison, it is significantly slower as I have to parse both strings and feed them into a hashing algorithm before comparing the resultant integers.
我想比较字符串“apples”和“oranges”。如果我只想确定“apples”==“oranges”,我只需要比较每个字符串的第一个字符:'a' != 'o' => "apples" != "oranges"。如果我对字符串进行散列然后进行比较,它会明显变慢,因为我必须在比较结果整数之前解析两个字符串并将它们输入散列算法。
If, however, I need to do this comparison many times, and perhaps I'm comparing "oranges" to "orangutans" a lot, then if I hash all the strings once and do the comparisons of integers many times, it will work out faster. This is the principle that a hash map is based on.
但是,如果我需要多次进行这种比较,并且可能我经常将“橙色”与“猩猩”进行比较,那么如果我将所有字符串散列一次并多次进行整数比较,它就会解决快点。这是哈希映射所基于的原理。
Note, however, that hashing a string is useful for direct equals comparisons, it cannot determine if strings are lexigraphically greater or less than each other, and so ordering strings is not possible via the hashing method. (This is why HashMap in Java is unordered).
但是请注意,散列字符串对于直接等于比较很有用,它无法确定字符串在字典序上是大于还是小于彼此,因此无法通过散列方法对字符串进行排序。(这就是为什么 Java 中的 HashMap 是无序的)。
回答by goblinjuice
Comparing two numbers is magnitudes faster than comparing two strings (representing the same numbers). Comparing two numbers simply require comparing individual bits and could be done super fast using any of AND, XOR, 2's complements, etc.
比较两个数字比比较两个字符串(代表相同的数字)要快得多。比较两个数字只需要比较各个位,并且可以使用 AND、XOR、2 的补码等中的任何一个来快速完成。
Comparing two strings is very slow and expensive. Most algorithms require iterating through entire string and matching each character.
比较两个字符串非常缓慢且昂贵。大多数算法需要遍历整个字符串并匹配每个字符。
For example let's say we want to compare 9 with 12 (false). For numeric comparison, let's assume the algorithm compares individual bit. 9 = 1001 12 = 1100
例如,假设我们想比较 9 和 12(假)。对于数值比较,让我们假设算法比较单个位。9 = 1001 12 = 1100
Here, the worst case algorithm will compare 4 bits.
在这里,最坏情况算法将比较4 位。
Now if we represent "9" and "12" as strings, they will be stored in memory as 16 bits each (Recall: Java uses UTF-16 to represent strings in memory) and have to be passed to a String comparison algorithm. In fact, Java's actual String comparison function is below:
现在,如果我们将“9”和“12”表示为字符串,它们将分别以 16 位的形式存储在内存中(回想一下:Java 使用 UTF-16 来表示内存中的字符串)并且必须传递给字符串比较算法。实际上,Java 的实际 String 比较函数如下:
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
}
As you can see, there's a lot more going around for String comparison.
如您所见,字符串比较还有很多内容。
回答by Bruce Martin
In general, most Computers have a single instruction to compare integers, longs etc and will take at most a couple of instruction cycles. Strings are normally compared by a utility function/method (there may be the odd exception to this rule).
一般来说,大多数计算机只有一条指令来比较整数、长整数等,最多需要几个指令周期。字符串通常由实用程序函数/方法进行比较(此规则可能有奇怪的例外)。
in Java for example a String is basically represented as
例如,在 Java 中,字符串基本上表示为
/** The value is used for character storage. */
private final char value[];
/** The offset is the first index of the storage that is used. */
private final int offset;
/** The count is the number of characters in the String. */
private final int count;
And the equals method is
而equals方法是
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = count;
if (n == anotherString.count) {
char v1[] = value;
char v2[] = anotherString.value;
int i = offset;
int j = anotherString.offset;
while (n-- != 0) {
if (v1[i++] != v2[j++])
return false;
}
return true;
}
}
return false;
The equals Method does both this == anObjectand n == anotherString.count, both essentially integer compares, even before it starts comparing characters. It is going take a lot longer than a single instruction that a integer compare takes
equals 方法同时执行this == anObject和n == anotherString.count,两者本质上都是整数比较,甚至在开始比较字符之前也是如此。它比整数比较需要的单条指令需要更长的时间
The C string compareis simpler / fasterthan the Java equivalent but it will contain some sort of loop and multiple instructions for each pass through the loop.
在比较C字符串是简单/快比Java等价的,但它包含了某种循环和多条指令的每一次通过循环。
This will take longer than the one instruction that a Integer Compare requires
这将比整数比较所需的一条指令花费的时间更长
回答by Evgeniy Dorofeev
Comparing primitive numbers is definitely faster than comparing strings because it's just one computer instruction while comparing strings in Java is a method. But hashing in Java is used for a different reason, Object.hashCode() is used in hash tables for quick search in collections.
比较原始数字绝对比比较字符串快,因为它只是一条计算机指令,而在 Java 中比较字符串是一种方法。但是 Java 中的散列用于不同的原因,Object.hashCode() 用于散列表中以在集合中快速搜索。
回答by SLaks
Yes, but that has nothing to do with hashing.
是的,但这与散列无关。
Comparing numbers involves simple hardware instructions that compare bits.
比较数字涉及比较位的简单硬件指令。
Comparing strings involves (a) iterating through both strings until you find differing characters (unlike numbers, which are fixed-size) and (b) lots of Unicode magic (different strings of different lengths can actually be equal, and different characters in different code blocks compare differently).
比较字符串涉及 (a) 遍历两个字符串,直到找到不同的字符(与固定大小的数字不同)和 (b) 大量 Unicode 魔法(不同长度的不同字符串实际上可以相等,并且不同代码中的不同字符块比较不同)。
Hashing is typically used to convert a string into an array index.
散列通常用于将字符串转换为数组索引。