Original URL: http://stackoverflow.com/questions/32614158/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
What does it mean to compare strings lexicographically? (Java)
Question by JonathanScialpi
The compareTo() method in Java compares two strings "lexicographically". Can someone please simply explain how the lexicographic comparison works in java?
I found this post that explains the three cases of <0, ==0, and >0; however, I am still confused...
Does this mean that the int returned is the number of places the strings are from one another if they were sorted alphabetically, like in a dictionary?
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
For example, the below code produces an output of -31. Does this mean that the string Dog is -31 places away from the string cat?
```java
public static void main(String[] args) {
    String str1 = "Dog";
    String str2 = "cat";
    int result = str1.compareTo(str2);
    System.out.println(result); // prints -31
}
```
Accepted answer by Jean-François Savard
The exact value returned does not really matter, as the compareTo contract only requires a negative number, 0, or a positive number (as you already know).
However, if you really want to understand why -31 is returned when comparing Dog with cat (or any other pair of strings), you can look at the method directly in the String class (JDK 8 source):
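```java
public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;
    int lim = Math.min(len1, len2);   // length of the shorter string
    char v1[] = value;                // references to the backing arrays (no copy)
    char v2[] = anotherString.value;

    int k = 0;
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];
        if (c1 != c2) {
            return c1 - c2;           // the first differing char decides
        }
        k++;
    }
    return len1 - len2;               // common prefix: compare the lengths
}
```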
Keep in mind that value is the char array backing the string (this is the JDK 8 layout; since Java 9, String stores its contents in a byte[] instead):
```java
private final char value[];
```
So how does this method proceed?
- You store the smaller of the two string lengths in a variable, lim.
- You take references to both strings' char arrays, v1 and v2 (these are not copies).
- You loop over the characters, checking whether the two at each index are equal, until reaching lim.
- If the two characters at the same index differ, you return the result of subtracting the second from the first (c1 - c2). A char can be used as an int value (its UTF-16 code, which equals the ASCII code for ASCII characters), and these values are already ordered. So the result is negative if the second char is "higher" than the first, and positive if it is "lower".
- If all characters are equal up to the length of the shorter string, you return the difference of the two lengths (0 if the strings are identical).
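To make the steps above concrete, here is a minimal standalone sketch that follows the same loop but uses only the public length() and charAt() methods instead of the private value array. The class name LexCompare and method compare are hypothetical, not part of the JDK; main prints the sketch's result next to String.compareTo for a few pairs so the two can be compared.

```java
// Hypothetical standalone version of the loop described above (not JDK code).
class LexCompare {
    // Returns a negative, zero, or positive int, following the same steps
    // as the JDK 8 String.compareTo shown above.
    static int compare(String a, String b) {
        int len1 = a.length();
        int len2 = b.length();
        int lim = Math.min(len1, len2); // only compare up to the shorter length
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // the first mismatching pair decides the result
            }
        }
        return len1 - len2; // one string is a prefix of the other
    }

    public static void main(String[] args) {
        String[][] pairs = { {"Dog", "cat"}, {"cat", "dog"}, {"app", "apple"}, {"same", "same"} };
        for (String[] p : pairs) {
            System.out.println(p[0] + " vs " + p[1] + ": " + compare(p[0], p[1])
                    + " (String.compareTo: " + p[0].compareTo(p[1]) + ")");
        }
    }
}
```

For these pairs the sketch gives -31, -1, -2 and 0, the same values as String.compareTo.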
In your example, the first letters of the two words differ, so D is compared with c, which are represented as 68 and 99 respectively. Subtracting 99 from 68 gives 68 - 99 = -31.
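The arithmetic can be checked directly, since subtracting one char from another in Java is done as int arithmetic. This is a small sketch; the class CharValues and its diff helper are hypothetical names used only for illustration.

```java
// Hypothetical demo class: shows the numeric values behind "Dog" vs "cat".
class CharValues {
    // char - char is promoted to int, just like c1 - c2 in String.compareTo
    static int diff(char a, char b) {
        return a - b;
    }

    public static void main(String[] args) {
        System.out.println((int) 'D');                // 68
        System.out.println((int) 'c');                // 99
        System.out.println(diff('D', 'c'));           // -31
        System.out.println("Dog".compareTo("cat"));   // -31
    }
}
```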
So, to answer this question:
Does this mean that the int returned is the number of places the strings are from one another if they were sorted alphabetically, like in a dictionary?
No. It is either the difference between the values of the first pair of non-matching chars, or the difference between the two lengths.
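A short sketch of both sources of the returned int, using only String.compareTo (the class name ReturnValueSources is hypothetical). The last line shows why the value is not a "distance": "zoo" is compared with "apple" by their first characters only, so their lengths play no part.

```java
// Hypothetical demo class: the two ways compareTo computes its result.
class ReturnValueSources {
    public static void main(String[] args) {
        // Mismatching characters: 'a' (97) - 'c' (99) = -2
        System.out.println("bat".compareTo("bct"));
        // One string is a prefix of the other: lengths 5 - 10 = -5
        System.out.println("apple".compareTo("applesauce"));
        // First characters differ: 'z' (122) - 'a' (97) = 25
        System.out.println("zoo".compareTo("apple"));
    }
}
```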
Also, how does the method deal with case sensitivity? Are lower case letters first in line before uppercase? Is there a chart for this?
If you want to ignore case when comparing, you can use String#compareToIgnoreCase.
You can also check an ASCII chart for the values of upper- and lowercase letters: A to Z are 65 to 90 and a to z are 97 to 122, so in a case-sensitive comparison every uppercase letter sorts before every lowercase letter.
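A minimal sketch contrasting compareTo with compareToIgnoreCase (the class name CaseDemo is hypothetical). Only the sign of the compareToIgnoreCase result is relied on here, as the contract allows.

```java
// Hypothetical demo class: case-sensitive vs case-insensitive comparison.
class CaseDemo {
    public static void main(String[] args) {
        System.out.println((int) 'A' + " " + (int) 'Z');   // 65 90
        System.out.println((int) 'a' + " " + (int) 'z');   // 97 122

        // Case-sensitive: 'D' (68) < 'c' (99), so "Dog" sorts before "cat"
        System.out.println("Dog".compareTo("cat"));        // -31
        // Same letter, different case: 'd' (100) - 'D' (68)
        System.out.println("dog".compareTo("Dog"));        // 32

        // Ignoring case, "dog" comes after "cat", so the sign flips to positive
        System.out.println("Dog".compareToIgnoreCase("cat") > 0); // true
    }
}
```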
Answer by Kylar
I found Wikipedia's definition of lexicographical order very useful in answering your question.
In simple terms, the comparison produces a numeric result from an alphabetic comparison. In an alphabetic comparison, we compare, in order, the letters that make up each sequence (usually words or strings). The return value is 0 if the two are equal, and negative or positive depending on whether the first value comes alphabetically before or after the other.
Take a list of words:
- cat
- dog
- animal
- aardvark
If we compare these, we look at the first character of each. When we compare 'cat' and 'dog', we take the first characters, 'c' and 'd', and compare them. In code, the easy (not necessarily best) way to do this is to convert them to numeric values and subtract one from the other. If the result is 0, they are the same and we move on to compare the next character of each. If they differ, we know that one comes lexicographically (alphabetically) after the other.
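As a sketch of that process applied to the list above, sorting with Collections.sort uses String.compareTo (the natural ordering) for every pairwise comparison. The class SortWords and its sorted helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: sorts the example words in natural (lexicographic) order.
class SortWords {
    static List<String> sorted(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy); // compares elements with String.compareTo
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "dog", "animal", "aardvark");
        System.out.println(sorted(words)); // [aardvark, animal, cat, dog]
        // First characters are equal, so the second decides: 'a' (97) - 'n' (110)
        System.out.println("aardvark".compareTo("animal")); // -13
    }
}
```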
The return value is not required to carry any insightful information. That's why the only distinctions that mean anything are <0, ==0, and >0.
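In practice this means callers should test only the sign of the result. A minimal sketch (the class SignOnly and its relation helper are hypothetical): checking for exactly -1 would be a bug, because "Dog" sorts before "cat" yet the result is -31.

```java
// Hypothetical demo class: interprets compareTo results by sign only.
class SignOnly {
    // Describes where a sorts relative to b, looking only at the sign.
    static String relation(String a, String b) {
        int r = a.compareTo(b);
        if (r < 0) return "before";
        if (r > 0) return "after";
        return "equal";
    }

    public static void main(String[] args) {
        System.out.println(relation("Dog", "cat"));        // before
        System.out.println(relation("cat", "Dog"));        // after
        // Fragile check: "Dog" sorts first, but the result is -31, not -1
        System.out.println("Dog".compareTo("cat") == -1);  // false
    }
}
```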
With regard to case, that is an implementation detail: some comparators treat an uppercase 'A' as the same as a lowercase 'a', and some don't, since the two have different numerical values. (See: How to sort alphabetically while ignoring case sensitive?)
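A minimal sketch of both kinds of comparator applied to the same list: the natural ordering (case-sensitive String.compareTo) and the JDK's String.CASE_INSENSITIVE_ORDER. The class CaseSort and its two helper methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: case-sensitive vs case-insensitive sorting.
class CaseSort {
    // Natural order compares raw char values, so uppercase initials come first.
    static List<String> sortCaseSensitive(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // null comparator = natural ordering (String.compareTo)
        return copy;
    }

    // Case-insensitive order using the comparator provided by String.
    static List<String> sortIgnoringCase(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cat", "Dog", "apple", "Banana");
        System.out.println(sortCaseSensitive(words)); // [Banana, Dog, apple, cat]
        System.out.println(sortIgnoringCase(words));  // [apple, Banana, cat, Dog]
    }
}
```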