Java: Sorting strings that may contain numbers
Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/104599/
Sort on a string that may contain a number
Asked by Paul Tomblin
I need to write a Java Comparator class that compares Strings, but with one twist. If the two strings it is comparing are the same at the beginning and at the end, and the middle part that differs is an integer, then compare based on the numeric values of those integers. For example, I want the following strings to end up in the order they're shown:
- aaa
- bbb 3 ccc
- bbb 12 ccc
- ccc 11
- ddd
- eee 3 ddd jpeg2000 eee
- eee 12 ddd jpeg2000 eee
As you can see, there might be other integers in the string, so I can't just use regular expressions to break out any integer. I'm thinking of just walking the strings from the beginning until I find a bit that doesn't match, then walking in from the end until I find a bit that doesn't match, and then comparing the bit in the middle to the regular expression "[0-9]+". If it matches, I'd do a numeric comparison; otherwise, a lexical comparison.
Is there a better way?
Update: I don't think I can guarantee that the other numbers in the string, the ones that may match, don't have spaces around them, or that the ones that differ do have spaces.
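The approach described in the question could be sketched roughly as follows. This is only an illustration under assumptions: the class name `MiddleNumberComparator` is made up, `BigInteger` is used so that long digit runs do not overflow, and the step that widens the middle to whole digit runs is an addition not in the question (without it, "a 12 b" and "a 2 b" would share the suffix "2 b" and the middle would be cut inside a number). It has not been shown to give a consistent ordering for arbitrary inputs.

```java
import java.math.BigInteger;
import java.util.Comparator;

class MiddleNumberComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int minLen = Math.min(a.length(), b.length());

        // Length of the common prefix.
        int p = 0;
        while (p < minLen && a.charAt(p) == b.charAt(p)) p++;

        // Length of the common suffix, never overlapping the prefix.
        int s = 0;
        while (s < minLen - p
                && a.charAt(a.length() - 1 - s) == b.charAt(b.length() - 1 - s)) s++;

        // Assumption: widen the middle so a digit run is never cut in half.
        while (p > 0 && isDigit(a.charAt(p - 1))) p--;
        while (s > 0 && isDigit(a.charAt(a.length() - s))) s--;

        String midA = a.substring(p, a.length() - s);
        String midB = b.substring(p, b.length() - s);

        // Numeric comparison only when both differing parts are plain integers.
        if (midA.matches("[0-9]+") && midB.matches("[0-9]+")) {
            int byValue = new BigInteger(midA).compareTo(new BigInteger(midB));
            if (byValue != 0) return byValue;
        }
        // Otherwise (or on a numeric tie such as "007" vs "7") compare lexically.
        return a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }
}
```

With this sketch, "bbb 3 ccc" sorts before "bbb 12 ccc", while strings that differ in anything other than a single integer fall back to plain `String.compareTo`.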
Accepted answer by ScArcher2
From the website
"People sort strings with numbers differently than software. Most sorting algorithms compare ASCII values, which produces an ordering that is inconsistent with human logic. Here's how to fix it."
Edit: Here's a link to the Java Comparator Implementation from that site.
Answer by John Millikin
Split the string into runs of letters and numbers, so "foo 12 bar" becomes the list ("foo", 12, "bar"), then use the list as the sort key. This way the numbers will be ordered in numerical order, not alphabetical.
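A minimal sketch of this idea in Java, assuming a hypothetical class name `RunListComparator` and using `BigInteger` for the numeric runs so that very long numbers do not overflow. Note that it compares every digit run numerically, which is broader than what the question asks for, although it produces the same order for the question's example list.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RunListComparator implements Comparator<String> {
    // A run is either one or more digits or one or more non-digits.
    private static final Pattern RUN = Pattern.compile("\\d+|\\D+");

    // "foo 12 bar" -> ["foo ", "12", " bar"]
    static List<String> runs(String s) {
        List<String> out = new ArrayList<>();
        Matcher m = RUN.matcher(s);
        while (m.find()) out.add(m.group());
        return out;
    }

    private static boolean isNumeric(String run) {
        char c = run.charAt(0);
        return c >= '0' && c <= '9';
    }

    @Override
    public int compare(String a, String b) {
        List<String> ra = runs(a);
        List<String> rb = runs(b);
        int n = Math.min(ra.size(), rb.size());
        for (int i = 0; i < n; i++) {
            String x = ra.get(i);
            String y = rb.get(i);
            int c = (isNumeric(x) && isNumeric(y))
                    ? new BigInteger(x).compareTo(new BigInteger(y)) // numeric order
                    : x.compareTo(y);                                // lexical order
            if (c != 0) return c;
        }
        // All shared runs are equal: the string with fewer runs sorts first.
        if (ra.size() != rb.size()) return Integer.compare(ra.size(), rb.size());
        // Tie-break for runs that are numerically equal but spelled differently ("007" vs "7").
        return a.compareTo(b);
    }
}
```

It can be passed directly to `List.sort` or `Arrays.sort`. Splitting both strings on every comparison is simple but allocates a lot; for large lists the run lists could be computed once per string and cached.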
Answer by scubabbl
In your given example, the numbers you want to compare have spaces around them while the other numbers do not, so why would a regular expression not work?
bbb 12 ccc
vs.
eee 12 ddd jpeg2000 eee
Answer by sblundy
I think you'll have to do the comparison in a character-by-character fashion. Grab a character; if it's a number character, keep grabbing, then reassemble the characters into a single number string and convert it into an int. Repeat on the other string, and only then do the comparison.
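A rough sketch of such a character-by-character walk, with a hypothetical class name `CharWalkComparator`. One deliberate deviation from the description above: instead of converting each digit run to an int, the sketch compares the runs as strings (leading zeros stripped, then longer run is larger), which avoids overflow on very long numbers.

```java
import java.util.Comparator;

class CharWalkComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int i = 0;
        int j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i);
            char cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Keep grabbing digits to find the end of each number.
                int ie = i;
                int je = j;
                while (ie < a.length() && isDigit(a.charAt(ie))) ie++;
                while (je < b.length() && isDigit(b.charAt(je))) je++;
                int c = compareDigitRuns(a.substring(i, ie), b.substring(j, je));
                if (c != 0) return c;
                i = ie;
                j = je;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // One string is exhausted: the one with nothing left sorts first.
        int rest = Integer.compare(a.length() - i, b.length() - j);
        // Final tie-break keeps "007" and "7" from comparing as equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Compares two non-empty digit strings by numeric value without parsing them.
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return Integer.compare(x.length(), y.length());
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String s) {
        int k = 0;
        while (k < s.length() - 1 && s.charAt(k) == '0') k++;
        return s.substring(k);
    }
}
```

How digits should rank against letters or spaces is a policy choice; here a digit facing a non-digit is simply compared by character code.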
Answer by Eclipse
Answer by Ray Hayes
Ian Griffiths of Microsoft has a C# implementation he calls Natural Sorting. Porting to Java should be fairly easy, easier than from C anyway!
UPDATE: There seems to be a Java example on eekboom that does this; see "compareNatural" and use that as your comparer for sorts.
Answer by Ray Hayes
If you're writing a comparator class, you should implement your own compare method that will compare two strings character by character. This compare method should check if you're dealing with alphabetic characters, numeric characters, or mixed types (including spaces). You'll have to define how you want a mixed type to act, whether numbers come before alphabetic characters or after, and where spaces fit in etc.
Answer by James Antill
On Linux, glibc provides strverscmp(); it's also available from gnulib for portability. However, truly "human" sorting has lots of other quirks, like "The Beatles" being sorted as "Beatles, The". There is no simple solution to this generic problem.
Answer by Paul Brinkley
Short answer: based on the context, I can't tell whether this is just some quick-and-dirty code for personal use, or a key part of Goldman Sachs' latest internal accounting software, so I'll open by saying: eww. That's a rather funky sorting algorithm; try to use something a bit less "twisty" if you can.
Long answer:
The two issues that immediately come to mind in your case are performance, and correctness. Informally, make sure it's fast, and make sure your algorithm is a total ordering.
(Of course, if you're not sorting more than about 100 items, you can probably disregard this paragraph.) Performance matters, as the speed of the comparator will be the largest factor in the speed of your sort (assuming the sort algorithm is "ideal" for the typical list). In your case, the comparator's speed will depend mainly on the size of the string. The strings seem to be fairly short, so they probably won't dominate as much as the size of your list.
Turning each string into a string-number-string tuple and then sorting this list of tuples, as suggested in another answer, will fail in some of your cases, since you apparently will have strings with multiple numbers appearing.
The other problem is correctness. Specifically, if the algorithm you described will ever permit A > B > ... > A, then your sort will be non-deterministic. In your case, I fear that it might, though I can't prove it. Consider some parsing cases such as:
aa 0 aa
aa 23aa
aa 2a3aa
aa 113aa
aa 113 aa
a 1-2 a
a 13 a
a 12 a
a 2-3 a
a 21 a
a 2.3 a
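One way to probe the correctness concern is a brute-force check of a comparator against a set of tricky inputs such as the ones above. The sketch below is an assumption-laden illustration, not a proof: the class and method names (`ComparatorContractCheck`, `findViolation`) are made up, and passing the check on a sample only means no violation was found among those particular strings.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ComparatorContractCheck {
    /** Returns a description of the first violation found, or null if there is none. */
    static <T> String findViolation(List<T> items, Comparator<? super T> cmp) {
        for (T a : items) {
            if (cmp.compare(a, a) != 0) return "not reflexive: " + a;
            for (T b : items) {
                int ab = Integer.signum(cmp.compare(a, b));
                int ba = Integer.signum(cmp.compare(b, a));
                // compare(a, b) and compare(b, a) must have opposite signs.
                if (ab != -ba) return "not antisymmetric: " + a + " / " + b;
                for (T c : items) {
                    // a <= b and b <= c must imply a <= c (no cycles A > B > ... > A).
                    if (ab <= 0 && cmp.compare(b, c) <= 0 && cmp.compare(a, c) > 0) {
                        return "not transitive: " + a + " / " + b + " / " + c;
                    }
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> cases = Arrays.asList(
                "aa 0 aa", "aa 23aa", "aa 2a3aa", "aa 113aa", "aa 113 aa",
                "a 1-2 a", "a 13 a", "a 12 a", "a 2-3 a", "a 21 a", "a 2.3 a");
        // Replace the natural-order comparator with the comparator under test.
        System.out.println(findViolation(cases, Comparator.<String>naturalOrder()));
    }
}
```

The check is cubic in the number of items, so it is only practical for small hand-picked samples.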
Answer by PhiLho
Interesting little challenge, I enjoyed solving it.
Here is my take on the problem:
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String[] strs =
    {
        "eee 5 ddd jpeg2001 eee",
        "eee 123 ddd jpeg2000 eee",
        "ddd",
        "aaa 5 yy 6",
        "ccc 555",
        "bbb 3 ccc",
        "bbb 9 a",
        "",
        "eee 4 ddd jpeg2001 eee",
        "ccc 11",
        "bbb 12 ccc",
        "aaa 5 yy 22",
        "aaa",
        "eee 3 ddd jpeg2000 eee",
        "ccc 5",
    };

    // Backslashes must be doubled inside a Java string literal
    Pattern splitter = Pattern.compile("(\\d+|\\D+)");

    public class InternalNumberComparator implements Comparator
    {
        public int compare(Object o1, Object o2)
        {
            // I deliberately use the Java 1.4 syntax,
            // all this can be improved with 1.5's generics
            String s1 = (String)o1, s2 = (String)o2;
            // We split each string as runs of number/non-number strings
            ArrayList sa1 = split(s1);
            ArrayList sa2 = split(s2);
            // Nothing or different structure
            if (sa1.size() == 0 || sa1.size() != sa2.size())
            {
                // Just compare the original strings
                return s1.compareTo(s2);
            }
            int i = 0;
            String si1 = "";
            String si2 = "";
            // Compare beginning of string
            for (; i < sa1.size(); i++)
            {
                si1 = (String)sa1.get(i);
                si2 = (String)sa2.get(i);
                if (!si1.equals(si2))
                    break;  // Until we find a difference
            }
            // No difference found?
            if (i == sa1.size())
                return 0; // Same strings!

            // Try to convert the different run of characters to number
            int val1, val2;
            try
            {
                val1 = Integer.parseInt(si1);
                val2 = Integer.parseInt(si2);
            }
            catch (NumberFormatException e)
            {
                return s1.compareTo(s2);  // Strings differ on a non-number
            }

            // Compare remainder of string
            for (i++; i < sa1.size(); i++)
            {
                si1 = (String)sa1.get(i);
                si2 = (String)sa2.get(i);
                if (!si1.equals(si2))
                {
                    return s1.compareTo(s2);  // Strings differ
                }
            }

            // Here, the strings differ only on a number
            if (val1 == val2)
                return s1.compareTo(s2);  // Same value, different spelling (e.g. "007" vs "7")
            return val1 < val2 ? -1 : 1;
        }

        ArrayList split(String s)
        {
            ArrayList r = new ArrayList();
            Matcher matcher = splitter.matcher(s);
            while (matcher.find())
            {
                String m = matcher.group(1);
                r.add(m);
            }
            return r;
        }
    }

    Arrays.sort(strs, new InternalNumberComparator());
This algorithm needs much more testing, but it seems to behave rather nicely.
[EDIT] I added some more comments to be clearer. I see there are many more answers than when I started to code this... But I hope I provided a good starting base and/or some ideas.