Java中的自然排序顺序字符串比较 - 是内置的吗?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1262239/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-12 07:32:00  来源:igfitidea点击:

Natural sort order string comparison in Java - is one built in?

javaalgorithmcomparatornatural-sort

提问by Kip

I'd like some kind of string comparison function that preserves natural sort order1. Is there anything like this built into Java? I can't find anything in the String class, and the Comparator classonly knows of two implementations.

我想要某种保留自然排序顺序的字符串比较函数1。Java中是否有类似的东西?我在String 类中找不到任何东西,而Comparator 类只知道两个实现。

I can roll my own (it's not a very hard problem), but I'd rather not re-invent the wheel if I don't have to.

我可以自己动手(这不是一个非常困难的问题),但如果不需要,我宁愿不重新发明轮子。

In my specific case, I have software version strings that I want to sort. So I want "1.2.10.5" to be considered greater than "1.2.9.1".

在我的特定情况下,我有要排序的软件版本字符串。所以我希望“1.2.10.5”被认为大于“1.2.9.1”。



1By "natural" sort order, I mean it compares strings the way a human would compare them, as opposed to "ascii-betical" sort ordering that only makes sense to programmers. In other words, "image9.jpg" is less than "image10.jpg", and "album1set2page9photo1.jpg" is less than "album1set2page10photo5.jpg", and "1.2.9.1" is less than "1.2.10.5"

1通过“自然”排序顺序,我的意思是它以人类比较字符串的方式比较字符串,而不是仅对程序员有意义的“ascii-betical”排序顺序。换句话说,“image9.jpg”小于“image10.jpg”,“album1set2page9photo1.jpg”小于“album1set2page10photo5.jpg”,“1.2.9.1”小于“1.2.10.5”

采纳答案by OscarRyz

In java the "natural" order meaning is "lexicographical" order, so there is no implementation in the core like the one you're looking for.

在 Java 中,“自然”顺序的含义是“词典”顺序,因此在核心中没有像您正在寻找的那样的实现。

There are open source implementations.

有开源实现。

Here's one:

这是一个:

NaturalOrderComparator.java

NaturalOrderComparator.java

Make sure you read the:

请务必阅读以下内容:

Cougaar Open Source License

Cougaar 开源许可证

I hope this helps!

我希望这有帮助!

回答by Yishai

String implements Comparable, and that is what natural ordering is in Java (comparing using the comparable interface). You can put the strings in a TreeSet or sort using the Collections or Arrays classes.

String 实现了 Comparable,这就是 Java 中的自然排序(使用可比较接口进行比较)。您可以将字符串放入 TreeSet 或使用 Collections 或 Arrays 类进行排序。

However, in your case you don't want "natural ordering" you really want a custom comparator, which you can then use in the Collections.sort method or the Arrays.sort method that takes a comparator.

但是,在您的情况下,您不想要“自然排序”,您确实需要一个自定义比较器,然后您可以在 Collections.sort 方法或采用比较器的 Arrays.sort 方法中使用它。

In terms of the specific logic you are looking for implementing within the comparator, (numbers separated by dots) I'm not aware of any existing standard implementations of that, but as you said, it is not a hard problem.

就您要在比较器中实现的特定逻辑而言,(用点分隔的数字)我不知道任何现有的标准实现,但正如您所说,这不是一个难题。

EDIT: In your comment, your link gets you here, which does a decent job if you don't mind the fact that it is case sensitive. Here is that code modified to allow you to pass in the String.CASE_INSENSITIVE_ORDER:

编辑:在您的评论中,您的链接将您带到这里,如果您不介意它区分大小写这一事实,它会做得很好。这是修改后的代码,允许您传入String.CASE_INSENSITIVE_ORDER

    /*
     * The Alphanum Algorithm is an improved sorting algorithm for strings
     * containing numbers.  Instead of sorting numbers in ASCII order like
     * a standard sort, this algorithm sorts numbers in numeric order.
     *
     * The Alphanum Algorithm is discussed at http://www.DaveKoelle.com
     *
     *
     * This library is free software; you can redistribute it and/or
     * modify it under the terms of the GNU Lesser General Public
     * License as published by the Free Software Foundation; either
     * version 2.1 of the License, or any later version.
     *
     * This library is distributed in the hope that it will be useful,
     * but WITHOUT ANY WARRANTY; without even the implied warranty of
     * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
     * Lesser General Public License for more details.
     *
     * You should have received a copy of the GNU Lesser General Public
     * License along with this library; if not, write to the Free Software
     * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
     *
     */

    import java.util.Comparator;

    /**
     * This is an updated version with enhancements made by Daniel Migowski,
     * Andre Bogus, and David Koelle
     *
     * To convert to use Templates (Java 1.5+):
     *   - Change "implements Comparator" to "implements Comparator<String>"
     *   - Change "compare(Object o1, Object o2)" to "compare(String s1, String s2)"
     *   - Remove the type checking and casting in compare().
     *
     * To use this class:
     *   Use the static "sort" method from the java.util.Collections class:
     *   Collections.sort(your list, new AlphanumComparator());
     */
    public class AlphanumComparator implements Comparator<String>
    {
        private Comparator<String> comparator = new NaturalComparator();

        public AlphanumComparator(Comparator<String> comparator) {
            this.comparator = comparator;
        }

        public AlphanumComparator() {

        }

        private final boolean isDigit(char ch)
        {
            return ch >= 48 && ch <= 57;
        }

        /** Length of string is passed in for improved efficiency (only need to calculate it once) **/
        private final String getChunk(String s, int slength, int marker)
        {
            StringBuilder chunk = new StringBuilder();
            char c = s.charAt(marker);
            chunk.append(c);
            marker++;
            if (isDigit(c))
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (!isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            } else
            {
                while (marker < slength)
                {
                    c = s.charAt(marker);
                    if (isDigit(c))
                        break;
                    chunk.append(c);
                    marker++;
                }
            }
            return chunk.toString();
        }

        public int compare(String s1, String s2)
        {

            int thisMarker = 0;
            int thatMarker = 0;
            int s1Length = s1.length();
            int s2Length = s2.length();

            while (thisMarker < s1Length && thatMarker < s2Length)
            {
                String thisChunk = getChunk(s1, s1Length, thisMarker);
                thisMarker += thisChunk.length();

                String thatChunk = getChunk(s2, s2Length, thatMarker);
                thatMarker += thatChunk.length();

                // If both chunks contain numeric characters, sort them numerically
                int result = 0;
                if (isDigit(thisChunk.charAt(0)) && isDigit(thatChunk.charAt(0)))
                {
                    // Simple chunk comparison by length.
                    int thisChunkLength = thisChunk.length();
                    result = thisChunkLength - thatChunk.length();
                    // If equal, the first different number counts
                    if (result == 0)
                    {
                        for (int i = 0; i < thisChunkLength; i++)
                        {
                            result = thisChunk.charAt(i) - thatChunk.charAt(i);
                            if (result != 0)
                            {
                                return result;
                            }
                        }
                    }
                } else
                {
                    result = comparator.compare(thisChunk, thatChunk);
                }

                if (result != 0)
                    return result;
            }

            return s1Length - s2Length;
        }

        private static class NaturalComparator implements Comparator<String> {
            public int compare(String o1, String o2) {
                return o1.compareTo(o2);
            }
        }
    }

回答by Mikhail

I have tested three Java implementations mentioned here by others and found that their work slightly differently but none as I would expect.

我已经测试了其他人在这里提到的三个 Java 实现,发现它们的工作略有不同,但没有像我期望的那样。

Both AlphaNumericStringComparatorand AlphanumComparatordo not ignore whitespaces so that pic2is placed before pic 1.

无论AlphaNumericStringComparatorAlphanumComparator,这样才不会忽略空格pic2放置之前pic 1

On the other hand NaturalOrderComparatorignores not only whitespaces but also all leading zeros so that sig[1]precedes sig[0].

另一方面,NaturalOrderComparator不仅忽略空格,还会忽略所有前导零,以便sig[1]sig[0].

Regarding performance AlphaNumericStringComparatoris ~x10 slower then the other two.

关于性能AlphaNumericStringComparator比其他两个慢~x10。

回答by bennidi

How about using the split() method from String, parse the single numeric string and then compare them one by one?

如何使用 String 中的 split() 方法,解析单个数字字符串,然后将它们一一比较?

 @Test
public void test(){
    System.out.print(compare("1.12.4".split("\."), "1.13.4".split("\."),0));
}


public static int compare(String[] arr1, String[] arr2, int index){
    // if arrays do not have equal size then and comparison reached the upper bound of one of them
    // then the longer array is considered the bigger ( --> 2.2.0 is bigger then 2.2)
    if(arr1.length <= index || arr2.length <= index) return arr1.length - arr2.length;
    int result = Integer.parseInt(arr1[index]) - Integer.parseInt(arr2[index]);
    return result == 0 ?  compare(arr1, arr2, ++index) : result;
}

I did not check the corner cases but that should work and it's quite compact

我没有检查角落情况,但这应该可以工作,而且非常紧凑

回答by xtf

It concats the digits, then compares it. And if it's not applicable it continues.

它连接数字,然后比较它。如果它不适用,它会继续。

public int compare(String o1, String o2) {
if(o1 == null||o2 == null)
    return 0;
for(int i = 0; i<o1.length()&&i<o2.length();i++){
    if(Character.isDigit(o1.charAt(i)) || Character.isDigit(o2.charAt(i)))
    {
    String dig1 = "",dig2 = "";     
    for(int x = i; x<o1.length() && Character.isDigit(o1.charAt(i)); x++){                              
        dig1+=o1.charAt(x);
    }
    for(int x = i; x<o2.length() && Character.isDigit(o2.charAt(i)); x++){
        dig2+=o2.charAt(x);
    }
    if(Integer.valueOf(dig1) < Integer.valueOf(dig2))
        return -1;
    if(Integer.valueOf(dig1) > Integer.valueOf(dig2))
        return 1;
    }       
if(o1.charAt(i)<o2.charAt(i))
    return -1;
if(o1.charAt(i)>o2.charAt(i))
    return 1;
}
return 0;

}

}

回答by Kannan Ramamoorthy

Might be a late reply. But my answer can help someone else who needs a comparator like this.

可能回复晚了。但是我的回答可以帮助需要这样的比较器的其他人。

I verified couple of other comparators too. But mine seems bit efficient than others I compared. Also tried the one that Yishai has posted. Mine is taking only half of the time as the mentioned one for data of alphanumeric data set of 100 entries.

我也验证了其他几个比较器。但我的似乎比我比较的其他人有效。也试过 Yishai 贴的那个。对于 100 个条目的字母数字数据集,我的时间仅为上述时间的一半。

/**
 * Sorter that compares the given Alpha-numeric strings. This iterates through each characters to
 * decide the sort order. There are 3 possible cases while iterating,
 * 
 * <li>If both have same non-digit characters then the consecutive characters will be considered for
 * comparison.</li>
 * 
 * <li>If both have numbers at the same position (with/without non-digit characters) the consecutive
 * digit characters will be considered to form the valid integer representation of the characters
 * will be taken and compared.</li>
 * 
 * <li>At any point if the comparison gives the order(either > or <) then the consecutive characters
 * will not be considered.</li>
 * 
 * For ex., this will be the ordered O/P of the given list of Strings.(The bold characters decides
 * its order) <i><b>2</b>b,<b>100</b>b,a<b>1</b>,A<b>2</b>y,a<b>100</b>,</i>
 * 
 * @author kannan_r
 * 
 */
class AlphaNumericSorter implements Comparator<String>
{
    /**
     * Does the Alphanumeric sort of the given two string
     */
    public int compare(String theStr1, String theStr2)
    {
        char[] theCharArr1 = theStr1.toCharArray();
        char[] theCharArr2 = theStr2.toCharArray();
        int aPosition = 0;
        if (Character.isDigit(theCharArr1[aPosition]) && Character.isDigit(theCharArr2[aPosition]))
        {
            return sortAsNumber(theCharArr1, theCharArr2, aPosition++ );
        }
        return sortAsString(theCharArr1, theCharArr2, 0);
    }

    /**
     * Sort the given Arrays as string starting from the given position. This will be a simple case
     * insensitive sort of each characters. But at any given position if there are digits in both
     * arrays then the method sortAsNumber will be invoked for the given position.
     * 
     * @param theArray1 The first character array.
     * @param theArray2 The second character array.
     * @param thePosition The position starting from which the calculation will be done.
     * @return positive number when the Array1 is greater than Array2<br/>
     *         negative number when the Array2 is greater than Array1<br/>
     *         zero when the Array1 is equal to Array2
     */
    private int sortAsString(char[] theArray1, char[] theArray2, int thePosition)
    {
        int aResult = 0;
        if (thePosition < theArray1.length && thePosition < theArray2.length)
        {
            aResult = (int)theArray1[thePosition] - (int)theArray2[thePosition];
            if (aResult == 0)
            {
                ++thePosition;
                if (thePosition < theArray1.length && thePosition < theArray2.length)
                {
                    if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray2[thePosition]))
                    {
                        aResult = sortAsNumber(theArray1, theArray2, thePosition);
                    }
                    else
                    {
                        aResult = sortAsString(theArray1, theArray2, thePosition);
                    }
                }
            }
        }
        else
        {
            aResult = theArray1.length - theArray2.length;
        }
        return aResult;
    }

    /**
     * Sorts the characters in the given array as number starting from the given position. When
     * sorted as numbers the consecutive characters starting from the given position upto the first
     * non-digit character will be considered.
     * 
     * @param theArray1 The first character array.
     * @param theArray2 The second character array.
     * @param thePosition The position starting from which the calculation will be done.
     * @return positive number when the Array1 is greater than Array2<br/>
     *         negative number when the Array2 is greater than Array1<br/>
     *         zero when the Array1 is equal to Array2
     */
    private int sortAsNumber(char[] theArray1, char[] theArray2, int thePosition)
    {
        int aResult = 0;
        int aNumberInStr1;
        int aNumberInStr2;
        if (thePosition < theArray1.length && thePosition < theArray2.length)
        {
            if (Character.isDigit(theArray1[thePosition]) && Character.isDigit(theArray1[thePosition]))
            {
                aNumberInStr1 = getNumberInStr(theArray1, thePosition);
                aNumberInStr2 = getNumberInStr(theArray2, thePosition);

                aResult = aNumberInStr1 - aNumberInStr2;

                if (aResult == 0)
                {
                    thePosition = getNonDigitPosition(theArray1, thePosition);
                    if (thePosition != -1)
                    {
                        aResult = sortAsString(theArray1, theArray2, thePosition);
                    }
                }
            }
            else
            {
                aResult = sortAsString(theArray1, theArray2, ++thePosition);
            }
        }
        else
        {
            aResult = theArray1.length - theArray2.length;
        }
        return aResult;
    }

    /**
     * Gets the position of the non digit character in the given array starting from the given
     * position.
     * 
     * @param theCharArr /the character array.
     * @param thePosition The position after which the array need to be checked for non-digit
     *        character.
     * @return The position of the first non-digit character in the array.
     */
    private int getNonDigitPosition(char[] theCharArr, int thePosition)
    {
        for (int i = thePosition; i < theCharArr.length; i++ )
        {
            if ( !Character.isDigit(theCharArr[i]))
            {
                return i;
            }
        }
        return -1;
    }

    /**
     * Gets the integer value of the number starting from the given position of the given array.
     * 
     * @param theCharArray The character array.
     * @param thePosition The position form which the number need to be calculated.
     * @return The integer value of the number.
     */
    private int getNumberInStr(char[] theCharArray, int thePosition)
    {
        int aNumber = 0;
        for (int i = thePosition; i < theCharArray.length; i++ )
        {
            if(!Character.isDigit(theCharArray[i]))
            {
               return aNumber;
            }
            aNumber += aNumber * 10 + (theCharArray[i] - 48);
        }
        return aNumber;
    }
}

回答by Panayotis

Have a look at this implementation. It should be as fast as possible, without any regular expressions or array manipulation or method calls, just a couple of flags and a lot of cases.

看看这个实现。它应该尽可能快,没有任何正则表达式或数组操作或方法调用,只有几个标志和很多情况。

This should sort any combination of numbers inside strings and properly support numbers which are equal and move on.

这应该对字符串内的任何数字组合进行排序,并正确支持相等并继续前进的数字。

public static int naturalCompare(String a, String b, boolean ignoreCase) {
    if (ignoreCase) {
        a = a.toLowerCase();
        b = b.toLowerCase();
    }
    int aLength = a.length();
    int bLength = b.length();
    int minSize = Math.min(aLength, bLength);
    char aChar, bChar;
    boolean aNumber, bNumber;
    boolean asNumeric = false;
    int lastNumericCompare = 0;
    for (int i = 0; i < minSize; i++) {
        aChar = a.charAt(i);
        bChar = b.charAt(i);
        aNumber = aChar >= '0' && aChar <= '9';
        bNumber = bChar >= '0' && bChar <= '9';
        if (asNumeric)
            if (aNumber && bNumber) {
                if (lastNumericCompare == 0)
                    lastNumericCompare = aChar - bChar;
            } else if (aNumber)
                return 1;
            else if (bNumber)
                return -1;
            else if (lastNumericCompare == 0) {
                if (aChar != bChar)
                    return aChar - bChar;
                asNumeric = false;
            } else
                return lastNumericCompare;
        else if (aNumber && bNumber) {
            asNumeric = true;
            if (lastNumericCompare == 0)
                lastNumericCompare = aChar - bChar;
        } else if (aChar != bChar)
            return aChar - bChar;
    }
    if (asNumeric)
        if (aLength > bLength && a.charAt(bLength) >= '0' && a.charAt(bLength) <= '9') // as number
            return 1;  // a has bigger size, thus b is smaller
        else if (bLength > aLength && b.charAt(aLength) >= '0' && b.charAt(aLength) <= '9') // as number
            return -1;  // b has bigger size, thus a is smaller
        else if (lastNumericCompare == 0)
          return aLength - bLength;
        else
            return lastNumericCompare;
    else
        return aLength - bLength;
}

回答by rednoah

Using RuleBasedCollatormight be an option as well. Though you'd have to add all the sort order rules in advance so it's not a good solution if you want to take larger numbers into account as well.

使用RuleBasedCollator也可能是一种选择。尽管您必须提前添加所有排序规则,因此如果您还想考虑更大的数字,这不是一个好的解决方案。

Adding specific customizations such as 2 < 10is quite easy though and might be useful for sorting special version identifiers like Trusty < Precise < Xenial < Yakkety.

添加特定的自定义设置2 < 10非常容易,并且可能对排序特殊版本标识符(如Trusty < Precise < Xenial < Yakkety.

RuleBasedCollator localRules = (RuleBasedCollator) Collator.getInstance();

String extraRules = IntStream.range(0, 100).mapToObj(String::valueOf).collect(joining(" < "));
RuleBasedCollator c = new RuleBasedCollator(localRules.getRules() + " & " + extraRules);

List<String> a = asList("1-2", "1-02", "1-20", "10-20", "fred", "jane", "pic01", "pic02", "pic02a", "pic 5", "pic05", "pic   7", "pic100", "pic100a", "pic120", "pic121");
shuffle(a);

a.sort(c);
System.out.println(a);