Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5108091/

Date: 2020-10-30 09:28:56  Source: igfitidea

Java Comparator for byte array (lexicographic)

java sorting collections map compare

Asked by marcorossi

I have a hashmap with byte[] keys. I'd like to sort it through a TreeMap.

What is the most effective way to implement the comparator for lexicographic order?

Answered by ColinD

Using Guava, you can use either of UnsignedBytes.lexicographicalComparator() or SignedBytes.lexicographicalComparator().

The UnsignedBytes comparator appears to have an optimized form using Unsafe that it uses if it can. Comments in the code indicate that it may be at least twice as fast as a normal Java implementation.
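
For readers without Guava on the classpath, a JDK-only sketch follows. It assumes Java 9 or later, where Arrays.compareUnsigned(byte[], byte[]) is available. This is a substitute technique, not the Guava comparator this answer refers to, and the class name UnsignedKeyDemo, the method sortedCopy, and the sample keys are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class UnsignedKeyDemo {
    // Copies the entries into a TreeMap ordered by unsigned lexicographic key comparison.
    static TreeMap<byte[], String> sortedCopy(Map<byte[], String> source) {
        // Arrays.compareUnsigned (Java 9+) treats each byte as a value in 0..255.
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        TreeMap<byte[], String> sorted = new TreeMap<>(unsignedOrder);
        sorted.putAll(source);
        return sorted;
    }

    public static void main(String[] args) {
        Map<byte[], String> unsorted = new HashMap<>();
        unsorted.put(new byte[] {(byte) 0xFF}, "high");
        unsorted.put(new byte[] {0x00, 0x01}, "low");
        unsorted.put(new byte[] {0x00}, "lowest");
        // Values are listed in the order of their keys.
        System.out.println(sortedCopy(unsorted).values());
    }
}
```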

Answered by marcorossi

Found this nice piece of code in Apache HBase:

    public int compare(byte[] left, byte[] right) {
        for (int i = 0, j = 0; i < left.length && j < right.length; i++, j++) {
            int a = (left[i] & 0xff);
            int b = (right[j] & 0xff);
            if (a != b) {
                return a - b;
            }
        }
        return left.length - right.length;
    }
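
To use this with a TreeMap, as the question asks, the method can be wrapped in a Comparator<byte[]>. The following is a minimal sketch; the class name LexicographicByteArrayComparator and the sample keys are illustrative and not taken from HBase.

```java
import java.util.Comparator;
import java.util.TreeMap;

final class LexicographicByteArrayComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            int a = left[i] & 0xff;   // widen to 0..255 so 0xFF sorts above 0x00
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;         // difference stays within -255..255
            }
        }
        // All shared positions match: the shorter array sorts first.
        return left.length - right.length;
    }

    public static void main(String[] args) {
        TreeMap<byte[], String> sorted =
                new TreeMap<>(new LexicographicByteArrayComparator());
        sorted.put(new byte[] {(byte) 0x80}, "c");
        sorted.put(new byte[] {0x01, 0x02}, "b");
        sorted.put(new byte[] {0x01}, "a");
        System.out.println(sorted.values()); // prints [a, b, c]
    }
}
```

A TreeMap built this way treats two arrays with equal contents as the same key, because it relies on the comparator rather than on equals and hashCode.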

Answered by Julius Musseau

I'm assuming the problem is just with the "byte vs. byte" comparison. Dealing with the arrays is straightforward, so I won't cover it. With respect to byte vs. byte, my first thought is to do this:

    import java.util.Comparator;

    public class ByteComparator implements Comparator<Byte> {
      @Override
      public int compare(Byte b1, Byte b2) {
        // Signed comparison: Byte.compareTo treats 0xFF as -1.
        return b1.compareTo(b2);
      }
    }

But that won't be lexicographic: 0xFF (the signed byte for -1) will be considered smaller than 0x00, when lexicographically it's bigger. I think this should do the trick:

    import java.util.Comparator;

    public class ByteComparator implements Comparator<Byte> {
      @Override
      public int compare(Byte b1, Byte b2) {
        // convert to unsigned bytes (0 to 255) before comparing them.
        int i1 = b1 < 0 ? 256 + b1 : b1;
        int i2 = b2 < 0 ? 256 + b2 : b2;
        // i1 - i2 gives ascending order; the range 0..255 cannot overflow.
        return i1 - i2;
      }
    }
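
The conversion above can be checked against the JDK: since Java 8, Byte.toUnsignedInt performs the same mapping to 0..255, and it matches the b & 0xff mask. A small sketch, with the class name UnsignedByteCheck and the method names manual and compareUnsigned invented here for illustration:

```java
final class UnsignedByteCheck {
    // The answer's conversion: shift negative values up by 256.
    static int manual(byte b) {
        return b < 0 ? 256 + b : b;
    }

    // Compare two bytes as unsigned values (0 to 255).
    static int compareUnsigned(byte b1, byte b2) {
        return Integer.compare(Byte.toUnsignedInt(b1), Byte.toUnsignedInt(b2));
    }

    public static void main(String[] args) {
        // Verify that all three conversions agree for every possible byte value.
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;
            if (manual(b) != Byte.toUnsignedInt(b) || manual(b) != (b & 0xff)) {
                throw new AssertionError("mismatch at " + v);
            }
        }
        // 0xFF is larger than 0x00 once both are read as unsigned.
        System.out.println(compareUnsigned((byte) 0xFF, (byte) 0x00) > 0); // prints true
    }
}
```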

There is probably something in Apache's commons-lang or commons-math libraries that does this, but I don't know of it offhand.

Answered by Peter Lawrey

You can use a comparator which compares the Character.toLowerCase() of each of the bytes in the array (assuming the byte[] is in ASCII). If it is not, you will need to do the character decoding yourself or use new String(bytes, charSet).toLowerCase(), but this is not likely to be efficient.
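
A rough sketch of that idea, assuming every byte is an ASCII character; the class name AsciiIgnoreCaseComparator is hypothetical. Note that this orders keys case-insensitively, which differs from the plain byte-wise lexicographic order used in the other answers.

```java
import java.util.Comparator;

final class AsciiIgnoreCaseComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] left, byte[] right) {
        int common = Math.min(left.length, right.length);
        for (int i = 0; i < common; i++) {
            // Assumption: each byte is one ASCII character; fold it to lower case.
            char a = Character.toLowerCase((char) (left[i] & 0xff));
            char b = Character.toLowerCase((char) (right[i] & 0xff));
            if (a != b) {
                return a - b;
            }
        }
        // All shared positions match ignoring case: the shorter array sorts first.
        return left.length - right.length;
    }
}
```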
