Java Comparator for byte array (lexicographic)
Original source: http://stackoverflow.com/questions/5108091/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by marcorossi
I have a hashmap with byte[] keys. I'd like to sort it through a TreeMap.
What is the most effective way to implement the comparator for lexicographic order?
Answered by ColinD
Using Guava, you can use either of:
The UnsignedBytes comparator appears to have an optimized form using Unsafe that it uses if it can. Comments in the code indicate that it may be at least twice as fast as a normal Java implementation.
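If Guava is not on the classpath, a JDK-only sketch can give the same unsigned lexicographic ordering. This assumes Java 9 or later, where `Arrays.compareUnsigned(byte[], byte[])` is available; the class name `JdkUnsignedOrder` and the sample keys are made up for illustration and do not come from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper class; not part of Guava or of the answer above.
final class JdkUnsignedOrder {
    // Arrays.compareUnsigned (Java 9+) treats each byte as 0..255 and, when one
    // array is a prefix of the other, orders the shorter array first.
    static final Comparator<byte[]> LEXICOGRAPHIC = Arrays::compareUnsigned;

    // Returns a new list with the keys in unsigned lexicographic order.
    static List<byte[]> sorted(List<byte[]> keys) {
        List<byte[]> copy = new ArrayList<>(keys);
        copy.sort(LEXICOGRAPHIC);
        return copy;
    }

    public static void main(String[] args) {
        List<byte[]> keys = Arrays.asList(
                new byte[] {(byte) 0xFF},
                new byte[] {0x00, 0x01},
                new byte[] {0x00});
        for (byte[] key : sorted(keys)) {
            System.out.println(Arrays.toString(key));
        }
    }
}
```

With these sample keys, `{0x00}` sorts first, then `{0x00, 0x01}`, then `{0xFF}`. Whether this is as fast as Guava's optimized path is not something this sketch measures.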
Answered by marcorossi
Found this nice piece of code in Apache HBase:
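public int compare(byte[] left, byte[] right) {
    for (int i = 0, j = 0; i < left.length && j < right.length; i++, j++) {
        // mask with 0xff so each byte is compared as an unsigned value (0 to 255)
        int a = (left[i] & 0xff);
        int b = (right[j] & 0xff);
        if (a != b) {
            return a - b;
        }
    }
    // one array is a prefix of the other: the shorter one sorts first
    return left.length - right.length;
}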
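To connect this back to the question, here is a minimal sketch of how such a method might be wrapped in a `Comparator<byte[]>` and used to copy a HashMap into a TreeMap. The class name `UnsignedLexComparator` and the helper `sortedCopy` are hypothetical names chosen for this example; only the comparison loop follows the snippet above.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical wrapper class; only the loop mirrors the snippet above.
final class UnsignedLexComparator implements Comparator<byte[]> {
    @Override
    public int compare(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int a = left[i] & 0xff;   // unsigned value, 0 to 255
            int b = right[i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        // Common prefix: the shorter array sorts first.
        return left.length - right.length;
    }

    // Copies an unsorted map into a TreeMap ordered by this comparator.
    static <V> TreeMap<byte[], V> sortedCopy(Map<byte[], V> unsorted) {
        TreeMap<byte[], V> sorted = new TreeMap<>(new UnsignedLexComparator());
        sorted.putAll(unsorted);
        return sorted;
    }

    public static void main(String[] args) {
        Map<byte[], String> unsorted = new HashMap<>();
        unsorted.put(new byte[] {(byte) 0x80}, "third");
        unsorted.put(new byte[] {0x01, 0x02}, "second");
        unsorted.put(new byte[] {0x01}, "first");
        // Values come out in unsigned lexicographic key order.
        sortedCopy(unsorted).values().forEach(System.out::println);
    }
}
```

One design point to keep in mind: a HashMap treats two distinct byte[] instances as different keys even when their contents match, while a TreeMap built with this comparator treats them as the same key, so the sorted copy can end up with fewer entries than the source map.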
Answered by Julius Musseau
I'm assuming the problem is just with the "byte vs. byte" comparison. Dealing with the arrays is straightforward, so I won't cover it. With respect to byte vs. byte, my first thought is to do this:
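import java.util.Comparator;

public class ByteComparator implements Comparator<Byte> {
    // Generic type arguments must be reference types, so Byte rather than byte.
    public int compare(Byte b1, Byte b2) {
        // Byte.compareTo compares the signed values (-128 to 127).
        return b1.compareTo(b2);
    }
}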
But that won't be lexicographic: 0xFF (the signed byte for -1) will be considered smaller than 0x00, when lexicographically it's bigger. I think this should do the trick:
import java.util.Comparator;

public class ByteComparator implements Comparator<Byte> {
    public int compare(Byte b1, Byte b2) {
        // convert to unsigned bytes (0 to 255) before comparing them.
        int i1 = b1 < 0 ? 256 + b1 : b1;
        int i2 = b2 < 0 ? 256 + b2 : b2;
        // positive when b1 is the larger unsigned value, so the order is ascending.
        return i1 - i2;
    }
}
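The signed-versus-unsigned point can be checked with a small sketch. It uses `Byte.compare` for the signed view and `Byte.toUnsignedInt` (Java 8+) for the unsigned view; the class name `SignedVsUnsigned` and its two helper methods are hypothetical and only serve this illustration.

```java
// Hypothetical demo class contrasting signed and unsigned byte comparison.
final class SignedVsUnsigned {
    // Signed view: 0xFF is -1, so it sorts before 0x00.
    static int signedCompare(byte b1, byte b2) {
        return Byte.compare(b1, b2);
    }

    // Unsigned view: 0xFF is 255, so it sorts after 0x00.
    static int unsignedCompare(byte b1, byte b2) {
        return Byte.toUnsignedInt(b1) - Byte.toUnsignedInt(b2);
    }

    public static void main(String[] args) {
        byte high = (byte) 0xFF;
        byte low = (byte) 0x00;
        System.out.println("signed:   " + signedCompare(high, low));
        System.out.println("unsigned: " + unsignedCompare(high, low));
    }
}
```

The signed result is negative and the unsigned result is positive for the same pair of bytes, which is why the masking or conversion step matters for lexicographic order.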
There is probably something in Apache's commons-lang or commons-math libraries that does this, but I don't know of it offhand.
Answered by Peter Lawrey
You can use a comparator which compares the Character.toLowerCase() of each of the bytes in the array (assuming the byte[] is in ASCII). If it is not, you will need to do the character decoding yourself or use new String(bytes, charSet).toLowerCase(), but this is not likely to be efficient.
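A minimal sketch of the case-insensitive idea, assuming the arrays hold plain ASCII text. The class name `AsciiCaseInsensitiveComparator` is hypothetical; bytes above 0x7F would be lowercased as if they were Latin-1 code points, which is an assumption of this sketch and may not match the real encoding of the data.

```java
import java.util.Comparator;

// Hypothetical comparator; assumes each byte is one ASCII character.
final class AsciiCaseInsensitiveComparator implements Comparator<byte[]> {
    // Maps a byte to its lower-case code point (0 to 255 treated as a code point).
    private static int lower(byte b) {
        return Character.toLowerCase(b & 0xff);
    }

    @Override
    public int compare(byte[] left, byte[] right) {
        int n = Math.min(left.length, right.length);
        for (int i = 0; i < n; i++) {
            int a = lower(left[i]);
            int b = lower(right[i]);
            if (a != b) {
                return a - b;
            }
        }
        // Common prefix: the shorter array sorts first.
        return left.length - right.length;
    }
}
```

Because the comparison ignores case, arrays such as "abc" and "ABC" compare as equal, so a TreeMap using this comparator would keep only one of them as a key.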