Java: efficiently extracting bit sequences of arbitrary length from a byte[] array

Question

Asked by Durandal

I'm looking for the most efficient way of extracting (unsigned) bit sequences of arbitrary length (0 <= length <= 16) at arbitrary positions. The skeleton class shows how my current implementation essentially handles the problem:

public abstract class BitArray {

byte[] bytes = new byte[2048];
int bitGet;

public BitArray() {
}

public void readNextBlock(int initialBitGet, int count) {
    // substitute for reading from an input stream 
    for (int i=(initialBitGet>>3); i<=count; ++i) {
        bytes[i] = (byte) i;
    }
    prepareBitGet(initialBitGet, count);
}

public abstract void prepareBitGet(int initialBitGet, int count);

public abstract int getBits(int count);

static class Version0 extends BitArray {
    public void prepareBitGet(int initialBitGet, int count) {
        bitGet = initialBitGet;
    }

    public int getBits(int len) {
        // intentionally gives meaningless result
        bitGet += len;
        return 0;
    }
}

static class Version1 extends BitArray {
    public void prepareBitGet(int initialBitGet, int count) {
        bitGet = initialBitGet - 1;
    }

    public int getBits(int len) {
        int byteIndex = bitGet;
        bitGet = byteIndex + len;
        int shift = 23 - (byteIndex & 7) - len;
        int mask = (1 << len) - 1;
        byteIndex >>= 3;
        return (((bytes[byteIndex] << 16) | 
               ((bytes[++byteIndex] & 0xFF) <<  8) |
                (bytes[++byteIndex] & 0xFF)) >> shift) & mask;
    }
}

static class Version2 extends BitArray {
    static final int[] mask = { 0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF,
                0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF };

    public void prepareBitGet(int initialBitGet, int count) {
        bitGet = initialBitGet;
    }

    public int getBits(int len) {
        int offset = bitGet;
        bitGet = offset + len;
        int byteIndex = offset >> 3; // originally used /8
        int bitIndex = offset & 7;   // originally used %8
        if ((bitIndex + len) > 16) {
            return ((bytes[byteIndex] << 16 |
                    (bytes[byteIndex + 1] & 0xFF) << 8 |
                    (bytes[byteIndex + 2] & 0xFF)) >> (24 - bitIndex - len)) & mask[len];
        } else if ((offset + len) > 8) {
            return ((bytes[byteIndex] << 8 |
                    (bytes[byteIndex + 1] & 0xFF)) >> (16 - bitIndex - len)) & mask[len];
        } else {
            return (bytes[byteIndex] >> (8 - offset - len)) & mask[len];
        }
    }
}

static class Version3 extends BitArray {
    int[] ints = new int[2048];

    public void prepareBitGet(int initialBitGet, int count) {
        bitGet = initialBitGet;
        int put_i = (initialBitGet >> 3) - 1;
        int get_i = put_i;
        int buf;
        buf = ((bytes[++get_i] & 0xFF) << 16) |
              ((bytes[++get_i] & 0xFF) <<  8) |
               (bytes[++get_i] & 0xFF);
        do {
            buf = (buf << 8) | (bytes[++get_i] & 0xFF);
            ints[++put_i] = buf;
        } while (get_i < count);
    }

    public int getBits(int len) {
        int bit_idx = bitGet;
        bitGet = bit_idx + len;
        int shift = 32 - (bit_idx & 7) - len;
        int mask = (1 << len) - 1;
        int int_idx = bit_idx >> 3;
        return (ints[int_idx] >> shift) & mask;
    }
}

static class Version4 extends BitArray {
    int[] ints = new int[1024];

    public void prepareBitGet(int initialBitGet, int count) {
        bitGet = initialBitGet;
        int g = initialBitGet >> 3;
        int p = (initialBitGet >> 4) - 1;
        final byte[] b = bytes;
        int t = (b[g]  <<  8) | (b[++g] & 0xFF);
        final int[] i = ints;
        do {
            i[++p] = (t = (t << 16) | ((b[++g] & 0xFF) <<8) | (b[++g] & 0xFF));
        } while (g < count);
    }

    public int getBits(final int len) {
        final int i;
        bitGet = (i = bitGet) + len;
        return (ints[i >> 4] >> (32 - len - (i & 15))) & ((1 << len) - 1);
    }
}

public void benchmark(String label) {
    int checksum = 0;
    readNextBlock(32, 1927);
    long time = System.nanoTime();
    for (int pass=1<<18; pass>0; --pass) {
        prepareBitGet(32, 1927);
        for (int i=2047; i>=0; --i) {
            checksum += getBits(i & 15);
        }
    }
    time = System.nanoTime() - time;
    System.out.println(label+" took "+Math.round(time/1E6D)+" ms, checksum="+checksum);
    try { // avoid having the console interfere with our next measurement
        Thread.sleep(369);
    } catch (InterruptedException e) {}
}

public static void main(String[] argv) {
    BitArray test;
    // for the sake of getting a little less influence from the OS for stable measurement
    Thread.currentThread().setPriority(Thread.MAX_PRIORITY);
    while (true) {
        test = new Version0();
        test.benchmark("no implementaion");
        test = new Version1();
        test.benchmark("Durandal's (original)");
        test = new Version2();
        test.benchmark("blitzpasta's (adapted)");
        test = new Version3();
        test.benchmark("MSN's (posted)");
        test = new Version4();
        test.benchmark("MSN's (half-buffer modification)");
        System.out.println("--- next pass ---");
    }
}
}

This works, but I'm looking for a more efficient solution (performance wise). The byte array is guaranteed to be relatively small, between a few bytes up to a max of ~1800 bytes. The array is read exactly once (completely) between each call to the read method. There is no need for any error checking in getBits(), such as exceeding the array etc.

It seems my initial question above isn't clear enough. A "bit sequence" of N bits forms an integer of N bits, and I need to extract those integers with minimal overhead. I have no use for strings, as the values are either used as lookup indices or are directly fed into some computation. So basically, the skeleton shown above is a real class and getBits() signature shows how the rest of the code interacts with it.
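
To make the definition concrete, here is a minimal reference sketch that reads the bits one at a time, MSB first. It is deliberately slow and is not one of the benchmarked versions; the class name `BitExtractReference` and the static `getBits(bytes, offset, len)` signature are assumptions made for illustration only. It may be useful as an oracle when checking a faster implementation.

```java
final class BitExtractReference {
    /**
     * Reads len bits (0..16) starting at bit index offset,
     * where bit 0 is the most significant bit of bytes[0].
     */
    static int getBits(byte[] bytes, int offset, int len) {
        int value = 0;
        for (int i = offset; i < offset + len; i++) {
            // pick bit (i & 7) of byte (i >> 3), counting from the MSB
            int bit = (bytes[i >> 3] >> (7 - (i & 7))) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }
}
```

For the two bytes 0xA5, 0x3C (bits 10100101 00111100), `getBits(data, 3, 6)` selects the bits 001010 and returns 10.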

Extended the example code into a microbenchmark and included blitzpasta's solution (fixed missing byte masking). On my old AMD box it turns out as ~11400ms vs ~38000ms. FYI: It's the divide and modulo operations that kill the performance. If you replace /8 with >>3 and %8 with &7, both solutions are pretty close to each other (jdk1.7.0ea104).
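
A small sketch of why the two spellings are not interchangeable in general: for non-negative ints, `x / 8` equals `x >> 3` and `x % 8` equals `x & 7`, but for negative values they can differ, because Java's integer division truncates toward zero while `>>` rounds toward negative infinity. That difference is one plausible reason a signed division costs more than a bare shift; the class and method names here are hypothetical.

```java
final class ShiftVsDivide {
    /** True when divide/modulo by 8 and shift/mask give the same results for x. */
    static boolean agree(int x) {
        return (x / 8 == x >> 3) && (x % 8 == (x & 7));
    }
}
```

Bit offsets in the benchmark are never negative, so the shift/mask form is a safe replacement there. For example, `agree(32)` is true, while `agree(-9)` is false: `-9 / 8` is -1 but `-9 >> 3` is -2.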

There seemed to be a bit of confusion about how and what to work on. The first, original post of the example code included a read() method to indicate where and when the byte buffer was filled. This got lost when the code was turned into the microbenchmark. I re-introduced it to make this a little clearer. The idea is to beat all existing versions by adding another subclass of BitArray, which needs to implement getBits() and prepareBitGet(); the latter may be empty. Do not change the benchmarking to give your solution an advantage: the same could be done for all the existing solutions, making this a completely moot optimization! (really!!)

I added a Version0, which does nothing but increment the bitGet state. It always returns 0 and serves to give a rough idea of how big the benchmark overhead is. It's only there for comparison.

Also, an adaptation of MSN's idea was added (Version3). To keep things fair and comparable for all competitors, the byte array filling is now part of the benchmark, as well as a preparatory step (see above). Originally MSN's solution did not do so well; there was lots of overhead in preparing the int[] buffer. I took the liberty of optimizing the step a little, which turned it into a fierce competitor :) You might also find that I de-convoluted your code a little. Your getBits() could be condensed into a 3-liner, probably shaving off one or two percent. I deliberately did not condense it, to keep the code readable and because the other versions aren't as condensed as possible either (again for readability).

Conclusion (code example above updated to include versions based on all applicable contributions). On my old AMD box (Sun JRE 1.6.0_21), they come out as:

V0 no implementaion took 5384ms
V1 Durandal's (original) took 10283ms
V2 blitzpasta's (adapted) took 12212ms
V3 MSN's (posted) took 11030ms
V4 MSN's (half-buffer modification) took 9700ms

Notes: In this benchmark an average of 7.5 bits is fetched per call to getBits(), and each bit is only read once. Since V3/V4 have to pay a high initialization cost, they tend to show better runtime behavior with more, shorter fetches (and consequently worse behavior the closer the average fetch size gets to the maximum of 16). Still, V4 stays slightly ahead of all others in all scenarios. In an actual application, cache contention must be taken into account, since the extra space needed for V3/V4 may increase cache misses to a point where V0 would be a better choice. If the array is to be traversed more than once, V4 should be favored, since it fetches faster than every other version and the costly initialization is amortized after the first pass.
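
The 7.5-bit average can be checked directly from the benchmark loop, which requests `i & 15` bits for `i` from 2047 down to 0. The following sketch only reproduces that arithmetic; the class and method names are hypothetical.

```java
final class BenchmarkArithmetic {
    /** Total number of bits requested in one pass of the benchmark's inner loop. */
    static int bitsPerPass() {
        int total = 0;
        for (int i = 2047; i >= 0; --i) {
            total += i & 15; // same length sequence as in benchmark()
        }
        return total;
    }
}
```

One pass requests 15360 bits over 2048 calls, which is 7.5 bits per call, or 1920 bytes in total. Starting at bit 32, the pass therefore ends at byte offset 1924, inside the range filled by readNextBlock(32, 1927).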

Answer 1

Accepted answer by MSN

Well, depending on how far you want to go down the time vs. memory see-saw, you can allocate a side table of every 32-bits at every 16-bit offset and then do a mask and shift based on the 16-bit offset:

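byte[] bytes = new byte[2048];
int bitGet;
// Java has no unsigned int; a plain int[] works because getBits() masks the result.
// One entry per byte offset; the last entry starts at bytes.length - 4.
int[] dwords = new int[bytes.length - 3];

public BitArray() {
    for (int i=0; i<bytes.length; ++i) {
        bytes[i] = (byte) i;
    }

    for (int i= 0; i<dwords.length; ++i) {
        // mask each byte with 0xFF so negative bytes are not sign-extended
        dwords[i]=
            ((bytes[i    ] & 0xFF) << 24) |
            ((bytes[i + 1] & 0xFF) << 16) |
            ((bytes[i + 2] & 0xFF) <<  8) |
             (bytes[i + 3] & 0xFF);
    }
}

int getBits(int len)
{
    int offset= bitGet;
    bitGet= offset + len;   // advance the read position

    // The table above has one entry per byte, so index by byte and shift by the
    // bit position within that byte. With a table holding one entry per 16-bit
    // offset (half the size), use offset>>4 and offset & 15 instead.
    int offset_index= offset>>3;
    int offset_offset= offset & 7;

    // bits are numbered from the MSB, so shift the wanted bits down to the bottom
    return (dwords[offset_index] >>> (32 - offset_offset - len)) & ((1 << len) - 1);
}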

You avoid the branching (at the cost of quadrupling your memory footprint). And is looking up the mask really that much faster than (1 << len) - 1?
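
As a self-contained illustration of the side-table idea, here is a sketch that keeps one 32-bit word per byte offset and computes the mask with `(1 << len) - 1` instead of a lookup. The class name `SideTableBits`, the constructor arguments, and the zero padding past the end of the array are assumptions for this example, not part of the answer or of the benchmark.

```java
final class SideTableBits {
    private final int[] words; // words[i] holds bytes i..i+3, most significant byte first
    private int bitPos;

    SideTableBits(byte[] bytes, int startBit) {
        words = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            int w = 0;
            for (int k = 0; k < 4; k++) {
                // bytes beyond the end are treated as zero (assumption for short arrays)
                int b = (i + k < bytes.length) ? (bytes[i + k] & 0xFF) : 0;
                w = (w << 8) | b;
            }
            words[i] = w;
        }
        bitPos = startBit;
    }

    /** Returns the next len bits (0..16) as an unsigned value, without branching. */
    int getBits(int len) {
        int pos = bitPos;
        bitPos = pos + len;
        return (words[pos >> 3] >>> (32 - (pos & 7) - len)) & ((1 << len) - 1);
    }
}
```

With 7 possible leading bits plus at most 16 requested bits, every fetch fits inside a single 32-bit word, which is what makes the branch-free single lookup possible.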

Answer 2

Answered by blizpasta

If you just want the unsigned bit sequence as an int.

static final int[] lookup = {0x0, 0x1, 0x3, 0x7, 0xF, 0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF, 0x1FFF, 0x3FFF, 0x7FFF, 0xFFFF };

/*
 * bytes: byte array, with the bits indexed from 0 (MSB) to (bytes.length * 8 - 1) (LSB)
 * offset: index of the MSB of the bit sequence.
 * len: length of bit sequence, must be in the range [0,16].
 * Not checked for overflow
 */
static int getBitSeqAsInt(byte[] bytes, int offset, int len){

    int byteIndex = offset / 8;
    int bitIndex = offset % 8;
    int val;

    // Each byte is masked with 0xFF so that negative values are not sign-extended
    // into the higher bits. The branches depend on the bit position within the
    // first byte (bitIndex), not on the absolute offset.
    if ((bitIndex + len) > 16) {
        val = (((bytes[byteIndex] & 0xFF) << 16 | (bytes[byteIndex + 1] & 0xFF) << 8 | (bytes[byteIndex + 2] & 0xFF)) >> (24 - bitIndex - len)) & lookup[len];
    } else if ((bitIndex + len) > 8) {
        val = (((bytes[byteIndex] & 0xFF) << 8 | (bytes[byteIndex + 1] & 0xFF)) >> (16 - bitIndex - len)) & lookup[len];
    } else {
        val = ((bytes[byteIndex] & 0xFF) >> (8 - bitIndex - len)) & lookup[len];
    }

    return val;
}
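
A short sketch of why byte masking matters when bytes are combined into a wider value: Java bytes are signed, so a byte such as 0x80 is widened to a negative int whose upper 24 bits are all ones, and OR-ing it in overwrites the bits contributed by the other bytes. The class and method names are hypothetical.

```java
final class SignExtensionDemo {
    /** Combines two bytes without masking; a negative low byte corrupts the high bits. */
    static int combineUnmasked(byte hi, byte lo) {
        return hi << 8 | lo;
    }

    /** Combines two bytes as unsigned values by masking each with 0xFF first. */
    static int combineMasked(byte hi, byte lo) {
        return (hi & 0xFF) << 8 | (lo & 0xFF);
    }
}
```

For hi = 0x01 and lo = (byte) 0x80, the unmasked version yields -128 while the masked version yields 0x0180 (384).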

If you want it as a String (modification of Margus' answer).

static String getBitSequence(byte[] bytes, int offset, int len){

    int byteIndex = offset / 8;
    int bitIndex = offset % 8;
    int count = 0;
    StringBuilder result = new StringBuilder();        

    outer:
    for(int i = byteIndex; i < bytes.length; ++i) {
        for(int j = (1 << (7 - bitIndex)); j > 0; j >>= 1) {
            if(count == len) {
                break outer;
            }                
            if((bytes[i] & j) == 0) { // test the current byte, not always the first one
                result.append('0');
            } else {
                result.append('1');
            }
            ++count;
        }
        bitIndex = 0;
    }
    return  result.toString();
}

Answer 3

Answered by Margus

~~Just wondering why can't you use java.util.BitSet;~~

Basically what you can do is read the whole data as byte[], convert it to binary in string format and use string utilities like .substring() to do the work. This will also work for bit sequences > 16.

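Let's say you have 3 bytes: 1, 2, 3 and you want to extract the bit sequence from bit index 5 up to (but not including) bit index 16, counting from 0 at the most significant bit of the first byte.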

Number Binary

1      00000001
2      00000010
3      00000011

Code example:

public static String getRealBinary(byte[] input){
    StringBuilder sb = new StringBuilder();

    for (byte c : input) {
        for (int n =  128; n > 0; n >>= 1){
            if ((c & n) == 0)
                sb.append('0');
            else sb.append('1');
        }
    }

    return sb.toString();
}
public static void main(String[] args) {
    byte bytes[] = new byte[]{1,2,3};
    String sbytes = getRealBinary(bytes);
    System.out.println(sbytes);
    System.out.println(sbytes.substring(5,16));
}

Output:

000000010000001000000011
00100000010

Speed:

I did a test run of 1 million iterations and on my computer it took 0.995s, so it's reasonably fast:

Code to repeat the test yourself:

// requires: import java.util.Random;
public static void main(String[] args) {
    Random r = new Random();
    byte bytes[] = new byte[4];
    long start, total = 0;

    for (int i = 0; i < 1000000; i++) {
        r.nextBytes(bytes);
        // nanoTime() is used because a single call finishes in well under a millisecond
        start = System.nanoTime();
        getRealBinary(bytes).substring(5,16);
        total += System.nanoTime() - start;
    }
    System.out.println("It took " + (total / 1000000) + "ms");
}

Answer 4

Answered by Ira Baxter

You want at most 16 bits, taken from an array of bytes. 16 bits can span at most 3 bytes. Here's a possible solution:

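    int GetBits(int bit_index, int bit_length) {
          int byte_offset = bit_index >> 3;
          // concatenate three bytes into a 24-bit window; each byte is masked
          // with 0xFF so that negative bytes are not sign-extended
          int window = ((byte_array[byte_offset] & 0xFF) << 16)
                     | ((byte_array[byte_offset+1] & 0xFF) << 8)
                     |  (byte_array[byte_offset+2] & 0xFF);
          // shift the requested bits down to the bottom, then mask them off
          return (window >> (24 - (bit_index & 7) - bit_length))
                  & ((1 << bit_length) - 1);
         }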

[Untested]

If you call this a lot you should precompute the 24-bit values for the 3 concatenated bytes, and store those into an int array.
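
A minimal sketch of that precomputation, assuming one 24-bit window per byte offset and zero padding past the end of the array; the class name `Window24` and both method names are hypothetical. A 24-bit window is enough because a fetch starts at most 7 bits into a byte and is at most 16 bits long, so it covers at most 23 bits.

```java
final class Window24 {
    /** Builds windows[i] = bytes i, i+1, i+2 concatenated, most significant byte first. */
    static int[] precompute(byte[] bytes) {
        int[] windows = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            int b0 = bytes[i] & 0xFF;
            int b1 = (i + 1 < bytes.length) ? bytes[i + 1] & 0xFF : 0;
            int b2 = (i + 2 < bytes.length) ? bytes[i + 2] & 0xFF : 0;
            windows[i] = (b0 << 16) | (b1 << 8) | b2;
        }
        return windows;
    }

    /** Extracts bitLength bits (0..16) starting at bitIndex, counted from the MSB. */
    static int getBits(int[] windows, int bitIndex, int bitLength) {
        int shift = 24 - (bitIndex & 7) - bitLength;
        return (windows[bitIndex >> 3] >> shift) & ((1 << bitLength) - 1);
    }
}
```

The table costs one int per input byte, which is the same memory trade-off discussed for the 32-bit side table.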

I'll observe that if you are coding this in C on an x86, you don't even need to precompute the 24-bit array; simply access the byte array at the desired offset as a 32-bit value. The x86 will do unaligned fetches just fine. [A commenter noted that endianness mucks this up, so it isn't an answer. OK, do the 24-bit version.]

Answer 5

Answered by Lii

Since Java 7, BitSet has the toLongArray method, which I believe will do exactly what the question asks for:

int subBits = (int) bitSet.get(lowBit, highBit).toLongArray()[0];

This has the advantage that it works with sequences larger than ints or longs. It has the performance disadvantage that a new BitSet object must be allocated, plus a new array object to hold the result.
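
A hedged sketch of how the one-liner might be wrapped for use. Two details are worth noting: `toLongArray()` returns an empty array when the selected range contains no set bits, so indexing `[0]` directly can throw, and `BitSet.valueOf(byte[])` numbers bits starting from the least significant bit of the first byte, which differs from the MSB-first numbering used in the question. The class name `BitSetSlice` and the method `slice` are hypothetical.

```java
import java.util.BitSet;

final class BitSetSlice {
    /**
     * Returns bits lowBit (inclusive) to highBit (exclusive) as an int,
     * with lowBit ending up as the least significant bit of the result.
     */
    static int slice(BitSet bits, int lowBit, int highBit) {
        long[] words = bits.get(lowBit, highBit).toLongArray();
        // an all-zero range produces an empty array, so guard before indexing
        return words.length == 0 ? 0 : (int) words[0];
    }
}
```

For `BitSet.valueOf(new byte[] {(byte) 0xA5, 0x3C})`, `slice(bits, 4, 12)` returns 0xCA, which is the little-endian 16-bit value 0x3CA5 shifted right by 4 and truncated to 8 bits.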

It would be really interesting to see how this compares with the other methods in the benchmark.
