Java boolean[] 与 BitSet:哪个更有效?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/605226/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-11 16:46:29  来源:igfitidea点击:

boolean[] vs. BitSet: Which is more efficient?

javaarraysperformancememorybitsets

提问by

What is more efficient in terms of memory and CPU usage — an array of booleans or a BitSet? Specific BitSet methods are not used, only get/set/clear (==, =, Arrays.fill respectively for an array).

就内存和 CPU 使用而言,哪个更有效boolean——s数组或 BitSet?不使用特定的 BitSet 方法,只使用 get/set/clear(==、=、Arrays.fill 分别用于数组)。

采纳答案by starblue

From some benchmarks with Sun JDK 1.6 computing primes with a sieve (best of 10 iterations to warm up, give the JIT compiler a chance, and exclude random scheduling delays, Core 2 Duo T5600 1.83GHz):

从 Sun JDK 1.6 用筛子计算素数的一些基准测试(最好的 10 次迭代预热,给 JIT 编译器一个机会,并排除随机调度延迟,Core 2 Duo T5600 1.83GHz):

BitSet is more memory efficient than boolean[] except for very small sizes. Each boolean in the array takes a byte. The numbers from runtime.freeMemory() are a bit muddled for BitSet, but less.

除了非常小的尺寸外,BitSet 比 boolean[] 的内存效率更高。数组中的每个布尔值都占用一个字节。来自 runtime.freeMemory() 的数字对于 BitSet 来说有点混乱,但更少。

boolean[] is more CPU efficient except for very large sizes, where they are about even. E.g., for size 1 million boolean[] is about four times faster (e.g. 6ms vs 27ms), for ten and a hundred million they are about even.

boolean[] 的 CPU 效率更高,除了非常大的尺寸,它们大约是均匀的。例如,对于大小为 100 万的 boolean[] 大约快四倍(例如 6ms 与 27ms),对于 10 和 1 亿,它们大约是偶数。

回答by Kevin Lacquement

I believe that a BitSet is more memory- and CPU-efficient, is it can internally pack the bits into int, longs, or native data types, whereas a boolean[] requires a byte for each bit of data. Additionally, if you were to use the other methods (and, or, etc), you would find that the BitSet is more efficient, as there is no need to iterate through every element of an array; bitwise math is used instead.

我相信 BitSet 的内存和 CPU 效率更高,它是否可以在内部将位打包成 int、long 或本机数据类型,而 boolean[] 需要一个字节来表示数据的每一位。此外,如果您要使用其他方法(and、or 等),您会发现 BitSet 更有效,因为无需遍历数组的每个元素;而是使用按位数学。

回答by TofuBeer

Going from Java to CPU is totally VM specific. For instance, it used to be that a boolean was actually implemented as a 32-bit value (quite probably is true to this day).

从 Java 到 CPU 完全是 VM 特定的。例如,过去布尔值实际上是作为 32 位值实现的(直到今天很可能也是如此)。

Unless you know it is going to matter you are better off writing the code to be clear, profile it, and then fix the parts that are slow or consuming a lot of memory.

除非您知道这很重要,否则最好编写清楚的代码,对其进行分析,然后修复缓慢或消耗大量内存的部分。

You can do this as you go. For instance I once decided to not call .intern() on Strings because when I ran the code in the profiler it slowed it down too much (despite using less memory).

你可以边走边做。例如,我曾经决定不对字符串调用 .intern() ,因为当我在分析器中运行代码时,它会减慢速度太多(尽管使用的内存较少)。

回答by kohlerm

It depends as always. Yes BitSet is more memory efficent, but as soon as you require multithreaded access boolean[] might be the better choice. For example for computing primes you only set the boolean to true and therefore you don't really need synchronization. Hans Boehmhas written some paper about this and the same technique can be used for marking nodes in graph.

这取决于往常。是的 BitSet 内存效率更高,但是一旦您需要多线程访问 boolean[] 可能是更好的选择。例如,对于计算素数,您只需将布尔值设置为 true,因此您实际上并不需要同步。Hans Boehm写了一些关于这个的论文,同样的技术可以用于标记图中的节点。

回答by Peter Lawrey

  • Boolean[]uses about 4-20 bytes per boolean value.
  • boolean[]uses about 1 byte per boolean value.
  • BitSetuses about 1 bit per boolean value.
  • Boolean[]每个布尔值使用大约 4-20 个字节。
  • boolean[]每个布尔值使用大约 1 个字节。
  • BitSet每个布尔值使用大约 1 位。

Memory size might not be an issue for you in which case boolean[] might be simpler to code.

内存大小对您来说可能不是问题,在这种情况下 boolean[] 编码可能更简单。

回答by Alex Reynolds

A bit left-field of your question, but if storage is a concern you may want to look into Huffman compression. For example, 00000001could be squeezed down by frequency to something equivalent to {(7)0, (1)1}. A more "randomized" string 00111010would require a more complex representation, e.g. {(2)0, (3)1, (1)0, (1)1, (1)0}, and take up more space. Depending on the structure of your bit data, you may get some storage benefit from its use, beyond BitSet.

您的问题有点偏左,但如果存储是一个问题,您可能需要研究Huffman compression。例如,00000001可以通过频率压缩到相当于{(7)0, (1)1}. 更“随机”的字符串00111010需要更复杂的表示,例如{(2)0, (3)1, (1)0, (1)1, (1)0},并占用更多空间。根据位数据的结构,除了BitSet.

回答by Jason C

As for memory, the documentation for a BitSethas pretty clear implications. In particular:

至于内存, a 的文档BitSet有非常明确的含义。特别是:

Every bit set has a current size, which is the number of bits of space currently in use by the bit set. Note that the size is related to the implementation of a bit set, so it may change with implementation. The length of a bit set relates to logical length of a bit set and is defined independently of implementation.

每个位集都有一个当前大小,即位集当前使用的空间位数。请注意,大小与位集的实现有关,因此它可能会随着实现而改变。位集的长度与位集的逻辑长度相关,并且独立于实现而定义。

The source for Java library classes is openly available and one can easily check this for themselves. In particular:

Java 库类的源代码是公开可用的,可以很容易地自己检查。特别是:

The internal field corresponding to the serialField "bits".
89 
90     private long[] words;

As for speed; it depends on what one is doing. In general, don't think about speed ahead of time; use whichever tool makes the most sense semantically and leads to the clearest code. Optimize only after observing that performance requirements aren't met and identifying bottlenecks.

至于速度;这取决于一个人在做什么。一般来说,不要提前考虑速度;使用在语义上最有意义的工具,并产生最清晰的代码。只有在观察到未满足性能要求并确定瓶颈后才进行优化。

Coming to SO and asking if A is faster than B is silly for many reasons, including but certainly not limited to:

来到 SO 并询问 A 是否比 B 快是愚蠢的,原因有很多,包括但当然不限于:

  1. It depends on the application, which nobody responding generally has access to. Analyze and profile it in the context it is being used in. Be sure that it's a bottleneck that's actually worth optimizing.
  2. Questions like this that ask about speed generally show that the OP thinks they care about efficiency but wasn't willing to profile and didn't define performance requirements. Under the surface, that's usually a red flag that the OP is headed down the wrong path.
  1. 这取决于应用程序,通常没有人响应可以访问。在使用它的上下文中对其进行分析和剖析。确保它是一个真正值得优化的瓶颈。
  2. 像这样询问速度的问题通常表明 OP 认为他们关心效率但不愿意分析并且没有定义性能要求。在表面之下,这通常是一个危险信号,表明 OP 正朝着错误的道路前进。

I know this is an old question but it came up recently; and I believe this is worth adding.

我知道这是一个老问题,但最近出现了;我相信这是值得添加的。

回答by tagore84

Here you can see a Memory/Time benchmark comparing a boolean[][] trianguar matrix versus BitSet[] triangular matrix.

在这里,您可以看到比较 boolean[][] 三角矩阵与 BitSet[] 三角矩阵的内存/时间基准。

I create, set and read the (size * (size-1) / 2) values and compare memory usage and time...

我创建、设置和读取 (size * (size-1) / 2) 值并比较内存使用情况和时间...

enter image description here

在此处输入图片说明

enter image description here

在此处输入图片说明

Hope this help...

希望这有助于...

Here the code... (just a quikly dirty test code, sorry ;)

这里的代码......(只是一个非常肮脏的测试代码,抱歉;)

import java.util.BitSet;
import java.util.Date;

public class BooleanBitSetProfiler {

    Runtime runtime;
    int sum = 0;
    public void doIt() {

        runtime = Runtime.getRuntime();
        long[][] bitsetMatrix = new long[30][2];
        long[][] booleanMatrix = new long[30][2];
        int size = 1000;
        for (int i = 0; i < booleanMatrix.length; i++) {
            booleanMatrix[i] = testBooleanMatrix(size);
            bitsetMatrix[i] = testBitSet(size);
            size += 2000;
        }
        int debug = 1;
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < booleanMatrix.length; j++){
            System.out.print(booleanMatrix[j][1] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][0] + ";");
        }
        System.out.println();
        for (int j = 0; j < bitsetMatrix.length; j++){
            System.out.print(bitsetMatrix[j][1] + ";");
        }
        System.out.println();
    }

    private long memory () {
        return runtime.totalMemory() - runtime.freeMemory();
    }
    private long[] testBooleanMatrix(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        boolean[][] matrix = new boolean[size][];
        for (int i = 0; i < size; i++) {
            matrix[i] = new boolean[size - i - 1];
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                matrix[i][j] = i % 2 == 0;
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].length; j++) {
                if (matrix[i][j]) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("Boolean[][] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }
    private long[] testBitSet(int size) {
        runtime.gc();
        long startTime = new Date().getTime();
        long startMemory = memory();
        BitSet[] matrix = new BitSet[size];
        for (int i = 0; i < size; i++) {
            matrix[i] = new BitSet(size - i - 1);
        }
        long creationMemory = memory();
        long creationTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                matrix[i].set(j, (i % 2 == 0));
            }
        }
        long setMemory = memory();
        long setTime = new Date().getTime();
        for (int i = 0; i < size; i++)  {
            for (int j = 0; j < matrix[i].size(); j++) {
                if (matrix[i].get(j)) sum++;
            }
        }
        long readTime = new Date().getTime();
        System.out.println("BitSet[] (size " + size + ")");
        System.out.println("Creation memory " + printMem(creationMemory-startMemory) + ", set memory " + printMem(setMemory-startMemory));
        System.out.println("Creation time " + printTime(creationTime-startTime) + ", set time " + printTime(setTime - creationTime) + " read time " + printTime(readTime - setTime) + "\n");
        runtime.gc();
        return new long[]{(setMemory-startMemory)/(1024*1024), (readTime-startTime)};
    }

    private String printMem(long mem) {
        mem = mem / (1024*1024);
        return mem + "MB";
    }
    private String printTime(long milis) {
        int seconds = (int) (milis / 1000);
        milis = milis % 1000;
        return seconds > 0 ? seconds + "s " + milis + "ms" : milis + "ms";
    }
}