Byte array of unknown length in Java: Part 2

Question

Asked by Ian Durkan

Similar to "Byte array of unknown length in java", I need to be able to write an unknown number of bytes from a data source into a byte[] array. However, I also need to read back bytes that were stored earlier, for a compression algorithm, so ByteArrayOutputStream doesn't work for me.

Right now I have a scheme where I allocate ByteBuffers of fixed size N, adding a new one as I reach N, 2N, 3N bytes etc. After the data is exhausted I dump all buffers into an array of now-known size.

Is there a better way to do this? Having fixed-size buffers reduces the flexibility of the compression algorithm.

Answer 1

Accepted answer by vanza

Why don't you subclass ByteArrayOutputStream? That way your subclass has access to the protected buf and count fields, and you can add methods to your class to manipulate them directly.
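A minimal sketch of that subclassing idea, assuming all you need is read access to bytes already written. The class name ReadableByteArrayOutputStream and the methods get and copyRange are made up for illustration; only the protected fields buf and count come from ByteArrayOutputStream itself.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical subclass: adds random-access reads over bytes already written. */
class ReadableByteArrayOutputStream extends ByteArrayOutputStream {

    /** Returns the byte stored at the given position without copying the buffer. */
    public synchronized byte get(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + count);
        }
        // buf and count are protected fields inherited from ByteArrayOutputStream
        return buf[index];
    }

    /** Copies length bytes starting at offset into a new array. */
    public synchronized byte[] copyRange(int offset, int length) {
        if (offset < 0 || length < 0 || offset > count - length) {
            throw new IndexOutOfBoundsException("offset " + offset + ", length " + length);
        }
        byte[] out = new byte[length];
        System.arraycopy(buf, offset, out, 0, length);
        return out;
    }
}
```

The new methods are synchronized only to match the locking the superclass already uses for write and toByteArray; if the stream is confined to one thread, that is overhead you may not want.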

Answer 2

Answered by Chris Dennett

What about using a circular byte buffer? It can grow dynamically and is efficient.

There's an implementation here: http://ostermiller.org/utils/CircularByteBuffer.java.html

Answer 3

Answered by Voo

While you can certainly use an ArrayList for this, you are looking at a memory overhead of roughly 4-8 times, because each element is stored as a reference rather than as a single byte. That assumes the boxed Bytes aren't newly allocated but share cached instances (since this is true for small Integers, I assume it works for Bytes as well). You also lose all cache locality.

You could subclass ByteArrayOutputStream, but even there you get overhead you don't need (the methods are synchronized). So I personally would just roll my own class that grows dynamically when you write to it. It is less efficient than your current method, but it is simple, and the amortized cost of growing the array is well understood. Otherwise you can obviously keep using your solution as well. As long as you wrap the solution in a clean interface, you'll hide the complexity and still get good performance.

Put another way: no, you pretty much can't do this more efficiently than what you're already doing, and every built-in Java Collection should perform worse for one reason or another.
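For what it's worth, a hand-rolled growable class along those lines could look roughly like the sketch below. The name GrowableByteArray, the initial capacity of 16, and the doubling policy are all assumptions for illustration, not something from the question.

```java
import java.util.Arrays;

/** Hypothetical growable byte store: append at the end, read from any earlier position. */
class GrowableByteArray {
    private byte[] data = new byte[16]; // initial capacity is an arbitrary choice
    private int size = 0;

    /** Appends one byte, doubling the backing array when it is full. */
    void append(byte b) {
        if (size == data.length) {
            // Doubling keeps the amortized cost per append constant.
            // This sketch ignores int overflow for arrays close to 2 GB.
            data = Arrays.copyOf(data, data.length * 2);
        }
        data[size++] = b;
    }

    /** Reads a byte that was stored earlier. */
    byte get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    /** Returns a copy trimmed to the number of bytes actually written. */
    byte[] toArray() {
        return Arrays.copyOf(data, size);
    }
}
```

Nothing here is synchronized, which is the point of not reusing ByteArrayOutputStream; the trade-off is that every growth step copies the whole array, which the fixed-block scheme avoids.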

Answer 4

Answered by Sym-Sym

As Chris answered, the CircularByteBuffer API is the way to go. Luckily, it is in the central Maven repository now. Quoting a snippet from this link, it is as simple as follows:

Single Threaded Example of a Circular Buffer

// buffer all data in a circular buffer of infinite size
CircularByteBuffer cbb = new CircularByteBuffer(CircularByteBuffer.INFINITE_SIZE);
class1.putDataOnOutputStream(cbb.getOutputStream());
class2.processDataFromInputStream(cbb.getInputStream());

Advantages are:

One CircularBuffer class rather than two pipe classes.
It is easier to convert between the "buffer all data" and "extra threads" approaches.
You can change the buffer size rather than relying on the hard-coded 1k of buffer in the pipes.

Finally, we are free of memory concerns and the pipes API.

Answer 5

Answered by Will Hartung

The expense of the ByteArrayOutputStream is the resizing of the underlying array. Your fixed block routine eliminates much of that. If the resizing isn't expensive enough to matter to you (i.e. in your testing the ByteArrayOutputStream is "fast enough" and doesn't cause undue memory pressure), then perhaps subclassing ByteArrayOutputStream, as suggested by vanza, would work for you.

I don't know your compression algorithm, so I can't say why your list of blocks is making it less flexible, or why the compression algorithm would even KNOW about the blocks. But since the blocks can be dynamic, you may be able to tune the block size to better support the variant of the compression algorithm you're using.

If the compression algorithm can work on a "stream" (i.e. fixed size chunks of data), then the block size shouldn't matter, as you could hide all of those details from the implementation. The perfect world is one where the compression algorithm wants its data in chunks that match the size of the blocks you're allocating; that way you wouldn't have to copy data to feed the compressor.
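To illustrate hiding the blocks from the compressor, here is a rough sketch of the fixed-block scheme from the question wrapped behind a flat index. ChunkedByteStore and its methods are hypothetical names, and plain byte[] blocks stand in for the ByteBuffers the question mentions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper: fixed-size blocks underneath, one flat index on top. */
class ChunkedByteStore {
    private final int blockSize;
    private final List<byte[]> blocks = new ArrayList<>();
    private int size = 0;

    ChunkedByteStore(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    /** Appends one byte, allocating a new block when the existing ones are full. */
    void append(byte b) {
        if (size == blocks.size() * blockSize) {
            blocks.add(new byte[blockSize]); // no copying of earlier data
        }
        blocks.get(size / blockSize)[size % blockSize] = b;
        size++;
    }

    /** Reads a byte stored earlier; callers never see the block boundaries. */
    byte get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return blocks.get(index / blockSize)[index % blockSize];
    }

    int size() {
        return size;
    }

    /** Dumps all blocks into one array of the now-known size. */
    byte[] toArray() {
        byte[] out = new byte[size];
        int copied = 0;
        for (byte[] block : blocks) {
            int n = Math.min(blockSize, size - copied);
            System.arraycopy(block, 0, out, copied, n);
            copied += n;
        }
        return out;
    }
}
```

With an interface like this the block size becomes a tuning parameter of the store rather than something the compression code has to know about.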

Answer 6

Answered by Calvin

For simplicity, you might consider using java.util.ArrayList:

ArrayList<Byte> a = new ArrayList<Byte>();
a.add(value1);
a.add(value2);
...
byte value = a.get(0);

Java 1.5 and higher provide automatic boxing and unboxing between the byte and Byte types. Performance may be slightly worse than ByteArrayOutputStream, but it is easy to read and understand.
