java.util.zip - ZipInputStream vs. ZipFile
Original question: http://stackoverflow.com/questions/4660819/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Lachezar Balev
I have some general questions regarding the java.util.zip library. What we basically do is import and export many small components. Previously these components were imported and exported using a single big file, e.g.:
<component-type-a id="1"/>
<component-type-a id="2"/>
<component-type-a id="N"/>
<component-type-b id="1"/>
<component-type-b id="2"/>
<component-type-b id="N"/>
Please note that the order of the components during import is relevant.
Now every component should occupy its own file, which should be externally versioned, QA-ed, and so on. We decided that the output of our export should be a zip file (containing all these files) and the input of our import should be a similar zip file. We do not want to explode (unpack) the zip in our system, and we do not want to open separate streams for each of the small files. My current questions:
Q1. Does ZipInputStream guarantee that the zip entries (the little files) will be read in the same order in which they were inserted by our export, which uses ZipOutputStream? I assume reading is something like:
ZipInputStream zis = new ZipInputStream(new BufferedInputStream(fis));
ZipEntry entry;
while ((entry = zis.getNextEntry()) != null)
{
    // read from zis until available
}
I know that the central zip directory is put at the end of the zip file but nevertheless the file entries inside have sequential order. I also know that relying on the order is an ugly idea but I just want to have all the facts in mind.
Q2. If I use ZipFile (which I prefer), what is the performance impact of calling getInputStream() hundreds of times? Will it be much slower than the ZipInputStream solution? The zip is opened only once and ZipFile is backed by RandomAccessFile - is this correct? I assume reading is something like:
ZipFile zipfile = new ZipFile(argv[0]);
Enumeration<? extends ZipEntry> e = zipfile.entries(); // TODO: assure the order of the entries
while (e.hasMoreElements()) {
    ZipEntry entry = e.nextElement();
    InputStream is = zipfile.getInputStream(entry);
    // read from is
}
Q3. Are the input streams retrieved from the same ZipFile thread-safe (e.g. may I read different entries in different threads simultaneously)? Are there any performance penalties?
Thanks for your answers!
Accepted answer by StaxMan
Q1: Yes, the entries will be read in the same order in which they were added.
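To make the Q1 claim concrete, here is a minimal sketch that writes a few entries with ZipOutputStream into memory and reads the names back with ZipInputStream. The class name ZipOrderDemo, the helper methods, and the entry names are made up for illustration; they are not from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of the original question.
class ZipOrderDemo {

    // Writes one small entry per name, in list order, into an in-memory zip.
    static byte[] writeZip(List<String> names) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write(("<component name=\"" + name + "\"/>").getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads the entry names back sequentially, the way the question's loop does.
    static List<String> readNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Deliberately not in alphabetical order.
        List<String> written = Arrays.asList("type-a-2.xml", "type-a-1.xml", "type-b-1.xml");
        List<String> read = readNames(writeZip(written));
        System.out.println("written: " + written);
        System.out.println("read:    " + read);
    }
}
```

ZipInputStream walks the local file headers from the start of the stream, which is why the read order follows the write order here.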
Q2: Note that due to the structure of zip archive files, and compression, none of the solutions is exactly streaming; they all do some level of buffering. And if you check out the JDK sources, the implementations share most of their code. There is no real random access within the content, although the index does allow finding the chunks that correspond to entries. So I think there should not be meaningful performance differences, especially as the OS will cache disk blocks anyway. You may want to verify this by measuring performance with a simple test case.
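A simple test case of the kind suggested for Q2 might look like the sketch below. It builds a temporary zip with many small entries, then reads every entry once through ZipFile.getInputStream() and once through ZipInputStream, timing both. The class name, helper names, and sample data are assumptions for illustration, and a single un-warmed run like this is only a rough indication, not a proper benchmark.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical comparison harness, not part of the original answer.
class ZipReadComparison {

    // Creates a temporary zip with `count` small entries (made-up sample data).
    static Path createSampleZip(int count) throws IOException {
        Path zip = Files.createTempFile("components", ".zip");
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            for (int i = 0; i < count; i++) {
                zos.putNextEntry(new ZipEntry("component-" + i + ".xml"));
                zos.write(("<component-type-a id=\"" + i + "\"/>").getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return zip;
    }

    // Reads a stream to its end and returns the number of bytes seen.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Opens the archive once and calls getInputStream() for every entry.
    static long readWithZipFile(Path zip) throws IOException {
        long total = 0;
        try (ZipFile zipFile = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zipFile.entries();
            while (entries.hasMoreElements()) {
                try (InputStream in = zipFile.getInputStream(entries.nextElement())) {
                    total += drain(in);
                }
            }
        }
        return total;
    }

    // Reads the same archive sequentially; read() returns -1 at the end of each entry.
    static long readWithZipInputStream(Path zip) throws IOException {
        long total = 0;
        try (ZipInputStream zis = new ZipInputStream(new BufferedInputStream(Files.newInputStream(zip)))) {
            while (zis.getNextEntry() != null) {
                total += drain(zis);
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path zip = createSampleZip(500);
        long t = System.nanoTime();
        long a = readWithZipFile(zip);
        System.out.println("ZipFile:        " + a + " bytes in " + (System.nanoTime() - t) / 1e6 + " ms");
        t = System.nanoTime();
        long b = readWithZipInputStream(zip);
        System.out.println("ZipInputStream: " + b + " bytes in " + (System.nanoTime() - t) / 1e6 + " ms");
        Files.delete(zip);
    }
}
```

Both methods return the total number of uncompressed bytes, so the two totals can be compared to confirm that each approach really read every entry before the timings are trusted.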
Q3: I would not count on this; most likely they aren't. If you really think concurrent access would help (mostly because decompression is CPU-bound, so it might), I'd try reading the whole file into memory, exposing it via ByteArrayInputStream, and constructing multiple independent readers.
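One possible reading of the Q3 suggestion, as a rough sketch: the whole archive is held in a byte array, and each worker thread opens its own ZipInputStream over its own ByteArrayInputStream, handling only the entries whose position matches its worker id. The class name, the modulo-based split, and the "processing" (just counting bytes) are assumptions chosen for illustration, not something the answer prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of "multiple independent readers" over one in-memory archive.
class ParallelZipReaders {

    // Builds a small in-memory zip (made-up sample data).
    static byte[] sampleZip(int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                zos.putNextEntry(new ZipEntry("component-" + i + ".xml"));
                zos.write(("<component-type-a id=\"" + i + "\"/>").getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    // Reads a stream to its end and returns the number of bytes seen.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Returns entry name -> uncompressed size, computed by `workers` independent readers.
    static Map<String, Long> entrySizes(byte[] zipBytes, int workers) throws Exception {
        Map<String, Long> sizes = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int workerId = w;
                futures.add(pool.submit(() -> {
                    // The byte array is only read, never modified; each worker has its own streams.
                    try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
                        ZipEntry entry;
                        for (int index = 0; (entry = zis.getNextEntry()) != null; index++) {
                            if (index % workers != workerId) {
                                continue; // this entry belongs to another worker
                            }
                            sizes.put(entry.getName(), drain(zis));
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // propagate any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return sizes;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Long> sizes = entrySizes(sampleZip(20), 4);
        System.out.println(sizes.size() + " entries processed");
    }
}
```

One caveat with this particular split: as far as I can tell from the JDK sources, a ZipInputStream still inflates the entries it skips on the way to the next one, so every worker decompresses the whole archive. The sketch would therefore only pay off when the per-entry processing after decompression dominates; measure before relying on it.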
Answer by Mark Jeronimus
I measured that just listing the files with ZipInputStream is 8 times slower than with ZipFile.
long t = System.nanoTime();
ZipFile zip = new ZipFile(jarFile);
Enumeration<? extends ZipEntry> entries = zip.entries();
while (entries.hasMoreElements())
{
    ZipEntry entry = entries.nextElement();
    String filename = entry.getName();
    if (!filename.startsWith(JAR_TEXTURE_PATH))
        continue;
    textureFiles.add(filename);
}
zip.close();
System.out.println((System.nanoTime() - t) / 1e9);
and
long t = System.nanoTime();
ZipInputStream zip = new ZipInputStream(new FileInputStream(jarFile));
ZipEntry entry;
while ((entry = zip.getNextEntry()) != null)
{
    String filename = entry.getName();
    if (!filename.startsWith(JAR_TEXTURE_PATH))
        continue;
    textureFiles.add(filename);
}
zip.close();
System.out.println((System.nanoTime() - t) / 1e9);
(Don't run them in the same class. Make two different classes and run them separately.)
Answer by Jesse Glick
Regarding Q3, experience in JENKINS-14362 suggests that zlib is not thread-safe even when operating on unrelated streams, i.e. that it has some improperly shared static state. Not proven, just a warning.