Disclaimer: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3711282/

Time: 2020-08-14 03:48:06  Source: igfitidea

Working with Zip and GZip files in Java

Tags: java, file-io, zip, gzip

Asked by Matt Ball

It's been a while since I've done Java I/O, and I'm not aware of the latest "right" ways to work with Zip and GZip files. I don't necessarily need a full working demo - I'm primarily looking for the right interfaces and methods to be using. Yes, I could look up any random tutorial on this, but performance is an issue (these files can get pretty big) and I do care about using the best tool for the job.

The basic process I'll be implementing:

  • Download a bunch of files (that might be zipped, gzipped, or both) to a temp folder.
  • Add all the extracted files to a new zip file in a temp folder.

The input files might be compressed and archived more than once. For example, the "full extraction" should take any of the following inputs (I'm not in control of these), and leave behind foo.txt:

  • foo.txt.gz
  • foo.txt.zip
  • foo.txt.gz.zip
  • foo.txt.zip.gz
  • ...
  • foo.txt.gz.gz.gz.zip.gz.zip.zip.gz.gz
  • ...

Then, I might be left with foo.txt, bar.mp3, and baz.exe, so I would just add them all to a new zip file with some generic name.

Questions:

  • With file size being a potential concern, which (interfaces/classes/methods) should I use to quickly:
    • extract zip files?
    • extract gzip files?
    • write zip files?
  • Am I better off keeping the individual extracted files in memory before writing back to the disk? Or,
  • Do potentially large files make that a bad idea?

Accepted answer by Aaron Novstrup

Note that TrueZip, the library suggested below, has been superseded by TrueVFS.

I've found the TrueZIP library useful. It allows you to treat archive files as if they're just another file system and use the familiar Java I/O APIs.

Unlike the java.util.zip API, TrueZIP provides random access to the contents of the archive, so file size should not be a concern. If I remember correctly, it will detect archive files and not try to redundantly compress them when you put them into an archive.

Quoting the TrueZIP page:

The TrueZIP API provides drop-in replacements for the well-known classes File, FileInputStream and FileOutputStream. This design makes TrueZIP very simple to use: All that is required to archive-enable most client applications is to add a few import statements for the package de.schlichtherle.io and add some type casts where required.

Now you can simply address archive files like directories in a path name. For example, the path name "archive.zip/readme" addresses the archive entry "readme" within the ZIP file "archive.zip". Note that file name suffixes are fully configurable and TrueZIP automatically detects false positives and reverts back to treat them like ordinary files or directories. This works recursively, so an archive file may even be enclosed in another archive file, like in "outer.zip/inner.zip/readme".

Answer by Powerlord

There may be a library somewhere to make this easy.

However, if there isn't, you can still do it the hard way with the java.util.zip classes: use ZipFile or ZipInputStream, along with ZipEntry, for zip.
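
For what it's worth, here is a minimal sketch of the ZipInputStream route, assuming Java 7+ for java.nio.file. The class name UnzipSketch, the method unzip, and the check against entries that would land outside the target directory are my own additions for illustration and are not part of the original answer.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper class, not from the original answer.
final class UnzipSketch {
    /** Streams every entry of zipFile into targetDir; no entry is held whole in memory. */
    static void unzip(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path dest = root.resolve(entry.getName()).normalize();
                // Reject names such as "../x" that would escape targetDir.
                if (!dest.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(dest);
                } else {
                    Files.createDirectories(dest.getParent());
                    // The stream reports end-of-stream at the end of the current entry,
                    // so this copies exactly one entry and leaves zin open.
                    Files.copy(zin, dest, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }
}
```

Calling UnzipSketch.unzip(Paths.get("foo.txt.zip"), Paths.get("extracted")) would write the entries under extracted/. ZipFile is the alternative when you need to look up individual entries by name instead of reading the archive front to back.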

GZIPInputStream can wrap a FileInputStream for gzip, keeping in mind that gzip only works on single files.

Both types of InputStreams also have their respective OutputStreams.
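
On the output side, writing a zip could look roughly like the sketch below. It assumes Java 7+; the class name ZipWriteSketch, the method zipFiles, and the choice to name each entry after the bare file name are hypothetical choices of mine, not something from the original answer.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not from the original answer.
final class ZipWriteSketch {
    /** Writes each file into zipFile as one entry, streaming the bytes from disk. */
    static void zipFiles(List<Path> files, Path zipFile) throws IOException {
        try (ZipOutputStream zout = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(zipFile)))) {
            for (Path file : files) {
                // Assumption: entries use the bare file name, so names must be unique;
                // putNextEntry throws a ZipException on a duplicate name.
                zout.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zout); // copies the file's bytes into the current entry
                zout.closeEntry();
            }
        }
    }
}
```

For the question's scenario this would be called once with the list of fully extracted files (foo.txt, bar.mp3, baz.exe) and the path of the new archive.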

Unfortunately, although I know of these classes, I've never actually used them, so I can't advise you any more than that.

Edit: The Zip functions do not appear to have any method for adding new files to a zip file without recreating the entire thing.
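
To illustrate what "recreating the entire thing" might look like with java.util.zip, here is a rough sketch that copies every existing entry into a fresh archive, adds the new file, and then swaps the archives. The names ZipAppendSketch and addFile are hypothetical, and the temp-file-then-move approach is just one way to do it, assuming Java 7+.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class, not from the original answer.
final class ZipAppendSketch {
    /** Rebuilds zipFile with all of its old entries plus newFile as an extra entry. */
    static void addFile(Path zipFile, Path newFile) throws IOException {
        Path tmp = Files.createTempFile(zipFile.toAbsolutePath().getParent(), "rebuild", ".zip");
        try (ZipFile in = new ZipFile(zipFile.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(tmp))) {
            Enumeration<? extends ZipEntry> entries = in.entries();
            byte[] buf = new byte[8192];
            while (entries.hasMoreElements()) {
                ZipEntry old = entries.nextElement();
                // A fresh ZipEntry lets the output stream recompute sizes and CRC.
                out.putNextEntry(new ZipEntry(old.getName()));
                if (!old.isDirectory()) {
                    try (InputStream data = in.getInputStream(old)) {
                        int len;
                        while ((len = data.read(buf)) != -1) {
                            out.write(buf, 0, len);
                        }
                    }
                }
                out.closeEntry();
            }
            // Assumption: the new file's name does not clash with an existing entry.
            out.putNextEntry(new ZipEntry(newFile.getFileName().toString()));
            Files.copy(newFile, out);
            out.closeEntry();
        }
        // Both archives are closed at this point, so the original can be replaced.
        Files.move(tmp, zipFile, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Note that this decompresses and recompresses every entry, so for large archives it is cheaper to collect all files first and write the zip once.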

Answer by dogbane

Don't hold all this uncompressed data in memory, or you might run out of heap space. You need to stream the data out to file when uncompressing and then stream it back in from file when you want to create your final zip file.

I haven't worked with zipped files before, but here is an example which shows how to uncompress a gzipped file:

import java.io.*;
import java.util.zip.*;

// Class name is illustrative; the file names are the ones from the original example.
public class GunzipExample {
    public static void main(String[] args) {
        // Uncompress a gzipped file. try-with-resources closes both streams,
        // even when an exception is thrown.
        try (InputStream in = new GZIPInputStream(new FileInputStream("file.txt.gz"));
             OutputStream out = new FileOutputStream("file.txt")) {
            byte[] buf = new byte[1024 * 4];
            int len;
            // read() returns -1 at end of stream
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
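
For the nested inputs in the question (foo.txt.gz.gz and so on), one possible approach is to ignore the file extension and look at the first bytes of each file instead: gzip data starts with 0x1f 0x8b, and a non-empty zip archive normally starts with "PK" followed by 0x03 0x04. The sketch below is only an illustration under those assumptions; LayerSniffSketch, sniff and stripGzipLayers are names I made up, and it peels gzip layers only, streaming each layer to a temp file rather than holding it in memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not from the original answer.
final class LayerSniffSketch {
    enum Kind { GZIP, ZIP, PLAIN }

    /** Guesses the container type from the first bytes of the file. */
    static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            int r;
            while (n < head.length && (r = in.read(head, n, head.length - n)) != -1) {
                n += r;
            }
        }
        if (n >= 2 && (head[0] & 0xff) == 0x1f && (head[1] & 0xff) == 0x8b) {
            return Kind.GZIP;
        }
        if (n >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return Kind.ZIP;
        }
        // Caveat: an ordinary file that happens to start with these bytes is misdetected.
        return Kind.PLAIN;
    }

    /** Repeatedly gunzips into temp files until the result is no longer gzip data. */
    static Path stripGzipLayers(Path file) throws IOException {
        Path current = file;
        while (sniff(current) == Kind.GZIP) {
            Path next = Files.createTempFile("layer", ".tmp");
            try (InputStream in = new GZIPInputStream(Files.newInputStream(current))) {
                Files.copy(in, next, StandardCopyOption.REPLACE_EXISTING);
            }
            current = next;
        }
        return current; // may still be a ZIP, which needs separate handling
    }
}
```

A full solution would alternate this with zip extraction whenever sniff reports ZIP, and would clean up the intermediate temp files, which this sketch leaves behind.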