Java: how to extract a single file from a remote archive file?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original URL: http://stackoverflow.com/questions/3125841/

Time: 2020-10-30 00:29:03  Source: igfitidea

Tags: java, download, extract, tar, archive

Question by Oak

Given

  1. URL of an archive (e.g. a zip file)
  2. Full name (including path) of a file inside that archive

I'm looking for a way (preferably in Java) to create a local copy of that file, without downloading the entire archive first.

From my (limited) understanding it should be possible, though I have no idea how to do that. I've been using TrueZip, since it seems to support a large variety of archive types, but I have doubts about its ability to work in such a way. Does anyone have any experience with that sort of thing?

EDIT: being able to do the same with tarballs and compressed (e.g. gzipped) tarballs is also important for me.

Accepted answer by David Z

Well, at a minimum, you have to download the portion of the archive up to and including the compressed data of the file you want to extract. That suggests the following solution: open a URLConnection to the archive, get its input stream, wrap it in a ZipInputStream, and repeatedly call getNextEntry() and closeEntry() to iterate through the entries in the file until you reach the one you want. Then you can read its data using ZipInputStream.read(...).

The Java code would look something like this:

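import java.io.FileNotFoundException;
import java.net.URL;
import java.nio.file.*;
import java.util.zip.*;

URL url = new URL("http://example.com/path/to/archive");
try (ZipInputStream zin = new ZipInputStream(url.openStream())) {
    ZipEntry ze;
    // getNextEntry() skips whatever is left of the current entry, so an explicit closeEntry() is not needed
    while ((ze = zin.getNextEntry()) != null && !ze.getName().equals(pathToFile)) {
        // keep reading until the requested entry is reached
    }
    if (ze == null) {
        throw new FileNotFoundException(pathToFile + " not found in archive");
    }
    // getSize() can return -1 for streamed entries, so copy until the end of this entry instead
    Files.copy(zin, Paths.get("extracted-file"), StandardCopyOption.REPLACE_EXISTING); // "extracted-file" is a placeholder path
}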

This is, of course, untested.

Answer by Adam Crume

Contrary to the other answers here, I'd like to point out that ZIP entries are compressed individually, so (in theory) you don't need to download anything more than the central directory and the entry itself. The server would need to support the HTTP Range header for this to work.
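
The Range requirement can be checked with nothing but the JDK. A minimal sketch, assuming Java 11+ (for `readNBytes`); `HttpRangeFetch`, `rangeHeader`, `fetch` and `contentLength` are illustrative names, not part of any library. A `206 Partial Content` status is the sign that the server honoured the header; a plain `200` means it is sending the whole file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

/** Sketch: fetch an arbitrary byte range of a remote file with an HTTP Range request. */
class HttpRangeFetch {

    /** Builds a byte-range value; both ends are inclusive, so length 100 from 0 is "bytes=0-99". */
    static String rangeHeader(long start, int length) {
        return "bytes=" + start + "-" + (start + length - 1);
    }

    /** Downloads exactly {@code length} bytes starting at {@code start}. */
    static byte[] fetch(URL url, long start, int length) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Range", rangeHeader(start, length));
        try {
            // 206 Partial Content: the server honoured the Range header.
            if (conn.getResponseCode() != HttpURLConnection.HTTP_PARTIAL) {
                throw new IOException("Range not supported: HTTP " + conn.getResponseCode());
            }
            try (InputStream in = conn.getInputStream()) {
                byte[] buf = in.readNBytes(length);
                if (buf.length != length) throw new IOException("Short read");
                return buf;
            }
        } finally {
            conn.disconnect();
        }
    }

    /** Asks for the total size with a HEAD request; returns -1 if no Content-Length is sent. */
    static long contentLength(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getContentLengthLong();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, `fetch(url, contentLength(url) - 22, 22)` would grab the last 22 bytes, which is the size of a ZIP End of Central Directory record that has no archive comment.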

The standard Java API only supports reading ZIP files from local files and input streams. As far as I know there's no provision for reading from random access remote files.
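
Without TrueZip, the gap can also be filled by hand: find the End of Central Directory record at the end of the file, read the central directory, then read only the one entry. The sketch below is a hypothetical helper, not a JDK or TrueZip API. It reads through an assumed `RangeReader` interface (an HTTP implementation would issue Range requests; the test backs it with a byte array), expects UTF-8 entry names, and ignores ZIP64, encryption, and compression methods other than stored and deflated.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Sketch: extract one ZIP entry through random-access reads only (no ZIP64, no encryption). */
class RangedZip {

    /** Assumed random-access source; an HTTP version would send "Range: bytes=..." requests. */
    interface RangeReader {
        long size() throws IOException;
        byte[] read(long offset, int length) throws IOException;
    }

    static byte[] extract(RangeReader src, String name) throws IOException {
        // 1. Find the End of Central Directory record: 22 bytes plus a comment of up to 65535 bytes.
        long size = src.size();
        int tailLen = (int) Math.min(size, 22 + 65535);
        ByteBuffer tail = le(src.read(size - tailLen, tailLen));
        int eocd = -1;
        for (int i = tailLen - 22; i >= 0 && eocd < 0; i--) {
            if (tail.getInt(i) == 0x06054b50) eocd = i;
        }
        if (eocd < 0) throw new IOException("No End of Central Directory record");
        int cdSize = tail.getInt(eocd + 12);
        long cdOffset = tail.getInt(eocd + 16) & 0xFFFFFFFFL;

        // 2. Read the central directory and look for the requested name.
        ByteBuffer cd = le(src.read(cdOffset, cdSize));
        for (int p = 0; p + 46 <= cdSize && cd.getInt(p) == 0x02014b50; ) {
            int method = u16(cd, p + 10);
            long compSize = cd.getInt(p + 20) & 0xFFFFFFFFL;
            int nameLen = u16(cd, p + 28);
            long localOffset = cd.getInt(p + 42) & 0xFFFFFFFFL;
            String entryName = new String(cd.array(), p + 46, nameLen, StandardCharsets.UTF_8);
            if (entryName.equals(name)) {
                // 3. Skip the local header, whose name/extra lengths may differ from the central ones.
                ByteBuffer lh = le(src.read(localOffset, 30));
                long dataStart = localOffset + 30 + u16(lh, 26) + u16(lh, 28);
                byte[] data = src.read(dataStart, (int) compSize);
                if (method == 0) return data;           // stored
                if (method == 8) return inflate(data);  // deflated
                throw new IOException("Unsupported compression method " + method);
            }
            p += 46 + nameLen + u16(cd, p + 30) + u16(cd, p + 32); // skip name + extra + comment
        }
        throw new IOException("Entry not found: " + name);
    }

    private static ByteBuffer le(byte[] b) {
        return ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN);
    }

    private static int u16(ByteBuffer b, int pos) {
        return b.getShort(pos) & 0xFFFF;
    }

    private static byte[] inflate(byte[] raw) throws IOException {
        Inflater inf = new Inflater(true); // ZIP entries hold raw deflate data (no zlib header)
        // The Inflater javadoc asks for an extra dummy input byte when using nowrap.
        inf.setInput(Arrays.copyOf(raw, raw.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try {
            while (!inf.finished()) {
                int n = inf.inflate(buf);
                if (n == 0 && !inf.finished() && (inf.needsInput() || inf.needsDictionary())) {
                    throw new IOException("Truncated or invalid deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IOException(e);
        } finally {
            inf.end();
        }
        return out.toByteArray();
    }
}
```

This makes four reads: the tail of the file (at most 65,557 bytes, to allow for an archive comment), the central directory, the entry's 30-byte local header, and its compressed data.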

Since you're using TrueZip, I recommend implementing de.schlichtherle.io.rof.ReadOnlyFile using Apache HTTP Client and creating a de.schlichtherle.util.zip.ZipFile with that.

This won't provide any advantage for compressed TAR archives, since the entire archive is compressed as a single stream; the best you can do is read it through an InputStream and abort once you have your entry.
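
To illustrate that fallback for a `.tar.gz`, here is a minimal sketch using only the JDK (Java 11+), with a hand-written parser for the basic 512-byte ustar header. It assumes names fit in the 100-byte name field and sizes are octal, so GNU long names, the ustar prefix field and base-256 sizes are not handled. `TarGzExtract` is an illustrative name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

/** Sketch: stream a .tar.gz and return one file's bytes, stopping once it is found. */
class TarGzExtract {

    static byte[] extract(InputStream gzipped, String wanted) throws IOException {
        try (InputStream in = new GZIPInputStream(gzipped)) {
            byte[] header = new byte[512];
            // An all-zero block marks the end of the archive.
            while (in.readNBytes(header, 0, 512) == 512 && !isAllZero(header)) {
                String name = field(header, 0, 100);                          // ustar "name" field
                long size = Long.parseLong(field(header, 124, 12).trim(), 8); // octal "size" field
                if (name.equals(wanted) && header[156] != '5') {             // '5' marks a directory
                    byte[] data = in.readNBytes((int) size);
                    if (data.length != size) throw new IOException("Truncated entry: " + name);
                    return data; // try-with-resources closes the stream; the rest is never read
                }
                skipFully(in, (size + 511) / 512 * 512); // entry data is padded to 512-byte blocks
            }
        }
        throw new IOException("Entry not found: " + wanted);
    }

    /** Decodes a NUL-terminated header field. */
    private static String field(byte[] h, int off, int len) {
        int end = off;
        while (end < off + len && h[end] != 0) end++;
        return new String(h, off, end - off, StandardCharsets.UTF_8);
    }

    private static boolean isAllZero(byte[] block) {
        for (byte b : block) if (b != 0) return false;
        return true;
    }

    private static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long k = in.skip(n);
            if (k <= 0) { // skip() may return 0; fall back to read() to make progress or detect EOF
                if (in.read() < 0) throw new IOException("Unexpected end of archive");
                k = 1;
            }
            n -= k;
        }
    }
}
```

Passing `url.openStream()` as `gzipped` gives the remote case. A library such as Apache Commons Compress (`TarArchiveInputStream`) covers the header variants this sketch skips.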

Answer by Christian Schlichtherle

Since TrueZIP 7.2, there is a new client API in the module TrueZIP Path. This is an implementation of an NIO.2 FileSystemProvider for Java SE 7. Using this API, you can access an HTTP URI as follows:

Path path = new TPath(new URI("http://acme.com/download/everything.tar.gz/README.TXT"));
try (InputStream in = Files.newInputStream(path)) {
    // Read archive entry contents here.
    ...
}

Answer by Michael

I'm not sure if there's a way to pull out a single file from a ZIP without downloading the whole thing first. But, if you're the one hosting the ZIP file, you could create a Java servlet which reads the ZIP file and returns the requested file in the response:

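import java.io.*;
import java.util.zip.*;
import javax.servlet.ServletException;
import javax.servlet.http.*;

public class GetFileFromZIPServlet extends HttpServlet {
  // Placeholder: where the hosted archive lives on the server
  private static final String ZIP_PATH = "/path/to/archive.zip";
  @Override
  public void doGet(HttpServletRequest request, HttpServletResponse response)
  throws ServletException, IOException {
    String pathToFile = request.getParameter("pathToFile");

    try (ZipFile zip = new ZipFile(ZIP_PATH)) {
      //get the entry for the requested file from the ZIP
      ZipEntry entry = (pathToFile == null) ? null : zip.getEntry(pathToFile);
      if (entry == null) {
        response.sendError(HttpServletResponse.SC_NOT_FOUND);
        return;
      }

      //set the appropriate content type, based on the file extension
      String type = getServletContext().getMimeType(pathToFile);
      response.setContentType(type != null ? type : "application/octet-stream");
      //stream the file to the response instead of buffering it in a byte[] (transferTo needs Java 9+)
      try (InputStream in = zip.getInputStream(entry)) {
        in.transferTo(response.getOutputStream());
      }
    }
  }
}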
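
On the client side, fetching one file from such a servlet is then an ordinary GET with the entry path as a query parameter. A sketch assuming Java 10+ (for `URLEncoder.encode(String, Charset)`); the servlet URL is whatever the servlet happens to be mapped to, and `ZipEntryClient` is a made-up name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Sketch of a client for a pathToFile-style servlet; the servlet URL is a caller-supplied assumption. */
class ZipEntryClient {

    /** Builds e.g. http://example.com/app/getFile?pathToFile=dir%2Fb.txt */
    static URL requestUrl(String servletUrl, String pathToFile) throws IOException {
        return new URL(servletUrl + "?pathToFile="
                + URLEncoder.encode(pathToFile, StandardCharsets.UTF_8));
    }

    /** Downloads only the requested entry and saves it to {@code target}. */
    static void download(String servletUrl, String pathToFile, Path target) throws IOException {
        try (InputStream in = requestUrl(servletUrl, pathToFile).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

Only the requested entry crosses the network, because the server does the unpacking.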