Best Practices to Create and Download a huge ZIP (from several BLOBs) in a WebApp

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow.
Original URL: http://stackoverflow.com/questions/16585384/
Asked by Andrea Ligios
I will need to perform a massive download of files from my Web Application.
It is obviously expected to be a long-running action (it'll be used once per year[, per customer]), so time is not a problem (unless it hits some timeout, but I can handle that by creating some form of keepalive heartbeat). I know how to create a hidden iframe and use it with content-disposition: attachment
to attempt to download the file instead of opening it inside the browser, and how to set up client-server communication to draw a progress meter;
The actual size of the download (and the number of files) is unknown, but for simplicity we can treat it as 1 GB, composed of 100 files of 10 MB each.
Since this should be a one-click operation, my first thought was to group all the files into a dynamically generated ZIP while reading them from the database, then ask the user to save the ZIP.
The question is: what are the best practices, and what are the known drawbacks and traps, in creating a huge archive from multiple small byte arrays in a WebApp?
That can be broken down into:
- should each byte array be written to a physical temp file, or can they be added to the ZIP in memory?
- if yes, I know I'll have to handle possible name collisions (different records in the database can share the same filename, but the same name can't appear twice inside one file system directory or one ZIP): are there any other possible problems that come to mind (assuming the file system always has enough physical space)?
- since I can't rely on having enough RAM to perform the whole operation in memory, I guess the ZIP should be created and written to the file system before being sent to the user; is there any way to do it differently (e.g. with a websocket), like asking the user where to save the file and then starting a constant flow of data from server to client (Sci-Fi, I guess)?
- any other related known problems or best practices that cross your mind would be greatly appreciated.
Accepted answer by prunge
For large content that won't fit in memory at once, stream the content from the database to the response.
This kind of thing is actually pretty simple. You don't need AJAX or websockets; it's possible to stream large file downloads through a simple link that the user clicks on. And modern browsers have decent download managers with their own progress bars - why reinvent the wheel?
If you're writing a servlet from scratch for this, get the database BLOB, obtain its input stream, and copy the content through to the HTTP response output stream. If you have the Apache Commons IO library you can use IOUtils.copy(); otherwise you can do this yourself.
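A minimal sketch of the "do it yourself" copy loop, assuming plain java.io streams (the method name and buffer size here are illustrative, not part of the original answer):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Equivalent of IOUtils.copy(): read the BLOB stream in chunks and write each
// chunk straight to the response, so the whole file is never held in memory.
static void copyStream(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    int bytesRead;
    while ((bytesRead = in.read(buffer)) != -1) {
        out.write(buffer, 0, bytesRead);
    }
}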
Creating a ZIP file on the fly can be done with a ZipOutputStream. Create one of these over the response output stream (from the servlet or whatever your framework gives you), then get each BLOB from the database, calling putNextEntry() first and then streaming each BLOB as described before.
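A minimal sketch of that flow, assuming a servlet response is in scope; blobNames and loadBlobStream() are illustrative stand-ins for whatever DAO call yields each BLOB's InputStream:

import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.apache.commons.io.IOUtils;

ZipOutputStream zip = new ZipOutputStream(response.getOutputStream());
for (String name : blobNames) {               // one ZIP entry per BLOB
    zip.putNextEntry(new ZipEntry(name));     // write the entry header first...
    try (InputStream in = loadBlobStream(name)) {
        IOUtils.copy(in, zip);                // ...then stream the BLOB body
    }
    zip.closeEntry();
}
zip.close();                                  // writes the ZIP central directory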
Potential Pitfalls/Issues:
- Depending on the download size and network speed, the request might take a lot of time to complete. Firewalls, etc. can get in the way of this and terminate the request early.
- Hopefully your users are on a decent corporate network when requesting these files. It would be far worse over remote/dodgy/mobile connections (if the connection drops out after downloading 1.9 GB of 2.0 GB, users have to start again).
- It can put a bit of load on your server, especially when compressing huge ZIP files. It might be worth turning compression down/off when creating the ZipOutputStream if this is a problem (see the sketch after this list).
- ZIP files over 2 GB (or is that 4 GB?) might have issues with some ZIP programs. I think the latest Java 7 uses ZIP64 extensions, so that version of Java will write the huge ZIP correctly, but will the clients have programs that support large ZIP files? I've definitely run into issues with these before, especially on old Solaris servers.
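A small sketch of turning compression off, assuming the ZipOutputStream is built over the servlet response as above (keeping entries DEFLATED at level NO_COMPRESSION avoids the precomputed CRC and size that STORED entries would require):

import java.util.zip.Deflater;
import java.util.zip.ZipOutputStream;

ZipOutputStream zos = new ZipOutputStream(response.getOutputStream());
// Entries are still written as DEFLATED, but the deflater does no real
// compression work: a slightly larger archive, far less CPU on the server.
zos.setLevel(Deflater.NO_COMPRESSION);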
Answered by Andrea Ligios
Kick-off example of a totally dynamic ZIP file, created by streaming each BLOB from the database directly to the client's file system.
Tested with huge archives, with the following performance:
- Server disk space cost: 0 megabytes
- Server RAM cost: ~ xx megabytes. The memory consumption is not really testable (or at least I don't know how to do it properly), because I got different, apparently random results from running the same routine multiple times (using Runtime.getRuntime().freeMemory() before, during and after the loop). However, the memory consumption is lower than with byte[], and that's enough.
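For reference, a rough way to sample heap use around the loop, using only standard Runtime calls (as noted above, the numbers are noisy because GC can run at any time):

Runtime rt = Runtime.getRuntime();
long before = rt.totalMemory() - rt.freeMemory();   // approximate used heap
// ... run the streaming loop ...
long after = rt.totalMemory() - rt.freeMemory();
System.out.println("Approx. heap delta: " + (after - before) / (1024 * 1024) + " MB");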
FileStreamDto.java, using InputStream instead of byte[]
import java.io.InputStream;
import java.io.Serializable;
import lombok.Getter;   // Lombok generates the getters/setters
import lombok.Setter;

public class FileStreamDto implements Serializable {
    @Getter @Setter private String filename;
    @Getter @Setter private InputStream inputStream;
}
Java Servlet (or Struts2 Action)
import java.util.HashSet;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import javax.servlet.ServletOutputStream;
import org.apache.commons.io.IOUtils;

/* Read the amount of data to be streamed from Database to File System,
   summing the size of all Oracle's BLOB, PostgreSQL's ABYTE etc:
   SELECT sum(length(my_blob_field)) FROM my_table WHERE my_conditions
*/
Long overallSize = getMyService().precalculateZipSize();

// Tell the browser it is a ZIP
response.setContentType("application/zip");
// Tell the browser the filename, and that it needs to be downloaded instead of opened
response.addHeader("Content-Disposition", "attachment; filename=\"myArchive.zip\"");
// Tell the browser the overall size, so it can show a realistic progress bar
// (note: this is the sum of the raw BLOB sizes, so it only approximates the final ZIP size)
response.setHeader("Content-Length", String.valueOf(overallSize));

ServletOutputStream sos = response.getOutputStream();
ZipOutputStream zos = new ZipOutputStream(sos);

// Set up a list of filenames to prevent duplicate entries
HashSet<String> entries = new HashSet<String>();

/* Read all the IDs of the relevant records in the database,
   to query them later for the streams:
   SELECT my_id FROM my_table WHERE my_conditions */
List<Long> allId = getMyService().loadAllId();

for (Long currentId : allId) {
    /* Load the record relative to the current ID:
       SELECT my_filename, my_blob_field FROM my_table WHERE my_id = :currentId
       Use resultset.getBinaryStream("my_blob_field") while mapping the BLOB column */
    FileStreamDto fileStream = getMyService().loadFileStream(currentId);

    // Create a zipEntry with a non-duplicate filename, and add it to the ZipOutputStream
    ZipEntry zipEntry = new ZipEntry(getUniqueFileName(entries, fileStream.getFilename()));
    zos.putNextEntry(zipEntry);

    // Use Apache Commons to transfer the InputStream from the DB to the OutputStream
    // on the File System; at this moment, your file is ALREADY being downloaded and growing
    IOUtils.copy(fileStream.getInputStream(), zos);

    zos.flush();
    zos.closeEntry();
    fileStream.getInputStream().close();
}

zos.close();
sos.close();
Helper method for handling duplicate entries
/* If completeFileName is already in the set, insert "(1)", "(2)", ... before
   the extension until a free name is found; then register and return it. */
private String getUniqueFileName(HashSet<String> entries, String completeFileName) {
    if (entries.contains(completeFileName)) {
        int extPos = completeFileName.lastIndexOf('.');
        String extension = extPos > 0 ? completeFileName.substring(extPos) : "";
        String partialFileName = extension.length() == 0 ? completeFileName : completeFileName.substring(0, extPos);
        int x = 1;
        while (entries.contains(completeFileName = partialFileName + "(" + x + ")" + extension))
            x++;
    }
    entries.add(completeFileName);
    return completeFileName;
}
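For example, if two records are both named report.pdf, the first entry is stored as report.pdf and the second becomes report(1).pdf.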
Thanks a lot @prunge for giving me the idea of the direct streaming.
Answered by Indu Devanath
Maybe you want to try multiple downloads concurrently. I found a related discussion here - Java multithreaded file downloading performance
Hope this helps.