Original question: http://stackoverflow.com/questions/243992/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How to split a huge zip file into multiple volumes?
Asked by Thollsten
When I create a zip archive via java.util.zip.*, is there a way to split the resulting archive into multiple volumes?
Let's say my overall archive has a file size of 24 MB and I want to split it into 3 files with a limit of 10 MB per file.
Is there a zip API which has this feature? Or any other nice ways to achieve this?
Thanks Thollsten
Accepted answer by sakana
Check: http://saloon.javaranch.com/cgi-bin/ubb/ultimatebb.cgi?ubb=get_topic&f=38&t=004618
I am not aware of any public API that will help you do that. (Although if you do not want to do it programmatically, there are utilities like WinSplitter that will do it.)
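If raw pieces are acceptable, the kind of byte-level splitting that such utilities presumably perform can also be sketched in plain Java. The class name `RawFileSplitter` and the `.001`, `.002` naming scheme are made up for this illustration. Note that the parts produced this way are not valid zip files on their own; they have to be concatenated again before the archive can be opened.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of any library.
class RawFileSplitter {

    /** Splits source into sibling files <name>.001, <name>.002, ... of at most maxPartBytes each. */
    static List<Path> split(Path source, long maxPartBytes) throws IOException {
        if (maxPartBytes <= 0) {
            throw new IllegalArgumentException("maxPartBytes must be positive");
        }
        List<Path> parts = new ArrayList<>();
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(source)) {
            int available = in.read(buffer); // bytes currently held in buffer, -1 at end of file
            int offset = 0;                  // first byte in buffer not yet written to a part
            while (available != -1) {
                Path part = source.resolveSibling(
                        String.format("%s.%03d", source.getFileName(), parts.size() + 1));
                parts.add(part);
                try (OutputStream out = Files.newOutputStream(part)) {
                    long written = 0;
                    while (available != -1 && written < maxPartBytes) {
                        int n = (int) Math.min(available - offset, maxPartBytes - written);
                        out.write(buffer, offset, n);
                        written += n;
                        offset += n;
                        if (offset == available) { // buffer drained, refill it
                            available = in.read(buffer);
                            offset = 0;
                        }
                    }
                }
            }
        }
        return parts;
    }

    /** Concatenates the parts, in order, back into a single file. */
    static void join(List<Path> parts, Path target) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            for (Path part : parts) {
                Files.copy(part, out);
            }
        }
    }
}
```

With the numbers from the question, `RawFileSplitter.split(archive, 10_000_000)` on a 24 MB archive would give two full parts and one smaller one.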
I have not tried it, but every ZipEntry used with ZipInputStream/ZipOutputStream has a compressed size, so you can get a rough estimate of the size of the zip file while creating it. If you need 2 MB zip files, you can stop writing to a file once the cumulative size of its entries reaches 1.9 MB, leaving 0.1 MB for the manifest file and other zip-specific elements. So, in a nutshell, you can write a wrapper over ZipOutputStream as follows:
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipOutputStream;

    public class ChunkedZippedOutputStream {

        private static final long MAX_FILE_SIZE = 16000000; // Whatever size you want
        private static final String PART_POSTFIX = ".part.";
        private static final String FILE_EXTENSION = ".zip";

        private ZipOutputStream zipOutputStream;
        private final String path;
        private final String name;
        private long currentSize;
        private int currentChunkIndex;

        public ChunkedZippedOutputStream(String path, String name) throws FileNotFoundException {
            this.path = path;
            this.name = name;
            constructNewStream();
        }

        public void addEntry(ZipEntry entry) throws IOException {
            // getCompressedSize() returns -1 when the size is unknown (e.g. a freshly created
            // entry); it is only meaningful for entries taken from an existing archive.
            long entrySize = Math.max(entry.getCompressedSize(), 0);
            // Start a new part when the entry does not fit, unless the current part is still empty.
            if (currentSize > 0 && (currentSize + entrySize) > MAX_FILE_SIZE) {
                closeStream();
                constructNewStream();
            }
            // Add the entry to whichever part is now current.
            currentSize += entrySize;
            zipOutputStream.putNextEntry(entry);
        }

        // Writes data for the entry most recently passed to addEntry().
        public void write(byte[] buffer, int offset, int length) throws IOException {
            zipOutputStream.write(buffer, offset, length);
        }

        // Must be called once all entries are written, to finish the last part.
        public void close() throws IOException {
            closeStream();
        }

        private void closeStream() throws IOException {
            zipOutputStream.close();
        }

        private void constructNewStream() throws FileNotFoundException {
            zipOutputStream = new ZipOutputStream(new FileOutputStream(new File(path, constructCurrentPartName())));
            currentChunkIndex++;
            currentSize = 0;
        }

        private String constructCurrentPartName() {
            // This will give names in the form of <file_name>.part.0.zip, <file_name>.part.1.zip, etc.
            StringBuilder partNameBuilder = new StringBuilder(name);
            partNameBuilder.append(PART_POSTFIX);
            partNameBuilder.append(currentChunkIndex);
            partNameBuilder.append(FILE_EXTENSION);
            return partNameBuilder.toString();
        }
    }
The above program is just a hint of the approach and not a final solution by any means.
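One caveat with estimating from entry sizes: as far as I know, the compressed size of an entry you create yourself is only available after it has been written. The sketch below (the class name `CompressedSizeDemo` and the entry name are made up for illustration) writes one entry to an in-memory stream and reads `getCompressedSize()` before and after `closeEntry()`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class for illustration.
class CompressedSizeDemo {

    /** Returns {compressed size before writing, compressed size after closeEntry()}. */
    static long[] compressedSizeBeforeAndAfter(byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry("sample.txt");
        long before = entry.getCompressedSize(); // -1: nothing has been compressed yet
        try (ZipOutputStream zos = new ZipOutputStream(new ByteArrayOutputStream())) {
            zos.putNextEntry(entry);
            zos.write(data);
            zos.closeEntry(); // the stream records the sizes on the entry here
        }
        return new long[] { before, entry.getCompressedSize() };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);
        long[] sizes = compressedSizeBeforeAndAfter(data);
        System.out.println("before=" + sizes[0] + ", after=" + sizes[1]);
    }
}
```

So a size-based rollover works most directly when the entries come from an existing archive, or when the running total is updated after each entry has been closed.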
Answer by Kevin Day
If the goal is to have the output be compatible with pkzip and winzip, I'm not aware of any open source libraries that do this. We had a similar requirement for one of our apps, and I wound up writing our own implementation (compatible with the zip standard). If I recall, the hardest thing for us was that we had to generate the individual files on the fly (the way most zip utilities work is that they create the big zip file, then go back and split it later, which is a lot easier to implement). It took about a day to write and 2 days to debug.
The zip standard explains what the file format has to look like. If you aren't afraid of rolling up your sleeves a bit, this is definitely doable. You do have to implement a zip file generator yourself, but you can use Java's Deflater class to generate the segment streams for the compressed data. You'll have to generate the file and section headers yourself, but they are just bytes, nothing too hard once you dive in.
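To give a feel for what "just bytes" means, here is a minimal sketch of the two building blocks: compressing data with `Deflater` in raw mode and laying out a local file header by hand. It is not Kevin Day's implementation; the class name `LocalHeaderSketch` is made up, the timestamp fields are left at zero, plain ASCII file names are assumed, and the central directory and the split-specific records are not covered.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Deflater;

// Hypothetical sketch class for illustration.
class LocalHeaderSketch {

    /** Compresses data as a raw deflate stream (no zlib wrapper), which is what zip entries store. */
    static byte[] deflateRaw(byte[] data) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = nowrap
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Builds the 30-byte fixed part of a local file header followed by the file name. */
    static byte[] localFileHeader(String name, byte[] data, byte[] compressed) {
        byte[] nameBytes = name.getBytes(StandardCharsets.US_ASCII); // assumption: ASCII names only
        CRC32 crc = new CRC32();
        crc.update(data);
        ByteBuffer header = ByteBuffer.allocate(30 + nameBytes.length).order(ByteOrder.LITTLE_ENDIAN);
        header.putInt(0x04034b50);                 // local file header signature ("PK" 3 4)
        header.putShort((short) 20);               // version needed to extract (2.0)
        header.putShort((short) 0);                // general purpose bit flag
        header.putShort((short) 8);                // compression method 8 = deflate
        header.putShort((short) 0);                // last modified time (left zero in this sketch)
        header.putShort((short) 0);                // last modified date (left zero in this sketch)
        header.putInt((int) crc.getValue());       // CRC-32 of the uncompressed data
        header.putInt(compressed.length);          // compressed size
        header.putInt(data.length);                // uncompressed size
        header.putShort((short) nameBytes.length); // file name length
        header.putShort((short) 0);                // extra field length
        header.put(nameBytes);
        return header.array();
    }
}
```

A real writer would follow each header with the compressed bytes and finish the archive with the central directory and end-of-central-directory records described in the specification.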
Here's the zip specification: section K has the info you are looking for specifically, but you'll need to read A, B, C and F as well. If you are dealing with really big files (we were), you'll have to get into the Zip64 stuff as well, but for 24 MB you are fine.
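As a rough guide to when the Zip64 extensions come into play: the classic format stores sizes and offsets in 4-byte fields and entry counts in 2-byte fields, and the all-ones value of each field is reserved as the "look in the Zip64 record" marker. The check below is a simplified sketch under that reading of the specification; the class and method names are made up, and the total archive size is used as a stand-in for the largest offset.

```java
// Hypothetical helper for illustration.
class Zip64Check {

    static final long MAX_32BIT_FIELD = 0xFFFFFFFFL; // marker value of a 4-byte size/offset field
    static final long MAX_16BIT_COUNT = 0xFFFFL;     // marker value of a 2-byte entry count field

    /** Returns true if any of the values no longer fits the classic (non-Zip64) fields. */
    static boolean needsZip64(long largestEntryBytes, long archiveBytes, long entryCount) {
        return largestEntryBytes >= MAX_32BIT_FIELD
                || archiveBytes >= MAX_32BIT_FIELD
                || entryCount >= MAX_16BIT_COUNT;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(needsZip64(24 * mb, 24 * mb, 100));     // the 24 MB case from the question
        System.out.println(needsZip64(5000 * mb, 6000 * mb, 100)); // a multi-gigabyte archive
    }
}
```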
If you want to dive in and try it and you run into questions, post back and I'll see if I can provide some pointers.
Answer by tpky
The code below is my solution for splitting a zip file with a directory structure into chunks of a desired size. I found the previous answers useful, so I wanted to contribute a similar but slightly neater approach. This code works for my specific needs, and I believe there is room for improvement.
    import java.io.BufferedOutputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import java.util.zip.ZipInputStream;
    import java.util.zip.ZipOutputStream;

    // The enclosing class name is a placeholder; the original snippet shows only the members.
    public class ZipSplitter {

        private final static long MAX_FILE_SIZE = 1000 * 1000 * 1024; // around 1GB
        private final static String zipCopyDest = "C:\\zip2split\\copy";

        public static void splitZip(String zipFileName, String zippedPath, String coreId) throws IOException {
            System.out.println("process whole zip file..");
            int currentChunkIndex = 0;
            long currentSize = 0;
            ZipOutputStream zos = getOutputStream(currentChunkIndex, coreId);
            // The ZipFile is used just to get the uncompressed size of the entries:
            // entries returned by ZipInputStream.getNextEntry() can report -1.
            try (ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream(zippedPath));
                 ZipFile zipFile = new ZipFile(zippedPath)) {
                ZipEntry entry;
                while ((entry = zipInputStream.getNextEntry()) != null) {
                    ZipEntry zipEntry = zipFile.getEntry(entry.getName());
                    long entrySize = zipEntry != null ? zipEntry.getSize() : 0;
                    System.out.println(entry.getName());
                    System.out.println(entrySize);
                    if (currentSize > 0 && (currentSize + entrySize) > MAX_FILE_SIZE) {
                        // The current chunk is full: close it and construct a new stream.
                        zos.close();
                        currentChunkIndex++;
                        zos = getOutputStream(currentChunkIndex, coreId);
                        currentSize = 0;
                    }
                    currentSize += entrySize;
                    zos.putNextEntry(new ZipEntry(entry.getName()));
                    // Copy through a small buffer instead of holding the whole entry in memory.
                    byte[] buffer = new byte[8192];
                    int length;
                    while ((length = zipInputStream.read(buffer)) > 0) {
                        zos.write(buffer, 0, length);
                    }
                    zos.closeEntry();
                }
            } finally {
                zos.close();
            }
        }

        public static ZipOutputStream getOutputStream(int i, String coreId) throws IOException {
            System.out.println("inside of getOutputStream()..");
            ZipOutputStream out = new ZipOutputStream(new BufferedOutputStream(
                    new FileOutputStream(zipCopyDest + "\\" + coreId + "_" + i + ".zip")));
            // out.setLevel(Deflater.DEFAULT_COMPRESSION);
            return out;
        }

        public static void main(String[] args) throws IOException {
            String zipFileName = "Large_files _for_testing.zip";
            String zippedPath = "C:\\zip2split\\Large_files _for_testing.zip";
            String coreId = "Large_files _for_testing";
            splitZip(zipFileName, zippedPath, coreId);
        }
    }
Answer by Drakes
For what it's worth, I like to use try-with-resources everywhere. If you are into that design pattern, then you will like this. Also, this solves the problem of empty parts when an entry is larger than the desired part size. In the worst case you will have as many parts as entries.
In:
my-archive.zip
Out:
my-archive.part1of3.zip
my-archive.part2of3.zip
my-archive.part3of3.zip
Note: I'm using logging and Apache Commons FilenameUtils, but feel free to use what you have in your toolkit.
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.lang.invoke.MethodHandles;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.util.Enumeration;
    import java.util.zip.ZipEntry;
    import java.util.zip.ZipFile;
    import java.util.zip.ZipOutputStream;

    import org.apache.commons.io.FilenameUtils;
    // Assumption: the answer does not say which annotation library it uses;
    // JetBrains annotations are one option for @NotNull and @Nullable.
    import org.jetbrains.annotations.NotNull;
    import org.jetbrains.annotations.Nullable;
    // Assumption: SLF4J, inferred from the LoggerFactory call below.
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * Utility class to split a zip archive into parts (not volumes)
     * by attempting to fit as many entries into a single part before
     * creating a new part. If a part would otherwise be empty because
     * the next entry won't fit, it will be added anyway to avoid empty parts.
     *
     * @author Eric Draken, 2019
     */
    public class Zip
    {
        private static final int DEFAULT_BUFFER_SIZE = 1024 * 4;

        private static final String ZIP_PART_FORMAT = "%s.part%dof%d.zip";

        private static final String EXT = "zip";

        private static final Logger logger = LoggerFactory.getLogger( MethodHandles.lookup().lookupClass() );

        /**
         * Split a large archive into smaller parts
         *
         * @param zipFile             Source zip file to split (must end with .zip)
         * @param outZipFile          Destination zip file base path. The "part" number will be added automatically
         * @param approxPartSizeBytes Approximate part size
         * @throws IOException Exceptions on file access
         */
        public static void splitZipArchive(
            @NotNull final File zipFile,
            @NotNull final File outZipFile,
            final long approxPartSizeBytes ) throws IOException
        {
            String basename = FilenameUtils.getBaseName( outZipFile.getName() );
            Path basePath = outZipFile.getParentFile() != null ? // Check if this file has a parent folder
                outZipFile.getParentFile().toPath() :
                Paths.get( "" );
            String extension = FilenameUtils.getExtension( zipFile.getName() );
            if ( !extension.equals( EXT ) )
            {
                throw new IllegalArgumentException( "The archive to split must end with ." + EXT );
            }

            // Get a list of entries in the archive
            try ( ZipFile zf = new ZipFile( zipFile ) )
            {
                // Silliness check
                long minRequiredSize = zipFile.length() / 100;
                if ( minRequiredSize > approxPartSizeBytes )
                {
                    throw new IllegalArgumentException(
                        "Please select a minimum part size over " + minRequiredSize + " bytes, " +
                            "otherwise there will be over 100 parts."
                    );
                }

                // Loop over all the entries in the large archive
                // to calculate the number of parts required
                Enumeration<? extends ZipEntry> enumeration = zf.entries();
                long partSize = 0;
                long totalParts = 1;
                while ( enumeration.hasMoreElements() )
                {
                    long nextSize = enumeration.nextElement().getCompressedSize();
                    if ( partSize + nextSize > approxPartSizeBytes )
                    {
                        partSize = 0;
                        totalParts++;
                    }
                    partSize += nextSize;
                }

                // Silliness check: if there are more parts than there
                // are entries, then one entry will occupy one part by contract
                totalParts = Math.min( totalParts, zf.size() );

                logger.debug( "Split requires {} parts", totalParts );
                if ( totalParts == 1 )
                {
                    // No splitting required. Copy file
                    Path outFile = basePath.resolve(
                        String.format( ZIP_PART_FORMAT, basename, 1, 1 )
                    );
                    Files.copy( zipFile.toPath(), outFile );
                    logger.debug( "Copied {} to {} (pass-through)", zipFile.toString(), outFile.toString() );
                    return;
                }

                // Reset
                enumeration = zf.entries();

                // Split into parts
                int currPart = 1;
                ZipEntry overflowZipEntry = null;
                while ( overflowZipEntry != null || enumeration.hasMoreElements() )
                {
                    Path outFilePart = basePath.resolve(
                        String.format( ZIP_PART_FORMAT, basename, currPart++, totalParts )
                    );
                    overflowZipEntry = writeEntriesToPart( overflowZipEntry, zf, outFilePart, enumeration, approxPartSizeBytes );
                    logger.debug( "Wrote {}", outFilePart );
                }
            }
        }

        /**
         * Write entries to the outFilePart
         *
         * @param overflowZipEntry    ZipEntry that didn't fit in the last part, or null
         * @param inZipFile           The large archive to split
         * @param outFilePart         The part of the archive currently being worked on
         * @param enumeration         Enumeration of ZipEntries
         * @param approxPartSizeBytes Approximate part size
         * @return Overflow ZipEntry, or null
         * @throws IOException File access exceptions
         */
        private static ZipEntry writeEntriesToPart(
            @Nullable ZipEntry overflowZipEntry,
            @NotNull final ZipFile inZipFile,
            @NotNull final Path outFilePart,
            @NotNull final Enumeration<? extends ZipEntry> enumeration,
            final long approxPartSizeBytes
        ) throws IOException
        {
            try (
                ZipOutputStream zos =
                    new ZipOutputStream( new FileOutputStream( outFilePart.toFile(), false ) )
            )
            {
                long partSize = 0;
                byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
                while ( overflowZipEntry != null || enumeration.hasMoreElements() )
                {
                    ZipEntry entry = overflowZipEntry != null ? overflowZipEntry : enumeration.nextElement();
                    overflowZipEntry = null;

                    long entrySize = entry.getCompressedSize();
                    if ( partSize + entrySize > approxPartSizeBytes )
                    {
                        if ( partSize != 0 )
                        {
                            return entry;    // Finished this part, but return the dangling ZipEntry
                        }
                        // Add the entry anyway if the part would otherwise be empty
                    }
                    partSize += entrySize;

                    // Use a fresh entry: reusing the source entry carries over its recorded
                    // compressed size, which can fail with "invalid entry compressed size"
                    // on some JDK versions when the recompressed size differs.
                    zos.putNextEntry( new ZipEntry( entry.getName() ) );

                    // Get the input stream for this entry and copy the entry
                    try ( InputStream is = inZipFile.getInputStream( entry ) )
                    {
                        int bytesRead;
                        while ( (bytesRead = is.read( buffer )) != -1 )
                        {
                            zos.write( buffer, 0, bytesRead );
                        }
                    }
                }
                return null;    // Finished splitting
            }
        }
    }
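The packing rule in writeEntriesToPart can be looked at in isolation: keep adding entries to the current part until the next one would exceed the limit, but never leave a part empty. The following sketch applies the same rule to a plain array of sizes, so it can be tried without any zip files; the class name `PartPlanner` and its method are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration.
class PartPlanner {

    /** Groups entry indexes into parts; an entry larger than the limit gets a part of its own. */
    static List<List<Integer>> plan(long[] compressedSizes, long approxPartSizeBytes) {
        List<List<Integer>> parts = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long partSize = 0;
        for (int i = 0; i < compressedSizes.length; i++) {
            // Start a new part only if this one already holds data; otherwise the
            // oversized entry is added anyway so that no part ends up empty.
            if (partSize != 0 && partSize + compressedSizes[i] > approxPartSizeBytes) {
                parts.add(current);
                current = new ArrayList<>();
                partSize = 0;
            }
            current.add(i);
            partSize += compressedSizes[i];
        }
        if (!current.isEmpty()) {
            parts.add(current);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Entries of 5, 5, 12 and 3 units with a limit of 10 per part
        System.out.println(plan(new long[] { 5, 5, 12, 3 }, 10)); // prints [[0, 1], [2], [3]]
    }
}
```

The 12-unit entry exceeds the limit on its own, so it ends up alone in the second part rather than leaving that part empty.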