Linux: Fast Concatenation of Multiple GZip Files
Original question: http://stackoverflow.com/questions/8005114/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Fast Concatenation of Multiple GZip Files
Asked by neversaint
I have a list of gzip files:
file1.gz
file2.gz
file3.gz
Is there a way to concatenate or gzip these files into one gzip file without having to decompress them?
In practice, we will use this in a web database (CGI). The web application will receive a query from the user, list all the files that match the query, and return them to the user as a single batch file.
Answered by bdonlan
With gzip files, you can simply concatenate the files together, like so:
cat file1.gz file2.gz file3.gz > allfiles.gz
Per the gzip RFC (RFC 1952):
A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.
Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip seems to handle it as equivalent to a concatenation.
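The behavior described above can be checked with a short experiment. This is only a sketch: the part1/part2/part3 file names and the temporary directory are made up for the demonstration and are not part of the original answer.

```shell
# Sketch only: every file name below is hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' > part1.txt
printf 'beta\n'  > part2.txt
printf 'gamma\n' > part3.txt

# Compress each file on its own, so each .gz holds exactly one member.
for name in part1 part2 part3; do
    gzip -c "$name.txt" > "$name.gz"
done

# Plain byte-level concatenation; nothing is decompressed or recompressed.
cat part1.gz part2.gz part3.gz > allfiles.gz

# gzip -t checks the integrity of the compressed stream.
gzip -t allfiles.gz && echo "allfiles.gz passes the integrity check"

# Decompressing the result should give the concatenation of the inputs.
combined=$(gunzip -c allfiles.gz)
expected=$(cat part1.txt part2.txt part3.txt)
if [ "$combined" = "$expected" ]; then
    echo "decompressed output matches the concatenated originals"
fi
```

If both messages appear, the concatenated file behaves like a gzip of the concatenated data, at least as far as gunzip is concerned.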
Since existing tools generally ignore the filename headers for the additional members, it's not easy to extract individual files from the result. If you want this to be possible, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP also supports some other compression algorithms as an option; method 8 is the one that corresponds to GZIP's compression); the difference is in the metadata format. Since the metadata is uncompressed, it's simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.
Answered by Drona
You can create a tar file of these files and then gzip the tar file to create the new gzip file:
tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar
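To illustrate what the tar approach buys you, here is a minimal sketch that builds the archive and then pulls a single member back out. The sample contents and the out directory are assumptions made for the demonstration; only the tar and gzip invocations mirror the commands above.

```shell
# Sketch only: sample data and directory names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' | gzip -c > file1.gz
printf 'beta\n'  | gzip -c > file2.gz

# Same two steps as above (without -v to keep the output quiet).
tar -cf newcombined.tar file1.gz file2.gz
gzip newcombined.tar            # leaves newcombined.tar.gz behind

# List the members stored in the archive.
members=$(gunzip -c newcombined.tar.gz | tar -tf -)
echo "$members"

# Extract only file2.gz into a separate directory.
mkdir out
gunzip -c newcombined.tar.gz | tar -xf - -C out file2.gz
second=$(gunzip -c out/file2.gz)
echo "$second"
```

Because the members are already compressed, the outer gzip pass is unlikely to shrink the archive much; a plain tar file may be enough if size is the only concern.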
Answered by Nehal Dattani
Here is what man 1 gzip says about your requirement:
Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:
gzip -c file1 > foo.gz
gzip -c file2 >> foo.gz
Then
gunzip -c foo
is equivalent to
cat file1 file2
Needless to say, file1 can be replaced by file1.gz.
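One way to read the remark above is that a file which is already compressed can be appended as-is with cat instead of being compressed again. The sketch below tries both variants under that assumption; the file contents and the bar.gz name are made up for the demonstration.

```shell
# Sketch only: sample contents and bar.gz are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first\n'  > file1
printf 'second\n' > file2

# Variant 1: the man page example, compressing and appending in one step.
gzip -c file1 >  foo.gz
gzip -c file2 >> foo.gz
from_gzip=$(gunzip -c foo.gz)

# Variant 2: the inputs are already compressed, so append them with cat.
gzip -c file1 > file1.gz
gzip -c file2 > file2.gz
cat file1.gz >  bar.gz
cat file2.gz >> bar.gz
from_cat=$(gunzip -c bar.gz)

plain=$(cat file1 file2)
[ "$from_gzip" = "$plain" ] && echo "variant 1 matches cat file1 file2"
[ "$from_cat"  = "$plain" ] && echo "variant 2 matches cat file1 file2"
```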
Note this:
gunzip will extract all members at once
So to extract the members individually, you will have to use an additional tool or write one yourself.
However, this is also addressed in the man page:
If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
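As a rough illustration of the -z option mentioned in the man page excerpt, the sketch below creates a gzip-compressed tar archive in one step and then extracts a single member. The a.txt, b.txt, archive.tar.gz, and extracted names are assumptions chosen for the example.

```shell
# Sketch only: all file and directory names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt

# -c create, -z filter through gzip, -f archive file name.
tar -czf archive.tar.gz a.txt b.txt

# -t lists the members without extracting them.
listing=$(tar -tzf archive.tar.gz)
echo "$listing"

# Extract only b.txt; a.txt stays inside the archive.
mkdir extracted
tar -xzf archive.tar.gz -C extracted b.txt
only_b=$(cat extracted/b.txt)
echo "$only_b"
```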
Answered by matiu
Just use cat. It is very fast (0.2 seconds for 500 MB for me):
cat *gz > final
mv final final.gz
You can then read the output with zcat to make sure it looks right:
zcat final.gz
I tried the other answer's 'gzip -c' approach, but I ended up with garbage when using already gzipped files as input (I guess it double compressed them).
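The double-compression guess can be reproduced in a small experiment. This sketch assumes the compressed file was fed to gzip through standard input, which may not be exactly what happened in the answer; data.txt, data.gz, and double.gz are hypothetical names.

```shell
# Sketch only: file names are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'hello\n' > data.txt
gzip -c data.txt > data.gz

# Feeding compressed bytes to gzip via stdin compresses them a second time.
gzip -c < data.gz > double.gz

# One round of decompression yields the inner .gz bytes, not the text.
if gunzip -c double.gz | cmp -s - data.gz; then
    echo "one gunzip pass returns compressed bytes (looks like garbage)"
fi

# Two rounds are needed to get back to the original text.
twice=$(gunzip -c double.gz | gunzip -c)
echo "$twice"
```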
PV:
Better yet, if you have it, use 'pv' instead of cat:
pv *gz > final
mv final final.gz
This gives you a progress bar as it works, but otherwise does the same thing as cat.
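Since pv is not installed everywhere, a script might pick it when available and fall back to cat otherwise. The following is a sketch of that idea, not part of the original answer; the joiner variable and the sample a.gz and b.gz files are assumptions.

```shell
# Sketch only: sample files and the joiner variable are hypothetical.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'one\n' | gzip -c > a.gz
printf 'two\n' | gzip -c > b.gz

# Use pv when it is installed, otherwise fall back to cat.
if command -v pv >/dev/null 2>&1; then
    joiner=pv
else
    joiner=cat
fi

# Write to a temporary name and rename afterwards, as in the answer.
"$joiner" *gz > final
mv final final.gz

result=$(gunzip -c final.gz)
echo "$result"
```

pv writes its progress display to standard error, so the redirected output should contain only the concatenated data.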