C++: How to check if a file is gzip compressed?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original source: StackOverFlow, http://stackoverflow.com/questions/6059302/

Time: 2020-08-28 19:25:10  Source: igfitidea

How to check if a file is gzip compressed?

Tags: c++, c, file-io, gzip, zlib

Asked by Deepak Prakash

I have a C / C++ program which needs to read in a file that may or may not be gzip compressed. I know we can use gzread() from zlib to read in both compressed and uncompressed files - however, I want to use the zlib functions ONLY if the file is gzip compressed (for performance reasons).

So is there any way to programmatically detect or check if a certain file is gzipped from C / C++?

Answered by Bruno Rohée

There is a magic number at the beginning of the file. Just read the first two bytes and check whether they are 0x1f followed by 0x8b.
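
The check described above could be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper name `looks_like_gzip` is made up for this example, and it compares the two bytes individually so the result does not depend on the byte order of the machine.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper (name invented for this sketch): returns true if the
// file at `path` starts with the two gzip magic bytes 0x1f 0x8b.
// A file that cannot be opened or is shorter than two bytes yields false.
bool looks_like_gzip(const char* path) {
    std::FILE* f = std::fopen(path, "rb");  // binary mode matters on Windows
    if (!f) return false;

    unsigned char magic[2] = {0, 0};
    std::size_t n = std::fread(magic, 1, sizeof magic, f);
    std::fclose(f);

    // Compare byte by byte instead of reading a 16-bit integer,
    // which would make the test endianness-dependent.
    return n == 2 && magic[0] == 0x1f && magic[1] == 0x8b;
}
```

A caller could then pick the zlib path only when `looks_like_gzip(path)` returns true, and plain `fread` otherwise. As the other answers note, a match on two bytes alone can still be a false positive.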

Answered by pmg

Do you prefer false positives, false negatives, or no false results at all (there goes performance down the drain...)?

The RFC 1952: GZIP file format specification version 4.3 states that the first 2 bytes of each member, and therefore of the file, are '\x1F' and '\x8B'. Use that for a first check that can result in false positives.

Answered by Jong Bor Lee

What is the difference in performance between reading compressed and uncompressed files using gzread()?

Anyway, in order to detect if a file is gzipped, you can read the magic number at the beginning of the file, which is 1f 8b according to the link.

Answered by 0xC0000022L

You can test for the signatures described in the RFCs 1951 and 1952 to get an idea. For GZIP files the second one is the relevant one, and it is definitive. There are some false positives on other formats, so you should check as much of the header as possible for plausible values.
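
One way to check more of the header for plausible values is sketched below, assuming the fixed 10-byte member header layout from RFC 1952 (ID1, ID2, CM, FLG, MTIME, XFL, OS). The function name `plausible_gzip_header` is hypothetical, and the set of checks is only a starting point rather than a complete validator.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (name invented for this sketch): checks the start of a
// buffer against the fixed part of a gzip member header as laid out in
// RFC 1952. It does not parse the optional fields or the deflate data.
bool plausible_gzip_header(const std::uint8_t* buf, std::size_t len) {
    if (len < 10) return false;                          // fixed header is 10 bytes
    if (buf[0] != 0x1f || buf[1] != 0x8b) return false;  // ID1, ID2 magic bytes
    if (buf[2] != 8) return false;                       // CM: 8 = deflate
    if (buf[3] & 0xE0) return false;                     // FLG bits 5..7 are reserved (zero)
    return true;
}
```

Each extra field tested lowers the chance of a false positive compared with the two magic bytes alone, at the cost of reading a few more bytes.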

For just zlib streams it's somewhat harder, because they are even more prone to false positives. But you would rarely encounter those in the wild on their own.
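
To illustrate why bare zlib streams are harder, here is a rough sketch of the only test their 2-byte header allows, assuming the CMF/FLG layout from RFC 1950 (the zlib format specification, which is not named in the answer). The function name `plausible_zlib_header` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (name invented for this sketch): checks the two header
// bytes of a zlib stream as laid out in RFC 1950.
//   cmf: low 4 bits = compression method (8 = deflate),
//        high 4 bits = CINFO, window size exponent (values above 7 not allowed)
//   flg: chosen so that (cmf * 256 + flg) is a multiple of 31
bool plausible_zlib_header(std::uint8_t cmf, std::uint8_t flg) {
    if ((cmf & 0x0F) != 8) return false;          // CM must be deflate
    if ((cmf >> 4) > 7) return false;             // CINFO out of range
    return ((cmf << 8) | flg) % 31 == 0;          // FCHECK test
}
```

The common header bytes 0x78 0x9C pass this test, but so does ordinary text that begins with the letter `x` followed by a space (0x78 0x20), which shows how easily such a short signature produces false positives.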
