Original question: http://stackoverflow.com/questions/1809007/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Best way to detect if a stream is zipped in Java
Asked by Fedearne
What is the best way to find out if a java.io.InputStream contains zipped data?
Accepted answer by McDowell
The magic bytes for the ZIP format are 50 4B. You could test the stream (using mark and reset; you may need to buffer), but I wouldn't expect this to be a 100% reliable approach. There would be no way to distinguish it from a US-ASCII encoded text file that began with the letters PK.
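A minimal sketch of the mark/reset test described above, assuming the caller can wrap the stream in a BufferedInputStream. The class and method names (ZipMagicPeek, startsWithPk) are made up for illustration. It also shows the false positive mentioned above: plain text that starts with "PK" passes the check.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ZipMagicPeek {

    /**
     * Returns true if the next two bytes are 50 4B ("PK").
     * The stream must support mark/reset; it is rewound before returning.
     */
    static boolean startsWithPk(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("wrap the stream in a BufferedInputStream first");
        }
        in.mark(2);
        try {
            int first = in.read();
            int second = in.read();
            return first == 0x50 && second == 0x4B;
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        // Plain US-ASCII text that merely begins with "PK": a false positive.
        byte[] text = "PK is not always a ZIP file".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(text));
        System.out.println(startsWithPk(in)); // true, although this is not a ZIP archive
        System.out.println((char) in.read()); // P, because the stream was rewound
    }
}
```

Because the stream is rewound, it can still be handed to whatever consumer comes next, whether or not the check succeeded.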
The best way would be to provide metadata on the content format before opening the stream and then treat it appropriately.
Answer by miku
Not very elegant, but reliable:
If the stream can be read via ZipInputStream, it should be zipped.
Answer by Dave Webb
You could check that the first four bytes of the stream are the local file header signature that starts the local file header that precedes every file in a ZIP file, shown in the spec to be 50 4B 03 04.
A little test code shows this to work:
// Write a one-entry ZIP file, then read back its first four bytes.
byte[] buffer = new byte[4];
try {
    ZipOutputStream zos = new ZipOutputStream(new FileOutputStream("so.zip"));
    ZipEntry ze = new ZipEntry("HelloWorld.txt");
    zos.putNextEntry(ze);
    zos.write("Hello world".getBytes());
    zos.close();
    FileInputStream is = new FileInputStream("so.zip");
    is.read(buffer);
    is.close();
}
catch (IOException e) {
    e.printStackTrace();
}
// %H prints each byte in hex without leading zeros, so 0x03 appears as "3".
for (byte b : buffer) {
    System.out.printf("%H ", b);
}
Gave me this output:
50 4B 3 4
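The output reads 3 and 4 rather than 03 and 04 only because %H drops leading zeros; the bytes themselves match the signature. The same comparison can be sketched as a reusable check. This version assumes a PushbackInputStream so the four bytes can be returned to the stream afterwards; the names LocalHeaderCheck and hasLocalFileHeader are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

final class LocalHeaderCheck {

    // Local file header signature: 50 4B 03 04.
    private static final byte[] SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    /** Reads up to four bytes, pushes them back, and reports whether they match the signature. */
    static boolean hasLocalFileHeader(PushbackInputStream in) throws IOException {
        byte[] head = new byte[SIGNATURE.length];
        int total = 0;
        // read() may return fewer bytes than requested, so loop until full or end of stream.
        while (total < head.length) {
            int n = in.read(head, total, head.length - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        in.unread(head, 0, total);
        return total == SIGNATURE.length && Arrays.equals(head, SIGNATURE);
    }

    public static void main(String[] args) throws IOException {
        InputStream raw = new ByteArrayInputStream(new byte[] {0x50, 0x4B, 0x03, 0x04, 0x14});
        // The pushback buffer must be at least as large as the signature.
        PushbackInputStream in = new PushbackInputStream(raw, SIGNATURE.length);
        System.out.println(hasLocalFileHeader(in)); // true
    }
}
```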
Answer by Innokenty
Introduction
Since all the other answers are five years old, I feel obliged to write down how things stand today. I seriously doubt one should read the magic bytes of the stream! That is low-level code, and it should generally be avoided.
Simple answer
miku writes:
If the Stream can be read via ZipInputStream, it should be zipped.
Yes, but in the case of ZipInputStream, "can be read" means that the first call to .getNextEntry() returns a non-null value. No exception catching et cetera. So instead of parsing magic bytes you can just do:
boolean isZipped = new ZipInputStream(yourInputStream).getNextEntry() != null;
And that's it!
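To see how the one-liner behaves, here is a small self-contained sketch that runs it against in-memory data. The class ZipStreamProbe and the helper zipOf are hypothetical names used only to build test input. Two caveats worth knowing: the check consumes bytes from the stream, and an archive with no entries is reported as not zipped.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class ZipStreamProbe {

    /** True if the first getNextEntry() call finds an entry. Consumes bytes from the stream. */
    static boolean isZipped(InputStream in) throws IOException {
        return new ZipInputStream(in).getNextEntry() != null;
    }

    /** Builds a small in-memory archive with the given entry names (test data only). */
    static byte[] zipOf(String... entryNames) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(bytes)) {
            for (String name : entryNames) {
                zos.putNextEntry(new ZipEntry(name));
                zos.write("Hello world".getBytes(StandardCharsets.UTF_8));
                zos.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "PK: just a long enough line of plain text, not an archive"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(isZipped(new ByteArrayInputStream(zipOf("HelloWorld.txt")))); // true
        System.out.println(isZipped(new ByteArrayInputStream(text)));                    // false
        // An archive without entries has no local file header, so it also reports false.
        System.out.println(isZipped(new ByteArrayInputStream(zipOf())));                 // false
    }
}
```

If the data must still be readable afterwards, the caller would need to buffer the stream or reopen the source, since the probe does not rewind it.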
General unzipping thoughts
In general, it turns out to be much more convenient to work with files than with streams while [un]zipping. There are several useful libraries, and ZipFile has more functionality than ZipInputStream. Handling of zip files is discussed here: What is a good Java library to zip/unzip files? So if you can work with files, you had better do so!
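When the data is available as a file, one possible check is simply to let java.util.zip.ZipFile try to open it. This is only a sketch under that assumption; the names ZipFileProbe and isZipFile are hypothetical and not part of the original answer.

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

final class ZipFileProbe {

    /**
     * True if ZipFile can open the file as an archive.
     * Other I/O problems (for example a missing file) still propagate as IOException.
     */
    static boolean isZipFile(File file) throws IOException {
        try (ZipFile zip = new ZipFile(file)) {
            return true;
        } catch (ZipException e) {
            // Thrown when the content is not a readable ZIP archive.
            return false;
        }
    }
}
```

Unlike the stream-based check, this reads the archive's directory at the end of the file rather than the first header, which is one reason it needs a file and not a stream.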
Code sample
In my application I needed to work with streams only, so this is the method I wrote for unzipping:
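import org.apache.commons.io.IOUtils;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Extracts every entry into outputFolder; returns false if the stream held no ZIP entries.
public boolean unzip(InputStream inputStream, File outputFolder) throws IOException {
    boolean isEmpty = true;
    try (ZipInputStream zis = new ZipInputStream(inputStream)) {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) {
            isEmpty = false;
            File newFile = new File(outputFolder, entry.getName());
            // Reject entries such as "../x" that would resolve outside outputFolder.
            if (!newFile.getCanonicalPath().startsWith(outputFolder.getCanonicalPath() + File.separator)) {
                throw new IOException("Entry is outside the target folder: " + entry.getName());
            }
            if (entry.isDirectory()) {
                newFile.mkdirs();
                continue;
            }
            // mkdirs() returns false when the folder already exists, so its result
            // must not decide whether the file gets written.
            newFile.getParentFile().mkdirs();
            try (FileOutputStream fos = new FileOutputStream(newFile)) {
                IOUtils.copy(zis, fos);
            }
        }
    }
    return !isEmpty;
}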
Answer by kk nair
Checking the magic number may not be the right option.
Docx files also have the same magic number, 50 4B 03 04, because a .docx file is itself a ZIP archive.
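If such Office documents need to be told apart from ordinary archives using only the standard library, one rough heuristic is to look for the [Content_Types].xml entry that Office Open XML packages (.docx, .xlsx) normally contain. This is an assumption-laden sketch, not a full type detector; the names OfficePackageProbe and looksLikeOfficePackage are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class OfficePackageProbe {

    // Entry name assumed to mark an Office Open XML package.
    private static final String CONTENT_TYPES = "[Content_Types].xml";

    /** True if any entry is named [Content_Types].xml. Reads and closes the stream. */
    static boolean looksLikeOfficePackage(InputStream in) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (CONTENT_TYPES.equals(entry.getName())) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A plain archive that happens to contain an entry with that name would be misclassified, so this only narrows things down.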
Answer by Stone
Since .zip and .xlsx files have the same magic number, I couldn't tell a genuine zip file from an .xlsx file that had been renamed.
So I used Apache Tika to find the exact document type.
Even if a file has been renamed to .zip, Tika finds its actual type.
Reference: https://www.baeldung.com/apache-tika