Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/15441315/

Time: 2020-10-31 19:36:11  Source: igfitidea

Java and Hash algorithm to compare files

Tags: java, hash, md5, sha

Asked by Stig

I have to fingerprint files to find duplicates. What is recommended with Java in 2013? Should I also compare the file size, or is this an unnecessary check?

The probability of a false positive should be very close to 0.

EDIT: Lots of answers, thanks. What is the standard of backup software today? SHA-256? higher? I guess md5 is not suitable?

Answered by Louis Wasserman

If the probability of false positives has to be zero, as opposed to "lower than the probability you will be struck by lightning," then no hash algorithm at all can be used; you must compare the files byte by byte.

For what it's worth, if you can use third-party libraries, you can use Guava to compare two files byte-by-byte with the one-liner

Files.asByteSource(file1).contentEquals(Files.asByteSource(file2));

which takes care of opening and closing the files as well as the details of comparison.
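
If a third-party library is not an option, the same byte-by-byte check can be sketched with the JDK alone. This is only an illustration, not Guava's implementation: the class name `FileCompare` and the method `sameContent` are hypothetical, and the early size comparison is an added shortcut.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Guava or of the original answer.
final class FileCompare {
    private FileCompare() {}

    /** Returns true only if both files contain exactly the same bytes. */
    static boolean sameContent(Path a, Path b) throws IOException {
        // Files of different length can never be equal, so skip reading them.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false; // first differing byte found
                }
            }
            // a is exhausted; the files match only if b is exhausted too.
            return inB.read() == -1;
        }
    }
}
```

Calling `FileCompare.sameContent(Paths.get("file1.txt"), Paths.get("file2.txt"))` gives an exact answer with no false positives, at the cost of reading both files in full whenever their sizes match.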

If you're willing to accept false positives that are less likely than getting struck by lightning, then you could do

Files.hash(file, Hashing.sha1()); // or md5(), or sha256(), or...

which returns a HashCode, and then you can test that for equality with the hash of another file. (That version also deals with the messiness of MessageDigest, of opening and closing the file properly, etcetera.)
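
For readers without Guava, here is a rough JDK-only sketch of the work such a call hides, using SHA-256 since the question's edit asks about it. The class `FileHash` and its methods `sha256` and `sameHash` are hypothetical names chosen for illustration; only `MessageDigest` and `java.nio.file.Files` are standard JDK APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; a sketch rather than a drop-in replacement for Guava.
final class FileHash {
    private FileHash() {}

    /** Streams the file through SHA-256 and returns the 32-byte digest. */
    static byte[] sha256(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n); // feed only the bytes actually read
            }
        }
        return md.digest();
    }

    /** True if both files have the same SHA-256 digest. */
    static boolean sameHash(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha256(a), sha256(b));
    }
}
```

A digest match does not prove equality; it only makes a difference extremely unlikely, which is the trade-off described above.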

Answered by Barney

Are you asking how to get the md5 checksums of files in Java? If that's the case then read the accepted answers here and here. Basically, do this:

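import java.io.FileInputStream;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;

// Inside a method that declares: throws IOException, NoSuchAlgorithmException
MessageDigest md_1 = MessageDigest.getInstance("MD5");
MessageDigest md_2 = MessageDigest.getInstance("MD5");
byte[] buffer = new byte[8192];
// try-with-resources closes both streams even if reading fails
try (InputStream is_1 = new DigestInputStream(new FileInputStream("file1.txt"), md_1);
     InputStream is_2 = new DigestInputStream(new FileInputStream("file2.txt"), md_2)) {
  // A DigestInputStream only updates its digest as bytes are read,
  // so both streams must be read to the end before calling digest()
  while (is_1.read(buffer) != -1) { }
  while (is_2.read(buffer) != -1) { }
}
byte[] digest_1 = md_1.digest();
byte[] digest_2 = md_2.digest();

// compare digest_1 and digest_2
boolean sameDigest = MessageDigest.isEqual(digest_1, digest_2);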

Should I also compare the file size, or is this an unnecessary check?

It is unnecessary.
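
The size check is not needed for correctness, but when many files are scanned for duplicates it can still act as a cheap pre-filter, because a file with a unique size need not be hashed at all. The sketch below illustrates that idea under stated assumptions: `DuplicateFinder`, `findDuplicates` and `sha256Base64` are hypothetical names, and SHA-256 is an arbitrary choice of digest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups files by size first, then by SHA-256 digest.
final class DuplicateFinder {
    private DuplicateFinder() {}

    /** Returns groups of two or more files that share both size and digest. */
    static List<List<Path>> findDuplicates(List<Path> files)
            throws IOException, NoSuchAlgorithmException {
        Map<Long, List<Path>> bySize = new HashMap<>();
        for (Path f : files) {
            bySize.computeIfAbsent(Files.size(f), k -> new ArrayList<>()).add(f);
        }
        List<List<Path>> duplicates = new ArrayList<>();
        for (List<Path> sameSize : bySize.values()) {
            if (sameSize.size() < 2) {
                continue; // unique size: cannot have a duplicate, skip hashing
            }
            Map<String, List<Path>> byHash = new HashMap<>();
            for (Path f : sameSize) {
                byHash.computeIfAbsent(sha256Base64(f), k -> new ArrayList<>()).add(f);
            }
            for (List<Path> group : byHash.values()) {
                if (group.size() > 1) {
                    duplicates.add(group);
                }
            }
        }
        return duplicates;
    }

    // Digest encoded as Base64 so it can serve as a map key.
    private static String sha256Base64(Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
        }
        return Base64.getEncoder().encodeToString(md.digest());
    }
}
```

Files reported in the same group could then be confirmed with a byte-by-byte comparison if the probability of a false positive must be exactly zero.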