Recovering corrupted zip or gzip files?
Date: 2020-03-05 18:52:27 Source: igfitidea
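The most common way to corrupt a compressed file is to inadvertently do an ASCII-mode FTP transfer, which causes a many-to-one trashing of CR and/or LF characters.
Obviously, information is lost, and the best way to fix the problem is to transfer the file again in FTP binary mode.
However, if the original is lost, and it is important, how recoverable is the data?
[Actually, I already know what I think is the best answer (it is very difficult but sometimes possible; I'll post more later), and the common non-answers (lots of off-the-shelf programs that repair CRCs without repairing the data), but I thought it would be interesting to try this question out during the stackoverflow beta period and see whether anyone else has gone down the successful-recovery path or discovered tools I don't know about.]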
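The many-to-one damage described in the question can be illustrated with a small Python sketch. The function name `ascii_mode_damage` is hypothetical, and the sketch assumes only one possible direction of damage (every CRLF pair collapsed to a bare LF); a real transfer may have translated line endings differently.

```python
def ascii_mode_damage(data: bytes) -> bytes:
    """Hypothetical model of an ASCII-mode transfer: collapse each CRLF to LF."""
    return data.replace(b"\r\n", b"\n")


# Two different byte sequences (assumed for illustration) ...
original_a = b"\x1f\x8b\r\n\x00"
original_b = b"\x1f\x8b\n\x00"

# ... become identical after the transfer, so the damaged file alone
# cannot tell us which one was sent.
damaged_a = ascii_mode_damage(original_a)
damaged_b = ascii_mode_damage(original_b)
print(damaged_a == damaged_b)
```

Because both inputs map to the same output, any recovery has to bring in outside information, such as a checksum or knowledge of what the decompressed data should look like.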
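Solutions
Answer
You could try writing a small script that replaces CRs with CRLFs (assuming the direction of the trashing was CRLF to CR), swapping them randomly per block until you get the correct CRC. If the data is not particularly large, I suppose this might finish before the heat death of the universe without using all of your CPU.
There is definite information loss, so I don't know of a better way. Loss in the CR to CRLF direction might be slightly easier to roll back.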
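A minimal sketch of the brute-force idea in this answer, under stated assumptions: the file is a single gzip member, the damage turned every CRLF into a bare CR, and only a handful of bytes were affected. The function name `try_restore` and the `max_fixes` parameter are hypothetical. Instead of swapping at random, the sketch enumerates subsets of suspect CR positions, smallest first, and relies on `gzip.decompress` to reject candidates whose stream or CRC-32 is invalid.

```python
import gzip
import itertools
import zlib


def try_restore(damaged: bytes, max_fixes: int = 4):
    """Return the payload of the first candidate that gzip accepts, else None.

    Assumes (hypothetically) that each lost byte is an LF that originally
    followed a CR, and that at most `max_fixes` such bytes were lost.
    """
    # Every CR in the damaged file is a suspect: it may have been CRLF.
    suspects = [i for i, byte in enumerate(damaged) if byte == 0x0D]

    for n_fixes in range(min(max_fixes, len(suspects)) + 1):
        for chosen in itertools.combinations(suspects, n_fixes):
            # Rebuild the file with an LF re-inserted after each chosen CR.
            rebuilt = bytearray()
            prev = 0
            for pos in chosen:
                rebuilt += damaged[prev:pos + 1] + b"\n"
                prev = pos + 1
            rebuilt += damaged[prev:]
            try:
                # gzip.decompress checks the trailer's CRC-32 and length.
                return gzip.decompress(bytes(rebuilt))
            except (OSError, EOFError, zlib.error):
                continue  # wrong guess; try the next subset
    return None
```

The number of subsets grows combinatorially with the number of suspect bytes, which is the cost the answer jokes about; the search is only practical when few positions are in doubt. A match on a 32-bit CRC is also strong evidence rather than proof that the reconstruction is correct.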
Answer
From Bukys Software:
Approximately 1 in 256 bytes is known to be corrupted, and the corruption is known to occur only in bytes with the value '2'. So the byte error rate is 1/256 (0.39% of input), and 2/256 bytes (0.78% of input) are suspect. But since only three bits per smashed byte are affected, the bit error rate is only 3/(256*8): 0.15% is bad, 0.29% is suspect. ...

An error in the compressed input disrupts the decompression process for all subsequent bytes... The fact that the decompressed output is recognizably bad so quickly is cause for hope -- a search for the correct answer can identify wrong answers quickly.

Ultimately, several techniques were combined to successfully extract reasonable data from these files:
- Domain-specific parsing of fields and quoted strings
- Machine learning from previous data with low probability of damage
- Tolerance for file damage due to other causes (e.g. disk full while logging)
- Lookahead for guiding the search along the highest-probability paths

These techniques identify 75% of the necessary repairs with certainty, and the remainder are explored highest-probability-first, so that plausible reconstructions are identified immediately.
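The error-rate arithmetic in the quote can be checked with a few lines of Python. This sketch assumes the two byte values being confused are CR (0x0D) and LF (0x0A), as in the question; those two values differ in exactly three bits, which is consistent with the "three bits per smashed byte" figure. The variable names are illustrative only.

```python
CR, LF = 0x0D, 0x0A  # assumed: the two byte values that get confused

# Number of bit positions in which CR and LF differ.
flipped_bits = bin(CR ^ LF).count("1")

# One of 256 byte values is damaged; two values (CR and LF) are suspect.
byte_error_rate = 1 / 256
suspect_byte_rate = 2 / 256

# Only `flipped_bits` of the 8 bits in a damaged byte are actually wrong.
bit_error_rate = flipped_bits / (256 * 8)
suspect_bit_rate = 2 * bit_error_rate

for label, rate in [
    ("bytes bad", byte_error_rate),
    ("bytes suspect", suspect_byte_rate),
    ("bits bad", bit_error_rate),
    ("bits suspect", suspect_bit_rate),
]:
    print(f"{label}: {rate:.2%}")
```

These rates assume byte values are uniformly distributed, which is a reasonable approximation for compressed data.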