Binary patch generation in C#

Question

Asked by Lasse V. Karlsen

Does anyone have, or know of, a binary patch generation algorithm implementation in C#?

Basically, compare two files (designated old and new), and produce a patch file that can be used to upgrade the old file to have the same contents as the new file.

The implementation would have to be relatively fast, and work with huge files. It should exhibit O(n) or O(log n) runtimes.

My own algorithms tend to either be lousy (fast but produce huge patches) or slow (produce small patches but have O(n^2) runtime).

Any advice, or pointers for implementation would be nice.

Specifically, the implementation will be used to keep servers in sync for various large datafiles that we have one master server for. When the master server datafiles change, we need to update several off-site servers as well.

The most naive algorithm I have made, which only works for files that can be kept in memory, is as follows:

1. Grab the first four bytes from the old file, call this the key
2. Add those bytes to a dictionary, where key -> position, where position is the position where I grabbed those 4 bytes, 0 to begin with
3. Skip the first of these four bytes, grab another 4 (3 overlapping, 1 new), and add to the dictionary the same way
4. Repeat steps 1-3 for all 4-byte blocks in the old file
5. From the start of the new file, grab 4 bytes, and attempt to look it up in the dictionary
6. If found, find the longest match if there are several, by comparing bytes from the two files
7. Encode a reference to that location in the old file, and skip the matched block in the new file
8. If not found, encode 1 byte from the new file, and skip it
9. Repeat steps 5-8 for the rest of the new file
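
The steps above could be sketched roughly as follows. This is only an illustration, not the poster's actual source: the names `PatchOp` and `NaiveDiff` are made up here, both files are assumed to fit in memory as byte arrays, and the patch is returned as a list of operations rather than an encoded byte stream.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical patch instruction: either copy Length bytes from the old file
// starting at OldOffset, or emit one literal byte taken from the new file.
public sealed class PatchOp
{
    public bool IsCopy;
    public int OldOffset;
    public int Length;
    public byte Literal;
}

public static class NaiveDiff
{
    const int KeySize = 4;

    public static List<PatchOp> Create(byte[] oldData, byte[] newData)
    {
        // Steps 1-4: index every overlapping 4-byte block of the old file.
        var index = new Dictionary<uint, List<int>>();
        for (int i = 0; i + KeySize <= oldData.Length; i++)
        {
            uint key = BitConverter.ToUInt32(oldData, i);
            if (!index.TryGetValue(key, out var positions))
                index[key] = positions = new List<int>();
            positions.Add(i);
        }

        // Steps 5-9: walk the new file, taking the longest match when one exists.
        var ops = new List<PatchOp>();
        int pos = 0;
        while (pos < newData.Length)
        {
            int bestOffset = -1, bestLength = 0;
            if (pos + KeySize <= newData.Length &&
                index.TryGetValue(BitConverter.ToUInt32(newData, pos), out var candidates))
            {
                foreach (int start in candidates)
                {
                    // Extend the match byte by byte across both files.
                    int length = 0;
                    while (start + length < oldData.Length &&
                           pos + length < newData.Length &&
                           oldData[start + length] == newData[pos + length])
                        length++;
                    if (length > bestLength) { bestLength = length; bestOffset = start; }
                }
            }

            if (bestLength >= KeySize)
            {
                ops.Add(new PatchOp { IsCopy = true, OldOffset = bestOffset, Length = bestLength });
                pos += bestLength;
            }
            else
            {
                ops.Add(new PatchOp { IsCopy = false, Length = 1, Literal = newData[pos] });
                pos++;
            }
        }
        return ops;
    }
}
```

Note that in this sketch every candidate position for a key is compared, so highly repetitive input (long runs of zeros, for example) degrades towards quadratic time, and the dictionary holds one entry per byte of the old file, which matches the memory concern described here.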

This is somewhat like compression, without windowing, so it will use a lot of memory. It is, however, fairly fast, and produces quite small patches, as long as I keep the emitted codes as compact as possible.

A more memory-efficient algorithm uses windowing, but produces much bigger patch files.

There are more nuances to the above algorithm that I skipped in this post, but I can post more details if necessary. I do, however, feel that I need a different algorithm altogether, so improving on the above algorithm is probably not going to get me far enough.

Edit #1: Here is a more detailed description of the above algorithm.

First, combine the two files, so that you have one big file. Remember the cut-point between the two files.

Secondly, do the "grab 4 bytes and add their position to the dictionary" step for everything in the whole file.

Thirdly, from where the new file starts, do the loop that attempts to locate an existing combination of 4 bytes and find the longest match. Make sure we only consider positions from the old file, or from earlier in the new file than we're currently at. This ensures that we can reuse material in both the old and the new file during patch application.
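
To make the "reuse material from both files" point concrete, here is a minimal sketch of what patch application might look like under that scheme. The `Op` layout and the `CombinedPatch` name are assumptions for illustration, not the poster's format: a copy offset addresses the virtual concatenation of the old file followed by the new file, so offsets at or past the cut-point refer to output that has already been rebuilt.

```csharp
using System;
using System.Collections.Generic;

public static class CombinedPatch
{
    // Hypothetical op: Length > 0 copies Length bytes starting at Offset in the
    // virtual "old + new" file; Length == 0 emits the single Literal byte.
    public struct Op
    {
        public int Offset;
        public int Length;
        public byte Literal;
    }

    public static byte[] Apply(byte[] oldData, IEnumerable<Op> ops)
    {
        var output = new List<byte>();
        foreach (var op in ops)
        {
            if (op.Length == 0)
            {
                output.Add(op.Literal);
                continue;
            }

            // Copy one byte at a time so a copy may overlap the bytes it is
            // itself producing (as in LZ77-style decoders).
            for (int i = 0; i < op.Length; i++)
            {
                int source = op.Offset + i;
                if (source < oldData.Length)
                    output.Add(oldData[source]);                  // before the cut-point: old file
                else
                    output.Add(output[source - oldData.Length]);  // after it: new file rebuilt so far
            }
        }
        return output.ToArray();
    }
}
```

This only works if the generator never emits an offset that points at or beyond the current position in the new file, which is exactly the restriction described above.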

Edit #2: Source code to the above algorithm

You might get a warning about the certificate having some problems. I don't know how to resolve that so for the time being just accept the certificate.

The source uses lots of other types from the rest of my library so that file isn't all it takes, but that's the algorithm implementation.

@lomaxx, I have tried to find good documentation for the algorithm used in Subversion, called xdelta, but unless you already know how the algorithm works, the documents I've found fail to tell me what I need to know.

Or perhaps I'm just dense... :)

I took a quick peek at the algorithm from that site you gave, and it is unfortunately not usable. A comment from the binary diff file says:

Finding an optimal set of differences requires quadratic time relative to the input size, so it becomes unusable very quickly.

I don't need an optimal result, though, so I'm looking for a more practical solution.

Thanks for the answer, though; I've bookmarked his utilities in case I ever need them.

Edit #1: Note, I will look at his code to see if I can find some ideas, and I'll also send him an email later with questions, but I've read that book he references and though the solution is good for finding optimal solutions, it is impractical in use due to the time requirements.

Edit #2: I'll definitely hunt down the Python xdelta implementation.

Answer 1

Accepted answer by lomaxx

Sorry I couldn't be more help. I would definitely keep looking at xdelta, because I have used it a number of times to produce quality diffs on 600MB+ ISO files we have generated for distributing our products, and it performs very well.

Answer 2

Answer by lomaxx

It might be worth checking out what some of the other guys are doing in this space and not necessarily in the C# arena either.

This is a library written in C#.

SVN also has a binary diff algorithm, and I know there's an implementation in Python, although I couldn't find it with a quick search. They might give you some ideas on where to improve your own algorithm.

Answer 3

Answer by TimM

If this is for installation or distribution, have you considered using the Windows Installer SDK? It has the ability to patch binary files.

http://msdn.microsoft.com/en-us/library/aa370578(VS.85).aspx

Answer 4

Answer by Larry Smithmier

Have you seen VCDiff? It is part of a Misc library that appears to be fairly active (last release r259, April 23rd 2008). I haven't used it, but thought it was worth mentioning.

Answer 5

Answer by jtalarico

This is a rough guideline, but the following is for the rsync algorithm which can be used to create your binary patches.

http://rsync.samba.org/tech_report/tech_report.html
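
For orientation, the weak rolling checksum described in that report could be sketched in C# along these lines. The class name `RollingChecksum` and its shape are assumptions made for this example; only the arithmetic (two 16-bit sums combined into a 32-bit value, updated in constant time as the window slides by one byte) follows the report.

```csharp
using System;

public sealed class RollingChecksum
{
    readonly int _window;
    uint _a, _b; // both kept modulo 2^16

    // Computes the checksum of data[start .. start + window - 1] from scratch.
    public RollingChecksum(byte[] data, int start, int window)
    {
        _window = window;
        unchecked
        {
            for (int i = 0; i < window; i++)
            {
                _a = (_a + data[start + i]) & 0xFFFF;
                // The first byte in the window gets the largest weight.
                _b = (_b + (uint)(window - i) * data[start + i]) & 0xFFFF;
            }
        }
    }

    // s = a + 2^16 * b
    public uint Value => _a | (_b << 16);

    // Slides the window one byte to the right in O(1).
    public void Roll(byte outgoing, byte incoming)
    {
        unchecked
        {
            // uint wrap-around is harmless because 2^16 divides 2^32.
            _a = (_a - outgoing + incoming) & 0xFFFF;
            _b = (_b - (uint)_window * outgoing + _a) & 0xFFFF;
        }
    }
}
```

In rsync the receiving side checksums fixed-size blocks of its old file, and the sender rolls this checksum across its new file one byte at a time, confirming any weak match with a stronger hash before emitting a block reference.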

Answer 6

Answer by Bradley Grainger

bsdiff was designed to create very small patches for binary files. As stated on its page, it requires max(17*n, 9*n+m) + O(1) bytes of memory and runs in O((n+m) log n) time (where n is the size of the old file and m is the size of the new file).
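
Since the question mentions huge files, it may be worth plugging real sizes into that memory bound before committing to bsdiff. A small helper (the name `BsdiffMemory` is made up for this example, and the O(1) term is ignored) could look like this:

```csharp
using System;

public static class BsdiffMemory
{
    // Evaluates max(17*n, 9*n + m) for an old file of n bytes and a new file
    // of m bytes, leaving out the constant O(1) term from the stated bound.
    public static long EstimateBytes(long oldSize, long newSize)
    {
        return Math.Max(17 * oldSize, 9 * oldSize + newSize);
    }
}
```

For two 600 MB files, for instance, the 17*n term dominates and gives 17 × 600 MB = 10,200 MB, so roughly 10 GB of memory.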

The original implementation is in C, but a C# port is described here and available here.
