Note: this page is a Chinese/English side-by-side rendering of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/612580/

Time: 2020-09-19 03:36:15  Source: igfitidea

How does Git solve the merging problem?

svn git version-control

Asked by Assaf Lavie

SVN made branching much easier by making branches really cheap, but merges remain a real problem in SVN - one that Git supposedly solves.

Does Git achieve this, and how?

(disclaimer: All I know about Git is based on the Linus lecture - total git noob here)

Answered by VonC

Git will not prevent conflicts in merges, but it can reconcile histories even when they do not share any common ancestor.
(It does so through the grafts file (.git/info/grafts), which is a list, one entry per line, of a commit followed by its parents, and which you can modify for that "reconciliation" purpose.)
So it is pretty powerful right there.

But to really get a glimpse of "how merges have been thought through", you can start by turning to Linus himself, and realize that this issue is not so much about the "algorithm":

Linus: Me personally, I want to have something that is very repeatable and non-clever. Something I understand or tells me that it can't do it.
And quite frankly, merging single-file history without taking all the other files' history into account makes me go "ugh".

The important part of a merge is not how it handles conflicts (which need to be verified by a human anyway if they are at all interesting), but that it should meld the history together right so that you have a new solid base for future merges.

In other words, the important part is the trivial part: the naming of the parents, and keeping track of their relationship. Not the clashes.

And it looks like 99% of SCM people seem to think that the solution to that is to be more clever about content merges. Which misses the point entirely.

So Wincent Colaiuta adds (emphasis mine):

There is no need for fancy metadata, rename tracking and so forth.
The only thing you need to store is the state of the tree before and after each change.

What files were renamed? Which ones were copied? Which ones were deleted? What lines were added? Which ones were removed? Which lines had changes made inside them? Which slabs of text were copied from one file to another?
You shouldn't have to care about any of these questions and you certainly shouldn't have to keep special tracking data in order to help you answer them: all the changes to the tree (additions, deletes, renames, edits etc) are implicitly encoded in the delta between the two states of the tree; you just track what is the content.

Absolutely everything can (and should) be inferred.

Git breaks the mould because it thinks about content, not files.
It doesn't track renames, it tracks content. And it does so at a whole-tree level.
This is a radical departure from most version control systems.
It doesn't bother trying to store per-file histories; it instead stores the history at the tree level.
When you perform a diff you are comparing two trees, not two files.

The other fundamentally smart design decision is how Git does merges.
The merging algorithms are smart but they don't try to be too smart. Unambiguous decisions are made automatically, but when there's doubt it's up to the user to decide.
This is the way it should be. You don't want a machine making those decisions for you. You never will want it.
That's the fundamental insight in the Git approach to merging: while every other version control system is trying to get smarter, Git is happily self-described as the "stupid content manager", and it's better for it.
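
To make the "everything can be inferred" point from the quote concrete, here is a rough Python sketch. It is not how Git is implemented: the snapshot() and infer_changes() helpers are hypothetical names, a "tree" is just a dict of path to content hash, and only exact (identical-content) renames are recognized. It only illustrates that two tree states are enough to work out what happened, with no per-file tracking data.

```python
import hashlib

def snapshot(files):
    """Map each path to a hash of its content (a toy stand-in for a tree)."""
    return {path: hashlib.sha1(text.encode()).hexdigest()
            for path, text in files.items()}

def infer_changes(before, after):
    """Infer renames, deletes, adds and edits from two snapshots alone."""
    removed = {p: h for p, h in before.items() if p not in after}
    added = {p: h for p, h in after.items() if p not in before}
    changes = []
    for old_path, old_hash in sorted(removed.items()):
        # A path that vanished while the same content appeared elsewhere
        # is treated as a rename (exact matches only in this sketch).
        new_path = next((p for p, h in sorted(added.items()) if h == old_hash), None)
        if new_path is not None:
            changes.append(("renamed", old_path, new_path))
            del added[new_path]
        else:
            changes.append(("deleted", old_path, None))
    changes += [("added", p, None) for p in sorted(added)]
    changes += [("edited", p, None) for p in sorted(before)
                if p in after and before[p] != after[p]]
    return changes

before = snapshot({"a.txt": "hello", "b.txt": "world", "c.txt": "x"})
after = snapshot({"a.txt": "hello!", "moved/b.txt": "world", "d.txt": "new"})
for change in infer_changes(before, after):
    print(change)
```

Nothing in the snapshots says "b.txt was renamed"; the rename falls out of comparing the two states, which is the point the quote is making.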

Answered by Jakub Narębski

It is now generally agreed that the 3-way merge algorithm (perhaps with enhancements such as rename detection and dealing with more complicated history), which takes into account the version on the current branch ('ours'), the version on the merged branch ('theirs'), and the version at the common ancestor of the merged branches ('ancestor'), is (from a practical point of view) the best way to resolve merges. In most cases, and for most of the contents, a tree-level merge (deciding which version of a file to take) is enough; there is rarely a need to deal with content conflicts, and then the diff3 algorithm is good enough.
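
The tree-level part of a 3-way merge can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not Git's code: merge_path() and merge_trees() are made-up names, each tree is a dict of path to content, and a path changed differently on both sides is simply reported as a conflict instead of being passed on to a diff3-style content merge.

```python
def merge_path(ancestor, ours, theirs):
    """Three-way decision for one path; None means the file is absent.

    Returns (result, conflict).
    """
    if ours == theirs:
        return ours, False    # both sides agree (or neither changed)
    if ours == ancestor:
        return theirs, False  # only 'theirs' changed, so take it
    if theirs == ancestor:
        return ours, False    # only 'ours' changed, so keep it
    return None, True         # both changed differently: needs a content merge

def merge_trees(ancestor, ours, theirs):
    """Merge whole trees path by path, collecting unresolved paths."""
    merged, conflicts = {}, []
    for path in sorted(set(ancestor) | set(ours) | set(theirs)):
        result, conflict = merge_path(
            ancestor.get(path), ours.get(path), theirs.get(path))
        if conflict:
            conflicts.append(path)
        elif result is not None:
            merged[path] = result
    return merged, conflicts

ancestor = {"a": "1", "b": "1", "c": "1", "d": "1"}
ours = {"a": "2", "b": "1", "c": "2", "d": "1"}
theirs = {"a": "1", "b": "3", "c": "3"}  # 'theirs' deleted d
print(merge_trees(ancestor, ours, theirs))
```

Only path "c", changed on both sides, is left for a content-level merge or a human; everything else is decided just by comparing against the ancestor, which is why knowing the right ancestor matters so much.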

To use a 3-way merge you need to know the common ancestor of the merged branches (the so-called merge base). For this you need to know the full history between those branches. What Subversion before the (current) version 1.5 was lacking (without third-party tools such as SVK or svnmerge) was merge tracking, i.e. remembering for a merge commit which parents (which commits) were used in the merge. Without this information it is not possible to correctly calculate the common ancestor in the presence of repeated merges.

Consider the following diagram:

---.---a---.---b---d---.---1
        \        /
         \-.---c/------.---2

(which will probably get mangled... it would be nice to have the ability to draw ASCII-art diagrams here).
When we were merging commits 'b' and 'c' (creating commit 'd'), the common ancestor was the branching point, commit 'a'. But when we want to merge commits '1' and '2', the common ancestor is now commit 'c'. Without stored merge information we would wrongly conclude that it is commit 'a'.
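
The diagram can be checked with a small Python sketch. The commit names x, y, z and w are made-up labels for the unnamed '.' commits, and merge_bases() is a naive illustration (common ancestors that are not ancestors of another common ancestor), not Git's actual implementation.

```python
def ancestors(parents, commit):
    """All commits reachable from `commit` via parent links, itself included."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def merge_bases(parents, left, right):
    """Common ancestors that are not proper ancestors of another common ancestor."""
    common = ancestors(parents, left) & ancestors(parents, right)
    return {c for c in common
            if not any(c != other and c in ancestors(parents, other)
                       for other in common)}

# The history from the diagram: child -> list of parents.
history = {
    "a": [], "x": ["a"], "b": ["x"], "z": ["a"], "c": ["z"],
    "d": ["b", "c"],  # merge commit: both parents are recorded
    "y": ["d"], "1": ["y"], "w": ["c"], "2": ["w"],
}
# The same history if the merge had not recorded 'c' as a parent of 'd'.
no_merge_info = dict(history, d=["b"])

print(merge_bases(history, "b", "c"))        # {'a'}
print(merge_bases(history, "1", "2"))        # {'c'}
print(merge_bases(no_merge_info, "1", "2"))  # {'a'}, the wrong answer
```

The only difference between the last two calls is whether commit 'd' remembers its second parent, which is exactly the merge tracking described above.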

Subversion (prior to version 1.5), and CVS before it, made merging hard because you had to calculate the common ancestor yourself, and supply the information about the ancestor manually when doing a merge.

Git stores information about all parents of a commit (more than one parent in the case of a merge commit) in the commit object. This way you can say that Git stores a DAG (directed acyclic graph) of revisions, storing and remembering the relationships between commits.

(I am not sure how Subversion deals with the issues mentioned below)

Additionally, merging in Git can deal with two further complications: file renames (when one side renamed a file and the other didn't; we want to keep the rename, and we want the changes applied to the correct file) and criss-cross merges (more complicated history, when there is more than one common ancestor).

  • File renames during a merge are handled by heuristic rename detection based on a similarity score (both the similarity of file contents and the similarity of pathnames are taken into account). Git detects which files correspond to each other in the merged branches (and ancestor(s)). In practice it works quite well for real-world cases.
  • Criss-cross merges (see the definition on the revctrl.org wiki), and the presence of multiple merge bases in general, are handled by the recursive merge strategy, which generates a single virtual common ancestor.
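
To give a feel for similarity-based rename detection, here is a hedged Python sketch. It is not Git's algorithm: detect_renames() is a hypothetical helper, the score comes from Python's difflib rather than Git's own similarity index, and the 0.9/0.1 weighting and the 0.5 threshold are arbitrary assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(old_path, old_text, new_path, new_text):
    """Score in [0, 1]: mostly content similarity, plus a little for the pathname."""
    content = SequenceMatcher(None, old_text, new_text).ratio()
    name = SequenceMatcher(None, old_path, new_path).ratio()
    return 0.9 * content + 0.1 * name  # weights are an assumption

def detect_renames(deleted, added, threshold=0.5):
    """Pair each deleted path with its most similar added path, if similar enough.

    `deleted` and `added` map path -> file content.
    """
    renames, unused = {}, dict(added)
    for old_path, old_text in sorted(deleted.items()):
        best_path, best_score = None, threshold
        for new_path, new_text in sorted(unused.items()):
            score = similarity(old_path, old_text, new_path, new_text)
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
            del unused[best_path]  # each added file matches at most once
    return renames

deleted = {
    "src/util.py": "def add(a, b):\n    return a + b\n",
    "old.txt": "zzzz",
}
added = {
    "lib/util.py": "def add(a, b):\n    # sum two numbers\n    return a + b\n",
    "README": "Project docs\n",
}
print(detect_renames(deleted, added))
```

Here src/util.py is matched to lib/util.py even though the file was both moved and edited, while old.txt stays a plain deletion because nothing added resembles it.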

Answered by jdwyah

The answers above are all correct, but for me they miss the central point of git's easy merges. An SVN merge requires you to keep track of and remember what's been merged, and that's a huge PITA. From their docs:

svn merge -r 23:30 file:///tmp/repos/trunk/vendors

Now that's not killer, but if you forget whether it's 23-30 inclusive or 23-30 exclusive, or whether you've already merged some of those commits, you're hosed and you've got to go figure out the answers to avoid repeating or missing commits. God help you if you branch a branch.

With git it's just git merge and all this happens seamlessly, even if you've cherry-picked a couple commits or done any number of fantastical git-land things.

Answered by hillu

As far as I know, the merging algorithms are not any smarter than those in other version control systems. However, because of git's distributed nature, there is no need for centralized merging efforts. Every developer can rebase or merge small changes from other developers into his tree at any time, thus the conflicts that arise tend to be smaller.

Answered by RibaldEddie

Git just makes it more difficult to screw up everyone else's repository with a bad merge.

The only real benefit is that Git is much, much faster at merging because everything is done locally and it's written in C.

SVN, properly used, is perfectly usable.
