How to update a git shallow clone?

Notice: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/41075972/

git git-clone

Asked by Hibou57

Background

(for tl;dr, see #questions below)

I have multiple shallow clones of git repositories. I'm using shallow clones because they are a lot smaller than deep clones. Each is created with something like git clone --single-branch --depth 1 <git-repo-url> <dir-name>.

This works fine, except I don't see how to update it.

When I'm cloning by a tag, updating is not meaningful, as a tag is a frozen point in time (as I understand it). In this case, if I want to update, this means I want to clone by another tag, so I just rm -rf <dir-name> and clone again.

Things get more complicated when I've cloned the HEAD of a master branch then later want to update it.

I tried git pull --depth 1, but although I'm not going to push anything to the remote repository, it complains that it doesn't know who I am.

I tried git fetch --depth 1, but although it seems to update something, I checked and the clone is not up to date (some files on the remote repository have different content than the ones in my clone).

Following https://stackoverflow.com/a/20508591/279335, I tried git fetch --depth 1; git reset --hard origin/master, but there are two things: first, I don't understand why git reset is needed; second, although the files seem to be up to date, some old files remain, and git clean -df does not delete these files.

Questions

Take a clone created with git clone --single-branch --depth 1 <git-repo-url> <dir-name>. How can I update it to achieve the same result as rm -rf <dir-name>; git clone --single-branch --depth 1 <git-repo-url> <dir-name>? Or is rm -rf <dir-name> and cloning again the only way?

Note

This is not a duplicate of "How to update a shallow cloned submodule without increasing main repo size", as the answer there does not fulfil my expectations and I'm using simple repositories, not sub-modules (which I don't know about).

Answered by torek

[slightly reworded and formatted] Given a clone created with git clone --single-branch --depth 1 url directory, how can I update it to achieve the same result as rm -rf directory; git clone --single-branch --depth 1 url directory?

Note that --single-branch is the default when using --depth 1. The (single) branch is the one you give with -b. There's a long aside that goes here about using -b with tags, but I will leave that for later. If you don't use -b, your Git asks the "upstream" Git (the Git at url) which branch it has checked out, and pretends you used -b thatbranch. This means that it is important to be careful when using --single-branch without -b to make sure that this upstream repository's current branch is sensible, and of course, when you do use -b, to make sure that the branch argument you give really does name a branch, not a tag.

The simple answer is basically this one, with two slight changes:

Following https://stackoverflow.com/a/20508591/279335, I tried git fetch --depth 1; git reset --hard origin/master, but there are two things: first, I don't understand why git reset is needed; second, although the files seem to be up to date, some old files remain, and git clean -df does not delete these files.

The two slight changes are: make sure you use origin/branchname instead of origin/master, and add -x (git clean -d -f -x or git clean -dfx) to the git clean step. As for why, that gets a bit more complicated.
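
A minimal sketch of that recipe follows. Everything around the last three git commands is an assumption made so the script can run on its own: the throwaway upstream repository, the directory names, the demo identity, and the stray.tmp file are all hypothetical.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream with a single commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
echo one > "$work/upstream/file.txt"
git -C "$work/upstream" add file.txt
git -C "$work/upstream" commit -q -m first

# Shallow clone; the file:// URL makes Git honor --depth for a local path.
git clone -q --single-branch --depth 1 "file://$work/upstream" "$work/clone"

# Upstream gains a new commit after we cloned.
echo two > "$work/upstream/file.txt"
git -C "$work/upstream" commit -q -a -m second

# The update: fetch only the new tip, move our branch to it, drop stray files.
cd "$work/clone"
touch stray.tmp                    # stands in for a leftover file
git fetch -q --depth 1
git reset -q --hard origin/master  # use origin/<your branch> here
git clean -q -dfx
```

Note that git reset --hard and git clean -dfx discard local changes and ignored files, so this only suits clones that are never edited by hand.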

What's going on

Without --depth 1, the git fetch step calls up the other Git and gets from it a list of branch names and corresponding commit hash IDs. That is, it finds a list of all the upstream's branches and their current commits. Then, because you have a --single-branch repository, your Git throws out all but the single branch, and brings over everything Git needs to connect that current commit back to the commit(s) you already have in your repository.

With --depth 1, your Git doesn't bother connecting the new commit to older historical commits at all. Instead, it obtains just the one commit and the other Git objects needed to complete that one commit. It then writes an additional "shallow graft" entry to mark that one commit as a new pseudo-root commit.

Regular (non-shallow) clone and fetch

These are all related to how Git behaves when you're using a normal (non-shallow, non-single-branch) clone: git fetch calls up the upstream Git, gets a list of everything, and then brings over whatever you don't already have. This is why an initial clone is so slow, and a fetch-to-update is usually so fast: once you get a full clone, the updates rarely have very much to bring over: maybe a few commits, maybe a few hundred, and most of those commits don't need much else either.

The history of a repository is formed from the commits. Each commit names its parent commit (or for merges, parent commits, plural), in a chain that goes backwards from "the latest commit", to the previous commit, to some more-ancestral commit, and so on. The chain eventually stops when it reaches a commit that has no parent, such as the first commit ever made in the repository. This kind of commit is a root commit.

That is, we can draw a graph of commits. In a really simple repository the graph is just a straight line, with all the arrows pointing backwards:

o <- o <- o <- o   <-- master

The name master points to the fourth and latest commit, which points back to the third, which points back to the second, which points back to the first.

Each commit carries with it a complete snapshot of all the files that go in that commit. Files that are not at all changed are shared across these commits: the fourth commit just "borrows" the unchanged version from the third commit, which "borrows" it from the second, and so on. Hence, each commit names all the "Git objects" that it needs, and Git either finds those objects locally (because it already has them) or uses the fetch protocol to bring them over from the other, upstream Git. There's a compression format called "packing", and a special variant for network transfer called "thin packs", that allows Git to do this even better / fancier, but the principle is simple: Git needs all, and only, those objects that go with the new commits it's picking up. Your Git decides whether it has those objects, and if not, obtains them from their Git.

A more-complicated, more-complete graph generally has several points where it branches, some where it merges, and multiple branch names pointing to different branch tips:

        o--o   <-- feature/tall
       /
o--o--o---o    <-- master
    \    /
     o--o      <-- bug/short

Here branch bug/short is merged back into master, while branch feature/tall is still undergoing development. The name bug/short can (probably) now be deleted entirely: we don't need it anymore if we are done making commits on it. The commit at the tip of master names two previous commits, including the commit at the tip of bug/short, so by fetching master we will fetch the bug/short commits.

Note that the simple and slightly-more-complicated graphs each have just one root commit. That's pretty typical: all repositories that have commits have at least one root commit, since the very first commit is always a root commit; but most repositories have only one root commit as well. You can, however, have different root commits, as with this graph:

 o--o
     \
o--o--o   <-- master

or this one:

 o--o     <-- orphan

o--o      <-- master

In fact, the one with just the one master was probably made by merging orphan into master, then deleting the name orphan.
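
As a small illustration of root commits, the sketch below builds a repository with two unrelated histories and lists the commits that have no parent. The repository, branch names, commit messages, and demo identity are all made up for the example.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first root"
git commit -q --allow-empty -m "child"

# An orphan branch starts a second, unrelated history.
git checkout -q --orphan orphan
git commit -q --allow-empty -m "second root"

# Root commits are the ones with zero parents.
roots=$(git rev-list --max-parents=0 --all)
echo "$roots"
```

Here the listing contains two hash IDs, one per history; in a typical repository it contains just one.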

Grafts and replacements

Git has for a long time had (possibly shaky) support for grafts, which was replaced with (much better, actually-solid) support for generic replacements. To grasp them concretely we need to add, to the above, the notion that each commit has its own unique ID. These IDs are the big ugly 40-character SHA-1 hashes, face0ff... and so on. In fact, every Git object has a unique ID, though for graph purposes, all we care about are the commits.

For drawing graphs, those big hash IDs are too painful to use, so we can use one-letter names A through Z instead. Let's use this graph again but put in one-letter names:

        E--H   <-- feature/tall
       /
A--B--D---G    <-- master
    \    /
     C--F      <-- bug/short

Commit H refers back to commit E (E is H's parent). Commit G, which is a merge commit (meaning it has at least two parents), refers back to both D and F, and so on.

Note that the branch names, feature/tall, master, and bug/short, each point to one single commit. The name bug/short points to commit F. This is why commit F is on branch bug/short... but so is commit C. Commit C is on bug/short because it is reachable from the name. The name gets us to F, and F gets us to C, so C is on branch bug/short.

Note, however, that commit G, the tip of master, gets us to commit F. This means that commit F is also on branch master. This is a key concept in Git: commits may be on one, many, or even no branches. A branch name is merely a way to get started within a commit graph. There are other ways, such as tag names, refs/stash (which gets you to the current stash: each stash is actually a couple of commits), and the reflogs (which are normally hidden from view as they are normally just clutter).
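
To see reachability in practice, the following sketch rebuilds the lettered graph with empty commits and asks which branches contain commit F. The helper function c, the repository path, and the demo identity are assumptions for the example; git branch --contains is the real command doing the work.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master
c() { git commit -q --allow-empty -m "$1"; }  # hypothetical helper: one empty commit

c A; c B
git checkout -q -b bug/short;    c C; c F
git checkout -q master;          c D
git checkout -q -b feature/tall; c E; c H
git checkout -q master
git merge -q -m G bug/short      # merge commit G with parents D and F

f=$(git rev-parse bug/short)     # commit F
on_branches=$(git branch --contains "$f")
echo "$on_branches"              # lists bug/short and master, not feature/tall
```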

This also, however, gets us to grafts and replacements. A graft is just a limited kind of replacement, and shallow repositories use a limited form of graft.[1] I won't describe replacements fully here as they are a bit more complicated, but in general, what Git does for all of these is to use the graft or replacement as an "instead-of". For the specific case of commits, what we want here is to be able to change (or at least, pretend to change) the parent ID or IDs of any commit ... and for shallow repositories, we want to be able to pretend that the commit in question has no parents.



[1] The way shallow repositories use the graft code is not shaky. For the more general case, I recommended using git replace instead, as that also was and is not shaky. The only recommended use for grafts is (or at least was, years ago) to put them in place just long enough to run git filter-branch to copy an altered, grafted history, after which you should just discard the grafted history entirely. You can use git replace for this purpose as well, but unlike grafts, you can use git replace permanently or semi-permanently, without needing git filter-branch.



Making a shallow clone

To make a depth-1 shallow clone of the current state of the upstream repository, we will pick one of the three branch names (feature/tall, master, or bug/short) and translate it to a commit ID. Then we will write a special graft entry that says: "When you see that commit, pretend that it has no parent commits, i.e., is a root commit."

Let's say we pick master. The name master points to commit G, so to make a shallow clone of commit G, we obtain commit G from the upstream Git as usual, but then write a special graft entry that claims commit G has no parents. We put that into our repository, and now our graph looks like this:

G   <-- master, origin/master

Those parent IDs are still actually inside G; it's just that every time we have Git use or show us the history, it immediately "grafts" nothing-at-all on, so that G seems to be a root commit, for history tracking purposes.

Updating a shallow clone we made earlier

But what if we already have a (depth-1 shallow) clone, and we want to update it? Well, that's not really a problem. Let's say we made a shallow clone of the upstream back when master pointed to commit B, before the new branches and the bug fix. That means we currently have this:

B   <-- master, origin/master

While B's real parent is A, we have a shallow-clone graft entry saying "pretend B is a root commit". Now we git fetch --depth 1, which looks up the upstream's master (the thing we call origin/master) and sees commit G. We grab commit G from the upstream, along with its objects, but deliberately don't grab commits D and F. We then update our shallow-clone graft entries to say "pretend G is a root commit too":

B   <-- master

G   <-- origin/master

Our repository now has two root commits: the name master (still) points to commit B, whose parents we (still) pretend are non-existent, and the name origin/master points to G, whose parents we pretend are non-existent.
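
The state described above can be inspected directly. The sketch below is only an illustration under assumptions: the upstream repository is a throwaway one with empty commits named after the letters in the graphs, and it relies on the shallow boundary being recorded in the file .git/shallow, which is how current Git versions store it.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream: A--B, cloned at B, then D--G added afterwards.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m A
git -C "$work/upstream" commit -q --allow-empty -m B
git clone -q --depth 1 "file://$work/upstream" "$work/clone"
git -C "$work/upstream" commit -q --allow-empty -m D
git -C "$work/upstream" commit -q --allow-empty -m G

cd "$work/clone"
git fetch -q --depth 1
old=$(git rev-parse master)              # still commit B
new=$(git rev-parse origin/master)       # commit G
cat .git/shallow                         # recorded shallow boundary commits
git cat-file -p "$new" | grep '^parent'  # G's real parent ID is still stored
git rev-list --count origin/master       # the history walk stops at G
```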

This is why you need git reset

In a normal repository, you might use git pull, which really is git fetch followed by git merge. But git merge requires history, and we have none: we have faked Git out with pretend root commits, and they have no history behind them. So we must use git reset instead.

What git reset does is a bit complicated, because it can affect up to three different things: a branch name, the index, and the work-tree. We have already seen what the branch names are: they simply point to a (one, specific) commit, which we call the tip of the branch. That leaves the index and work-tree.

The work-tree is easy to explain: it's where all your files are. That's it: no more and no less. It's there so that you can actually use Git: Git is all about storing every commit ever made, forever, so that they can all be retrieved. But they're in a format useless to mere mortals. To be used, a file (or more typically, a whole commit's worth of files) has to be extracted into its normal format. The work-tree is where that happens, and then you can work on it and make new commits using it too.

The index is a bit harder to explain. It's something peculiar to Git: other version control systems don't have one, or if they have something like it, they don't expose it. Git does. Git's index is essentially where you keep the next commit to make, but that means that it starts out holding the current commit that you have extracted into the work-tree, and Git uses that to make Git fast. We'll say more about this in a bit.

What git reset --hard does is to affect all three: branch name, index, and work-tree. It moves the branch name so that it points to a (probably different) commit. Then it updates the index to match that commit, and updates the work-tree to match the new index.

Hence:

git reset --hard origin/master

tells Git to look up origin/master. Since we ran our git fetch, that now points to commit G. Git then makes our master (our current, and only, branch) also point to commit G, and then updates our index and work-tree. Our graph now looks like this:

B   [abandoned - but see below]

G   <-- master, origin/master

Now master and origin/master both name commit G, and commit G is the one checked out into the work-tree.

Why you need git clean -dfx

The answer here is a bit complicated, but usually it's "you don't" (need to git clean).

When you do need git clean, it is because you (or something you ran) added files to your work-tree that you have not told Git about. These are untracked and/or ignored files. Using git clean -df will remove untracked files (and empty directories); adding -x will also remove the ignored files.
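
The difference between the two forms is easy to check with a dry run (-n) before deleting anything. In this sketch the repository, the file names note.txt and build.log, and the *.log ignore rule are all hypothetical.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q "$work/repo"
cd "$work/repo"
echo '*.log' > .gitignore
git add .gitignore
git commit -q -m "ignore logs"

echo scratch > note.txt          # untracked
echo noise > build.log           # ignored via .gitignore

untracked_only=$(git clean -dn)  # dry run: what -df would remove
with_ignored=$(git clean -dnx)   # dry run: what -dfx would remove
echo "$untracked_only"
echo "$with_ignored"

git clean -q -dfx                # actually remove both files
```

The first dry run mentions only note.txt; the second also mentions build.log. The tracked .gitignore is untouched in both cases.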

For more about the difference between "untracked" and "ignored", see this answer.

Why you don't need git clean: the index

I mentioned above that you usually don't need to run git clean. This is because of the index. As I said earlier, Git's index is mainly "the next commit to make". If you never add your own files (if you are just using git checkout to check out various existing commits that you have had all along, or that you have added with git fetch; or if you are using git reset --hard to move a branch name and also switch the index and work-tree to another commit) then whatever is in the index right now is there because an earlier git checkout (or git reset) put it in the index, and also into the work-tree.

In other words, the index has a short (and fast for Git to access) summary or manifest describing the current work-tree. Git uses that to know what is in the work-tree now. When you ask Git to switch to another commit, via git checkout or git reset --hard, Git can quickly compare the existing index to the new commit. Any files that have changed, Git must extract from the new commit (and update the index). Any files that are newly added, Git must also extract (and update the index). Any files that are gone, that is, in the existing index but not in the new commit, Git must remove... and that's what Git does. Git updates, adds, and removes those files in the work-tree, as directed by the comparison between the current index and the new commit.

What this means is that if you do need git clean, you must have done something outside Git that added files. These added files are not in the index, so by definition, they are untracked and/or ignored. If they are merely untracked, git clean -f would remove them, but if they are ignored, only git clean -fx will remove them. (You want -d just to remove directories that are or become empty during the cleaning.)

Abandoned commits and garbage collection

I mentioned, and drew in the updated shallow graph, that when we git fetch --depth 1 and then git reset --hard, we wind up abandoning the previous depth-1 shallow graph commit. (In the graph I drew, this was commit B.) However, in Git, abandoned commits are rarely truly abandoned, at least not right away. Instead, some special names like ORIG_HEAD hang on to them for a while, and each reference (branches and tags are forms of reference) carries with it a log of "previous values".

You can display each reflog with git reflog refname. For instance, git reflog master shows you not only which commit master names now, but also which commits it has named in the past. There is also a reflog for HEAD itself, which is what git reflog shows by default.

Reflog entries eventually expire. Their exact duration varies, but by default they are eligible for expiration after 30 days in some cases and 90 days in others. Once they do expire, those reflog entries no longer protect abandoned commits (or, for annotated tag references, the annotated tag object; tags are not supposed to move, so this case is not supposed to occur, but if it does, because you force Git to move a tag, it's just handled in the same way as all other references).

Once any Git object (commit, annotated tag, "tree", or "blob" (file)) is really unreferenced, Git is allowed to remove it for real.[2] It's only at this point that the underlying repository data for the commits and files goes away. Even then, it only happens when something runs git gc. Thus, a shallow repository updated with git fetch --depth 1 is not quite the same as a fresh clone with --depth 1: the shallow repository probably has some lingering names for the original commits, and won't remove the extra repository objects until those names expire or are otherwise cleared out.
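
For completeness, here is a hedged sketch of how one might look at those lingering names and then clear them out by hand. The demo repositories and identity are assumptions. The last two commands are a commonly used but destructive combination: they throw away every reflog entry in the clone, and whether the old commit disappears right away may still depend on the Git version and on other names such as ORIG_HEAD.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream, shallow clone, then one more upstream commit.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m first
git clone -q --depth 1 "file://$work/upstream" "$work/clone"
git -C "$work/upstream" commit -q --allow-empty -m second

cd "$work/clone"
old=$(git rev-parse HEAD)
git fetch -q --depth 1
git reset -q --hard origin/master

git reflog master                 # shows the new tip and the abandoned one
entries=$(git reflog master | wc -l | tr -d ' ')
git cat-file -e "$old"            # succeeds: the old commit is still stored

# Destructive: expire all reflog entries now, then prune unreferenced objects.
git reflog expire --expire=now --all
git gc -q --prune=now
```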



[2] Besides the reference check, objects get a minimum time before they expire as well. The default is two weeks. This prevents git gc from deleting temporary objects that Git is creating, but has yet to establish a reference to. For instance, when making a new commit, Git first turns the index into a series of tree objects which refer to each other but have no top-level reference. Then it creates a new commit object that refers to the top-level tree, but nothing yet refers to the commit. Last, it updates the current branch name. Until that last step finishes, the trees and new commit are unreachable!



Special considerations for --single-branch and/or shallow clones

I noted above that the name you give to git clone -b can refer to a tag. For normal (non-shallow or non-single-branch) clones, this works just as one would expect: you get a regular clone, and then Git does a git checkout by the tag name. The result is the usual detached HEAD, in a perfectly ordinary clone.

With shallow or single-branch clones, however, there are several unusual consequences. These are all, to some extent, a result of Git letting the implementation show through.

First, if you use --single-branch, Git alters the normal fetch configuration in the new repository. The normal fetch configuration depends on the name you choose for the remote, but the default is origin so I will just use origin here. It reads:

fetch = +refs/heads/*:refs/remotes/origin/*

Again, this is the normal configuration for a normal (not single-branch) clone. This configuration tells git fetch what to fetch, which is "all branches". When you use --single-branch, though, you get instead a fetch line that refers to only the one branch:

fetch = +refs/heads/zorg:refs/remotes/origin/zorg

if you're cloning the zorg branch.

Whichever branch you clone, that's the one that goes into the fetch line. Each future git fetch will obey this line,[3] so you won't fetch any other branches. If you do want to fetch other branches later, you will have to alter this line, or add more lines.
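
One way to look at that line, and to put the usual all-branches line back, is sketched below. The throwaway upstream repository and demo identity are assumptions; git config --get-all and git remote set-branches are standard commands, and editing the fetch line in .git/config by hand would work as well.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream with one commit on master.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m first
git clone -q --single-branch --depth 1 "file://$work/upstream" "$work/clone"

cd "$work/clone"
before=$(git config --get-all remote.origin.fetch)
echo "$before"   # +refs/heads/master:refs/remotes/origin/master

# Restore the normal "all branches" fetch line.
git remote set-branches origin '*'
after=$(git config --get-all remote.origin.fetch)
echo "$after"    # +refs/heads/*:refs/remotes/origin/*
```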

Second, if you use --single-branch and what you clone is a tag, Git will put in a rather odd fetch line. For instance, with git clone --single-branch -b v2.1 ... I get:

fetch = +refs/tags/v2.1:refs/tags/v2.1

This means you will get no branches, and unless someone has moved the tag,[4] git fetch will do nothing!
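
This is easy to reproduce locally. The sketch below assumes a throwaway upstream with a lightweight tag named v2.1 and a demo identity; the advice.detachedHead setting is only there to silence the detached-HEAD hint that the clone would otherwise print.

```shell
set -e
work=$(mktemp -d)
# Demo identity so `git commit` works in a bare environment (assumption).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical upstream with one commit and a tag pointing at it.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m first
git -C "$work/upstream" tag v2.1

# Single-branch clone of the tag rather than of a branch.
git -c advice.detachedHead=false clone -q --single-branch -b v2.1 \
    "file://$work/upstream" "$work/clone"

refspec=$(git -C "$work/clone" config --get-all remote.origin.fetch)
echo "$refspec"  # +refs/tags/v2.1:refs/tags/v2.1
```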

Third, the default tag behavior is a bit weird due to the way git clone and git fetch obtain tags. Remember that tags are simply a reference to one particular commit, just like branches and all other references. There are two key differences between branches and tags, though: branches are expected to move (and tags are not), and branches get renamed (and tags don't).

Remember that all throughout the above, we keep finding that the other (upstream) Git's master becomes our origin/master, and so on. This is an example of the renaming process. We also saw, briefly, precisely how that renaming works, through the fetch = line: our Git takes their refs/heads/master and changes it to our refs/remotes/origin/master. This name is not only different-looking (origin/master), but literally can't be the same as any of our branches. If we create a branch named origin/master,[5] this branch's "full name" is actually refs/heads/origin/master, which is different from the other full name refs/remotes/origin/master. It's only when Git uses the shorter name that we have one (regular, local) branch named origin/master and another different (remote-tracking) branch named origin/master. (It's a lot like being at a group where everyone is named Bruce.)

Tags don't go through all this. The tag v2.1 is just named refs/tags/v2.1. This means there's no way to separate "their" tag from "your" tag. You can have either your tag, or their tag. As long as no one ever moves a tag, this doesn't matter: if you both have the tag, it must point to the same object. (If someone starts moving tags, things get ugly.)

In any case, Git implements the "normal" fetching of tags by a simple rule:6when Git already has a commit, if some tag namesthat commit, Git copies the tag too.With ordinary clones, the first clone gets all the tags, and then subsequent git fetchoperations get the newtags. A shallow clone, however, by definition omits some commit(s), namely everything below any graft-point in the graph. Those commits won't pick up the tags. They can't: to have the tags, you would need to have the commits. Git is not allowed (except through the shallow grafts) to have the ID of a commit without actually having the commit.

[3] You can give git fetch some refspec(s) on the command line, and those will override the default. This applies only to a default fetch. You may also use multiple fetch = lines in the configuration, e.g., to fetch just a specific set of branches, although the normal way to "de-restrict" an initially-single-branch clone is to put back the usual +refs/heads/*:refs/remotes/origin/* fetch line.

[4] Since tags are not supposed to move, we could just say "this does nothing". If they do move, though, the + in the refspec represents the force flag, so the tag winds up moving.

[5] Don't do this. It's confusing. Git will handle it just fine (the local branch is in the local name space, and the remote-tracking branch is in the remote-tracking name space), but it's really confusing.

[6] This rule does not match the documentation. I tested against Git version 2.10.1; older Gits might use a different method.

Answer by VonC

On the shallow clone update process itself, see commit 649b0c3 from Git 2.12 (Q1 2017).
That commit is part of:

Commit 649b0c3, commit f2386c6, commit 6bc3d8c, commit 0afd307 (06 Dec 2016) by Nguyễn Thái Ngọc Duy (pclouds). See commit 1127b3c, commit 381aa8e (06 Dec 2016) by Rasmus Villemoes (ravi-prevas). (Merged by Junio C Hamano -- gitster -- in commit 3c9979b, 21 Dec 2016)

shallow.c

This paint_down() is part of step 6 of 58babff (shallow.c: the 8 steps to select new commits for .git/shallow - 2013-12-05).
When we fetch from a shallow repository, we need to know if one of the new/updated refs needs new "shallow commits" in .git/shallow (because we don't have enough history of those refs) and which one.

The question at step 6 is, what (new) shallow commits are required in order to maintain reachability throughout the repository without cutting our history short?
To answer, we mark all commits reachable from existing refs with UNINTERESTING ("rev-list --not --all"), mark shallow commits with BOTTOM, then for each new/updated refs, walk through the commit graph until we either hit UNINTERESTING or BOTTOM, marking the ref on the commit as we walk.

After all the walking is done, we check the new shallow commits. If we have not seen any new ref marked on a new shallow commit, we know all new/updated refs are reachable using just our history and .git/shallow.
The shallow commit in question is not needed and can be thrown away.

So, the code.

The loop here (to walk through commits) is basically:

  1. get one commit from the queue
  2. ignore if it's SEEN or UNINTERESTING
  3. mark it
  4. go through all the parents and..
    • 5a. mark it if it's never marked before
    • 5b. put it back in the queue

What we do in this patch is drop step 5a because it is not necessary.
The commit being marked at 5a is put back on the queue, and will be marked at step 3 at the next iteration. The only case it will not be marked is when the commit is already marked UNINTERESTING (5a does not check this), which will be ignored at step 2.
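
As an illustration only, the loop without step 5a can be sketched in Python. This is a simplified model of the description above, not the C code in shallow.c: the graph is a plain dict of hypothetical commit names, flags are small integers, the walk covers a single ref, and the per-ref bookkeeping is reduced to a set of ref names per commit.

```python
from collections import defaultdict

SEEN, UNINTERESTING, BOTTOM = 1, 2, 4


def paint_down(parents, flags, start, ref_id, painted):
    """Walk down from `start`, recording `ref_id` on every commit reached,
    stopping at UNINTERESTING commits and at BOTTOM (shallow) commits."""
    queue = [start]
    while queue:
        commit = queue.pop()                          # 1. get one commit from the queue
        if flags[commit] & (SEEN | UNINTERESTING):    # 2. ignore if SEEN or UNINTERESTING
            continue
        flags[commit] |= SEEN
        painted[commit].add(ref_id)                   # 3. mark it
        if flags[commit] & BOTTOM:                    # shallow commit: do not walk past it
            continue
        for parent in parents.get(commit, ()):        # 4. go through all the parents and..
            queue.append(parent)                      # 5b. put it back in the queue
            # No step 5a: the parent gets marked at step 3 when it is popped.


# Linear history d -> c -> b -> a, where b is already reachable from existing refs.
parents = {"d": ["c"], "c": ["b"], "b": ["a"], "a": []}
flags = defaultdict(int, {"b": UNINTERESTING})
painted = defaultdict(set)
paint_down(parents, flags, "d", "new-ref", painted)
print(dict(painted))
```

In this model only d and c get painted; b is skipped at step 2 exactly as the commit message describes, so marking it earlier at step 5a would have made no difference.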

Answer by tkruse

If the goal is to update a shallow clone without fetching the whole history (while still allowing a short history to be fetched), then modern versions of git (>= 2.11.1) offer these alternatives:

  • --shallow-since=... to fetch only commits more recent than a given date
  • --shallow-exclude=... to fetch while leaving out commits reachable from a given remote branch or tag
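
For scripting, a small Python helper can assemble the fetch command line with either option. This is just a sketch: the helper name shallow_fetch_args, the remote origin, the branch master, the date and the tag v2.1 are all assumptions for the example; only the two git fetch options come from the list above.

```python
def shallow_fetch_args(remote, branch, since=None, exclude=None):
    """Build the argument list for a shallow `git fetch` (nothing is executed here)."""
    args = ["git", "fetch"]
    if since is not None:
        args.append(f"--shallow-since={since}")      # keep history newer than this date
    if exclude is not None:
        args.append(f"--shallow-exclude={exclude}")  # cut history at this remote branch or tag
    args += [remote, branch]
    return args


print(shallow_fetch_args("origin", "master", since="2020-01-01"))
print(shallow_fetch_args("origin", "master", exclude="v2.1"))
```

The resulting list can be handed to subprocess.run(..., check=True) inside the clone's directory. As discussed in the question, a fetch only updates origin/master, so the working tree still has to be moved afterwards, for instance with git reset --hard origin/master.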