Source: http://stackoverflow.com/questions/250238/
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original address, and attribute it to the original authors on Stack Overflow.
Collapsing a git repository's history
Asked by Gareth
We have a git project which has quite a big history.
Specifically, early in the project there were quite a lot of binary resource files in the project. These have now been removed, as they're effectively external resources.
However, the size of our repository is >200MB (the total checkout is currently ~20MB) due to having these files previously committed.
What we'd like to do is "collapse" the history so that the repository appears to have been created from a later revision than it was. For example:
1-----2-----3-----4-----+---+---+
                   \       /
                    +-----+---+---+
1. Repository created
2. Large set of binary files added
3. Large set of binary files removed
4. New intended 'start' of repository
So effectively we want to lose the project history before a certain point. At this point there is only one branch, so there's no complication with trying to deal with multiple start points etc. However, we don't want to lose all of the history and start a new repository with the current version.
Is this possible, or are we doomed to have a bloated repository forever?
Answer by Paul
You can remove the binary bloat and keep the rest of your history. Git allows you to reorder and 'squash' prior commits, so you can combine just the commits that add and remove your big binary files. If the adds were all done in one commit and the removals in another, this will be much easier than dealing with each file.
$ git log --stat # list all commits and commit messages
Search this for the commits that add and delete your binary files and note their SHA1s, say 2bcdef and 3cdef3.
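If the log is long, the search can be narrowed. The following is a minimal sketch, not part of the original answer: it builds a throwaway repository (the file name assets.bin and the commit messages are made up for illustration), then asks Git which commits added or deleted the file and which blob is the largest anywhere in history.

```shell
#!/bin/sh
# Sketch in a throwaway repository; assets.bin is a hypothetical file name.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "hello" > README
git add README
git commit -q -m "Repository created"

head -c 200000 /dev/zero > assets.bin   # stand-in for a large binary
git add assets.bin
git commit -q -m "Add binary files"

git rm -q assets.bin
git commit -q -m "Remove binary files"

# Commits that added (A) or deleted (D) the file, as abbreviated SHA1s.
added=$(git log --diff-filter=A --format=%h -- assets.bin)
deleted=$(git log --diff-filter=D --format=%h -- assets.bin)
echo "added in $added, deleted in $deleted"

# Largest blob anywhere in history: object type, size in bytes, path.
largest=$(git rev-list --objects --all |
  git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
  awk '$1 == "blob"' | sort -k2,2nr | head -n 1)
echo "$largest"
```

In a real repository only the last two commands are needed, with the path of one of your own binary files in place of assets.bin.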
Then, to edit the repo's history, use the git rebase command with its interactive option (-i), starting with the parent of the commit where you added your binaries. It will launch your $EDITOR and you'll see a list of commits starting with 2bcdef:
$ git rebase -i 2bcdef^ # generate a pick list of all commits starting with 2bcdef
# Rebasing zzzzzz onto yyyyyyy
#
# Commands:
# pick = use commit
# edit = use commit, but stop for amending
# squash = use commit, but meld into previous commit
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
pick 2bcdef Add binary files and other edits
pick xxxxxx Another change
.
.
pick 3cdef3 Remove binary files; link to them as external resources
.
.
Insert squash 3cdef3 as the second line and remove the line which says pick 3cdef3 from the list. You now have a list of actions for the interactive rebase which will combine the commits that add and delete your binaries into one commit whose diff is just any other changes in those commits. Once you save the list and close the editor, it will reapply all of the subsequent commits in order. If it stops along the way (for example, on a conflict), resolve the problem and tell it to complete:
$ git rebase --continue
This will take a minute or two.
You now have a repo that no longer has the binaries coming or going. But they will still take up space because, by default, Git keeps changes around for 30 days before they can be garbage-collected, so that you can change your mind.
If you want to remove them now:
$ git reflog expire --expire=1.minute refs/heads/master
# all deletions up to 1 minute ago are now available to be garbage-collected
$ git fsck --unreachable # lists all the blobs (files) that will be garbage-collected
$ git prune
$ git gc
Now you've removed the bloat but kept the rest of your history.
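The whole procedure can be tried end to end on a scratch repository before touching a real one. This is only a sketch under stated assumptions: the repository contents, file names, and the helper script edit-todo.sh are invented for the demonstration, and the pick list is edited by a script through GIT_SEQUENCE_EDITOR instead of by hand in $EDITOR. It also uses --expire=now --all, which is more aggressive than the command shown above, because the demo commits are only seconds old.

```shell
#!/bin/sh
# Sketch: squash the "add binaries" and "remove binaries" commits together
# in a throwaway repository. All names here are hypothetical.
set -eu
work=$(mktemp -d)
mkdir "$work/repo"
cd "$work/repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

commit() { git add -A && git commit -q -m "$1"; }

echo "readme" > README;                 commit "Repository created"
head -c 200000 /dev/zero > assets.bin
echo "v1" > main.txt;                   commit "Add binary files and other edits"
echo "v2" > main.txt;                   commit "Another change"
git rm -q assets.bin;                   commit "Remove binary files"
echo "v3" > main.txt;                   commit "Later work"

add=$(git log --diff-filter=A --format=%H -- assets.bin)
remove=$(git log --diff-filter=D --format=%H -- assets.bin)
blob=$(git rev-parse "$add:assets.bin")

# Stand-in for editing the pick list by hand: drop the "pick" line of the
# removal commit and insert "squash <removal>" right after the add commit.
cat > "$work/edit-todo.sh" <<'EOF'
#!/bin/sh
awk '
  /^pick / && index(ENVIRON["RM_SHA"], $2) == 1 { next }
  { print }
  /^pick / && index(ENVIRON["ADD_SHA"], $2) == 1 { print "squash " ENVIRON["RM_SHA"] }
' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
EOF

# GIT_EDITOR=true accepts the combined commit message without prompting.
ADD_SHA=$add RM_SHA=$remove GIT_SEQUENCE_EDITOR="sh $work/edit-todo.sh" \
  GIT_EDITOR=true git rebase -q -i "${add}^"

commits=$(git rev-list --count HEAD)
touching=$(git log --format=%h -- assets.bin | wc -l | tr -d ' ')
echo "commits: $commits, commits touching assets.bin: $touching"

# Expire reflog entries immediately and prune unreachable objects.
git reflog expire --expire=now --all
git gc -q --prune=now
if git cat-file -e "$blob" 2>/dev/null; then
  echo "blob is still in the object store"
else
  echo "blob has been pruned"
fi
```

Because every commit after the squashed one gets a new SHA1, anyone else with a clone will need to re-clone or rebase their work onto the rewritten branch.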
Answer by davitenio
You can use git filter-branch with grafts to make commit number 4 the new root commit of your branch. Just create the file .git/info/grafts with just one line in it containing the SHA1 of commit number 4.
If you now do a git log or gitk, you will see that those commands display commit number 4 as the root of your branch. But nothing will have actually changed in your repository: you can delete .git/info/grafts and the output of git log or gitk will be as before. To actually make commit number 4 the new root, you will have to run git filter-branch, with no arguments.
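A rough sketch of this approach on a throwaway repository follows. One substitution is made and should be read as an assumption, not as the answer's own method: newer Git releases deprecate the .git/info/grafts file, so the sketch uses git replace --graft with no parents to mark the new root, which git filter-branch then makes permanent in the same way. The six "commit N" commits are invented for the demonstration.

```shell
#!/bin/sh
# Sketch: make "commit 4" the new root of the branch. Uses git replace --graft
# in place of the .git/info/grafts file described in the answer.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

for n in 1 2 3 4 5 6; do
  echo "change $n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done

new_root=$(git rev-parse HEAD~2)        # this is "commit 4"

# Pretend the commit has no parents. Like the grafts file, this only changes
# what commands such as git log display until the history is rewritten.
git replace --graft "$new_root"

# Rewrite the branch so the graft becomes real history.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch HEAD >/dev/null 2>&1

# The replacement ref is no longer needed once the branch is rewritten.
git replace -d "$new_root" >/dev/null

count=$(git rev-list --count HEAD)
root_subject=$(git log --max-parents=0 --format=%s HEAD)
echo "$count commits, root is: $root_subject"
```

git filter-branch keeps a backup of the old branch under refs/original/, so the old objects stay in the repository until that ref is deleted and the repository is garbage-collected.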
Answer by Pat Notz
Thanks to JesperE's post I looked into git-filter-branch, which may actually be what you want. It looks like you could retain your earlier commits too, except that they would be modified since your big files were removed. From the git-filter-branch man page:
Suppose you want to remove a file (containing confidential information or copyright violation) from all commits:
git filter-branch --tree-filter 'rm filename' HEAD
Be sure to read that man page... obviously you'd want to do this on a spare clone of your repository to make sure it works as expected.
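As a way of trying this out safely, here is a small sketch on a scratch repository; the file names big.bin and notes.txt are made up. It differs from the quoted command in two labeled ways: rm -f is used so the filter does not fail in commits where the file is absent, and --prune-empty drops commits that become empty once the file is gone.

```shell
#!/bin/sh
# Sketch: remove one file from every commit with git filter-branch.
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "one" > notes.txt
git add notes.txt
git commit -q -m "Repository created"

head -c 100000 /dev/zero > big.bin      # stand-in for a large binary
git add big.bin
git commit -q -m "Add binary file"

echo "two" >> notes.txt
git commit -q -a -m "Edit notes"

git rm -q big.bin
git commit -q -m "Remove binary file"

# Rewrite every commit reachable from HEAD without big.bin in its tree.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch \
  --tree-filter 'rm -f big.bin' --prune-empty HEAD >/dev/null 2>&1

commits=$(git rev-list --count HEAD)
touching=$(git log --format=%h -- big.bin | wc -l | tr -d ' ')
echo "commits left: $commits, commits touching big.bin: $touching"
```

The two commits that only added or removed big.bin disappear, while the commits that changed notes.txt are kept with new SHA1s.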
Answer by JesperE
Is git-fast-export what you are looking for?
NAME
git-fast-export - Git data exporter
SYNOPSIS
git-fast-export [options] | git-fast-import
DESCRIPTION
This program dumps the given revisions in a form suitable to be piped into git-fast-import(1).
You can use it as a human readable bundle replacement (see git-bundle(1)), or as a kind of an interactive git-filter-branch(1).
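The pipeline from the synopsis can be sketched as follows, using two throwaway repositories whose names (old and new) and contents are invented here. Note the limitation: piped straight through like this, the history is copied as it is. To drop the early commits or the binary files, the exported stream would have to be edited or filtered between the two commands, which this sketch does not attempt.

```shell
#!/bin/sh
# Sketch: copy a repository's history with git fast-export | git fast-import.
set -eu
work=$(mktemp -d)

git init -q "$work/old"
cd "$work/old"
git config user.name "Example User"
git config user.email "user@example.com"
for n in 1 2 3; do
  echo "v$n" > file.txt
  git add file.txt
  git commit -q -m "commit $n"
done
branch=$(git symbolic-ref --short HEAD)

# Export every ref and import the stream into a fresh, empty repository.
git init -q "$work/new"
git fast-export --all | git -C "$work/new" fast-import --quiet

old_count=$(git rev-list --count --all)
new_count=$(git -C "$work/new" rev-list --count --all)
old_tree=$(git rev-parse "refs/heads/$branch^{tree}")
new_tree=$(git -C "$work/new" rev-parse "refs/heads/$branch^{tree}")
echo "old: $old_count commits, new: $new_count commits"
```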