Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, give the original address and author information, and attribute it to the original authors (not me): StackOverflow. Original address: http://stackoverflow.com/questions/16057391/

Time: 2020-09-19 08:26:12  Source: igfitidea

git free disk space by clearing repository history

git

Question by ColacX

So I'm working with some friends, and we are all new to git. One of them committed a large number of external binary files that slow down the repository and take up a lot of disk space.

We've just started the project, so there's really nothing important in it except a readme file. What we'd like to do is clear the repository history down to the current state.

So basically it looks like this:

Head -> A -> B -> C    total disk size 45 MB, 1 file, 300 deleted files

And we want this:

Head -> A              total disk size 1 kB, 1 file, 0 deleted files

The obvious solution would be to create a new repository and just copy the readme file into it. However, out of educational curiosity, I'd like to learn whether there's a git command that can do this.

I've been experimenting with the rebase command, but it seems to keep the old history and its data. That confuses me: if rebasing doesn't prune data from the repository, you might as well not use it.

I've been googling other posts on this issue, and I suspect that you can't do this with git. However, I'd like to confirm that.

And yes, it's a remote repository on GitHub.

Thanks for any help.

For my solution, I chose to do the following:

Rebase using TortoiseGit and squash all commits.
Then, in Git Bash:
git reflog expire --all --expire-unreachable=now
git gc --aggressive --prune=now
git push origin master --force
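
For readers without TortoiseGit, the same workflow can be sketched non-interactively. The following is a minimal demo on a throwaway repository and makes several assumptions. It assumes a POSIX shell and git 2.x. Squashing is done with git reset --soft to the root commit plus git commit --amend rather than an interactive rebase. A local bare repository stands in for GitHub, and the file names and commit messages are made up. The demo also runs the force push before the pruning step. Until the push, the remote-tracking branch origin/master still points at the old history and keeps it reachable, which may be one reason the local repository did not shrink.

```shell
#!/bin/sh
# Throwaway demo: squash all commits into one, force-push, then prune.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# A local bare repository stands in for the GitHub remote (assumption).
git init -q --bare origin.git
git -C origin.git symbolic-ref HEAD refs/heads/master
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git remote add origin "$work/origin.git"

# Three commits loosely mirroring the question: readme, binary, deletion.
echo "hello" > README
git add README && git commit -q -m "A: add readme"
head -c 1000000 /dev/urandom > big.bin
big_blob=$(git hash-object big.bin)
git add big.bin && git commit -q -m "B: add binaries"
git rm -q big.bin && git commit -q -m "C: delete binaries"
git push -q origin master

# Non-interactive stand-in for "rebase -i --root" + squash: move the branch
# back to the root commit while keeping the current files staged, then
# rewrite the root commit with those files.
root=$(git rev-list --max-parents=0 HEAD)
git reset -q --soft "$root"
git commit -q --amend -m "Initial import"

# Force-push first: until now origin/master still points at the old history.
git push -q origin master --force

# Expire reflog entries for the old commits, then prune them.
git reflog expire --all --expire-unreachable=now
git gc -q --aggressive --prune=now

# A fresh clone over the git transport (not a local copy of the object
# directory) only receives objects reachable from the new history.
cd "$work"
git clone -q --no-local origin.git fresh
commits=$(git -C fresh rev-list --count HEAD)
echo "commits in fresh clone: $commits"
```

The temporary directory is deleted when the script exits.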

The local repository doesn't seem to shrink on disk. However, cloning the repository again gives the desired result: both the disk size and the log look as expected.

Thanks for the helpful replies. Interesting; rebase seems very powerful.

Accepted answer by Tobu

Rebasing (git rebase -i --root: if you didn't revert the bad commit, just delete its line; if you did, squash the bad commit together with the revert commit) or using filter-branch will clear the data from your branch's history, but won't make it disappear from the repository entirely.
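
A sketch of the rebase option on a throwaway repository, assuming a POSIX shell and git 2.x. So that it runs without an interactive editor, GIT_SEQUENCE_EDITOR is pointed at a small hypothetical script, drop-binaries.sh. That script deletes the bad commit's pick line from the todo list. The commit messages are made up. Afterwards the commit is gone from the branch, but its blob is still in the object database.

```shell
#!/bin/sh
# Throwaway demo: drop one commit with "git rebase -i --root" using a
# scripted sequence editor instead of an interactive one.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q repo
cd repo

echo "hello" > README
git add README && git commit -q -m "Add readme"
head -c 100000 /dev/urandom > big.bin
big_blob=$(git hash-object big.bin)
git add big.bin && git commit -q -m "Add binaries"
echo "more text" >> README
git commit -q -am "Edit readme"

# Hypothetical editor script: remove the "pick <hash> Add binaries" line
# from the todo file git passes as $1, as you would by hand in the editor.
cat > "$work/drop-binaries.sh" <<'EOF'
#!/bin/sh
grep -v ' Add binaries$' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF
chmod +x "$work/drop-binaries.sh"

GIT_SEQUENCE_EDITOR="$work/drop-binaries.sh" git rebase -i --root

# The branch history no longer contains the commit...
git log --format=%s
# ...but the binary's blob is still stored in the object database.
if git cat-file -e "$big_blob" 2>/dev/null; then blob=present; else blob=absent; fi
echo "binary blob: $blob"
```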

This is because, for safety and traceability reasons, git keeps a reflog (visible with git log -g) that records every commit your HEAD and branches have pointed to, whether or not it's still part of the ancestry graph.
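
The reflog behavior can be observed directly in a throwaway repository. This is a sketch assuming a POSIX shell. Here git reset --hard stands in for whatever rewrite removed the commit, and the file names are made up.

```shell
#!/bin/sh
# Throwaway demo: a commit removed from the branch is still recorded in the
# reflog, which "git log -g" walks instead of the ancestry graph.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q repo
cd repo

echo "hello" > README
git add README && git commit -q -m "Add readme"
echo "junk" > junk.bin
git add junk.bin && git commit -q -m "Add junk"
junk_commit=$(git rev-parse HEAD)

# Throw the commit away: the branch now ends at "Add readme".
git reset -q --hard HEAD~1

echo "branch history:"
git log --format='  %h %s'
echo "reflog (git log -g):"
git log -g --format='  %h %gs'
```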

Cloning the filtered repo won't clone the hidden data, and you can also remove it in-place with these commands:

git reflog expire --all --expire-unreachable=now
git gc --aggressive --prune=now
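
Here is a quick check of what the two commands above do, on a throwaway repository. It assumes a POSIX shell, uses git reset --hard as a stand-in for the rebase or filter-branch, and uses made-up file names. It records whether the dropped binary's blob is still stored before and after running the commands.

```shell
#!/bin/sh
# Throwaway demo: reflog expire + gc --prune=now removes a dropped commit's data.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q repo
cd repo

echo "hello" > README
git add README && git commit -q -m "Add readme"
head -c 100000 /dev/urandom > big.bin
big_blob=$(git hash-object big.bin)
git add big.bin && git commit -q -m "Add binaries"

# Remove the commit from the branch (stand-in for rebase/filter-branch).
git reset -q --hard HEAD~1
if git cat-file -e "$big_blob" 2>/dev/null; then before=present; else before=gone; fi

# The two commands from the answer.
git reflog expire --all --expire-unreachable=now
git gc -q --aggressive --prune=now

if git cat-file -e "$big_blob" 2>/dev/null; then after=present; else after=gone; fi
echo "binary blob before: $before, after: $after"
```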

Those commands aren't normally recommended, and the unreferenced commits would expire after 30 days anyway, but since your repository is practically new, you're not risking much.

Answer by Nick

You don't need to lose your history entirely; you can just rewrite it using filter-branch. This is a pretty destructive command, so make a copy first. This example goes through your history removing all jar files.

git filter-branch --index-filter 'git rm --cached --ignore-unmatch "*.jar"' HEAD

Adjust this to match whatever giant files were accidentally added. Note that modifying commits changes their IDs, so other people will probably want to re-clone the repository afterwards to avoid terrible conflicts. You will also need to push back to the repository with --force, as git will (rightly) complain that the history has changed a lot.

Your local repo may not shrink in size until git decides to run garbage collection.
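
Below is a sketch of this answer end to end on a throwaway repository, assuming a POSIX shell and a git version that still ships filter-branch. The file names are made up. It uses the index-filter form of the command, which edits only the index, so --ignore-unmatch keeps commits without jars from failing. It then assumes the behavior described in the filter-branch documentation: the original history is kept under refs/original/ as a backup. The sketch removes those backup refs and expires the reflog so that gc --prune=now can drop the jar immediately.

```shell
#!/bin/sh
# Throwaway demo: strip *.jar files with filter-branch, then make the local
# repository actually drop them.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Recent git versions print a warning and pause before running
# filter-branch unless this is set.
export FILTER_BRANCH_SQUELCH_WARNING=1

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

mkdir lib
echo "hello" > README
head -c 100000 /dev/urandom > lib/dep.jar
jar_blob=$(git hash-object lib/dep.jar)
git add README lib/dep.jar && git commit -q -m "Add readme and jar"
echo "more text" >> README
git commit -q -am "Edit readme"

# Rewrite history; git matches the quoted "*.jar" pathspec at any depth.
git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch "*.jar"' HEAD

# filter-branch's backup refs under refs/original/ still reference the jar.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
  git update-ref -d "$ref"
done
# Expire reflog entries for the old commits, then prune right away.
git reflog expire --all --expire-unreachable=now
git gc -q --prune=now

jars=$(git log --format= --name-only HEAD | grep -c '\.jar$' || true)
if git cat-file -e "$jar_blob" 2>/dev/null; then blob=present; else blob=gone; fi
echo "jar paths in history: $jars, jar blob: $blob"
```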

Answer by John Szakmeister

You may want to look at Squashing all Git commits into a single commit. That page also references a Stack Overflow question that might be called a duplicate of this one: How to squash all git commits into one?

The solution mentioned by Wincent in the first link is about halfway down the page. A quick test locally shows that it does work as advertised. For your reference, Wincent suggests:

git update-ref -d refs/heads/master
git commit -m "Initial import"

FWIW, you'll probably need to run git gc --prune=now to clean up any unreferenced objects. And when you push up the new master, you'll need to use --force. You should probably create a backup before trying any of this out. :-)
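
This is a throwaway-repository sketch of Wincent's recipe followed by the cleanup. It assumes a POSIX shell and default gc settings, and the file names and commit messages are made up. Because the recipe names refs/heads/master, the sketch makes master the current branch explicitly. It also suggests why gc may not be enough on its own. HEAD's reflog still lists the old commits, so git gc --prune=now keeps them until those reflog entries are expired, as in the accepted answer.

```shell
#!/bin/sh
# Throwaway demo: Wincent's "delete the ref, commit again" recipe, then pruning.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master

echo "hello" > README
git add README && git commit -q -m "Add readme"
head -c 100000 /dev/urandom > big.bin
big_blob=$(git hash-object big.bin)
git add big.bin && git commit -q -m "Add binaries"
git rm -q big.bin && git commit -q -m "Delete binaries"

# Delete the branch ref: HEAD now points at an unborn master, but the index
# still holds the current files, so the next commit is a new root commit.
git update-ref -d refs/heads/master
git commit -q -m "Initial import"
commits=$(git rev-list --count HEAD)

# gc alone: HEAD's reflog still lists the old commits, so they are kept.
git gc -q --prune=now
if git cat-file -e "$big_blob" 2>/dev/null; then gc_only=present; else gc_only=gone; fi

# Expiring those reflog entries first lets the prune remove them.
git reflog expire --all --expire-unreachable=now
git gc -q --prune=now
if git cat-file -e "$big_blob" 2>/dev/null; then with_expire=present; else with_expire=gone; fi

echo "commits: $commits; blob after gc only: $gc_only; after reflog expire + gc: $with_expire"
```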
