Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow.
Original source: http://stackoverflow.com/questions/4444091/
Update a development team with rewritten Git repo history, removing big files
Asked by rlkw1024
I have a git repo with some very large binaries in it. I no longer need them, and I don't care about being able to checkout the files from earlier commits. So, to reduce the repo size, I want to delete the binaries from the history altogether.
After a web search, I concluded that my best (only?) option is to use git-filter-branch:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' HEAD
Does this seem like a good approach so far?
Assuming the answer is yes, I have another problem to contend with. The git manual has this warning:
WARNING! The rewritten history will have different object names for all the objects and will not converge with the original branch. You will not be able to easily push and distribute the rewritten branch on top of the original branch. Please do not use this command if you do not know the full implications, and avoid using it anyway, if a simple single commit would suffice to fix your problem. (See the "RECOVERING FROM UPSTREAM REBASE" section in git-rebase(1) for further information about rewriting published history.)
We have a remote repo on our server. Each developer pushes to and pulls from it. Based on the warning above (and my understanding of how git-filter-branch works), I don't think I'll be able to run git-filter-branch on my local copy and then push the changes.
So, I'm tentatively planning to go through the following steps:
- Tell all my developers to commit, push, and stop working for a bit.
- Log into the server and run the filter on the central repo.
- Have everyone delete their old copies and clone again from the server.
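For step 2, a minimal sketch of what running the filter on the central repo might look like, assuming the server hosts a bare repository at a hypothetical path /srv/git/project.git:

$ cd /srv/git/project.git
# Rewrite every branch, dropping the big binaries from all commits
# (an --index-filter works in a bare repo, since it never needs a working tree):
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' -- --all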
Does this sound right? Is this the best solution?
Accepted answer by cdhowie
Yes, your solution will work. You also have another option: instead of doing this on the central repo, run the filter on your clone and then push it back with git push --force --all. This will force the server to accept the new branches from your repository. This replaces step 2 only; the other steps will be the same.
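A sketch of that alternative, with a hypothetical remote URL; since --force overwrites the server's branches, everyone should have pushed and stopped working before you run it:

$ git clone git@example.com:project.git
$ cd project
$ git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' -- --all
# Overwrite all branches on the server with the rewritten history
# (add git push --force --tags if tags were rewritten too):
$ git push --force --all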
If your developers are pretty Git-savvy, then they might not have to delete their old copies; for example, they could fetch the new remotes and rebase their topic branches as appropriate.
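For example, a developer with an unpushed topic branch might replay it onto the rewritten history with something like this sketch (branch names hypothetical); origin/master@{1} names the remote-tracking ref as it was just before the fetch:

$ git fetch origin
# Take only the commits unique to my-topic and replay them onto the rewritten branch:
$ git rebase --onto origin/master origin/master@{1} my-topic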
Answer by Roberto Tyley
Your plan is good (though it would be better to perform the filtering on a bare clone of your repository, rather than on the central server), but in preference to git-filter-branch you should use my BFG Repo-Cleaner, a faster, simpler alternative to git-filter-branch designed specifically for removing large files from Git repos.
Download the Java jar (requires Java 6 or above) and run this command:
$ java -jar bfg.jar --strip-blobs-bigger-than 1MB my-repo.git
Any blob over 1MB in size (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
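Putting the whole BFG workflow together, it looks roughly like this (repository URL hypothetical); working on a --mirror clone means the final push updates every ref on the server:

$ git clone --mirror git://example.com/my-repo.git
$ java -jar bfg.jar --strip-blobs-bigger-than 1MB my-repo.git
$ cd my-repo.git
# Expire the reflog first so git gc can actually drop the stripped blobs:
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
$ git push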
The BFG is typically 10-50x faster than running git-filter-branch, and the options are tailored around these two common use-cases:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data
Answer by Ben Hymanson
If you don't make your developers re-clone, it's likely that they will manage to drag the large files back in. For example, if they carefully splice onto the new history you will create and then happen to git merge from a local project branch that was not rebased, the parents of the merge commit will include the project branch, which ultimately points at the entire history you erased with git filter-branch.
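One way to check whether large blobs have crept back in after such a merge is to list the biggest objects reachable from any ref, for example:

# Show the ten largest blobs reachable from any branch or tag:
$ git rev-list --objects --all | git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' | awk '$1 == "blob"' | sort -k3 -n | tail -n 10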
Answer by Jason Axelson
Your solution is not complete. You should include --tag-name-filter cat as an argument to filter-branch so that the tags that contain the large files are changed as well. You should also modify all refs instead of just HEAD, since the commit could be in multiple branches.
Here is some better code:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch big_1.zip big_2.zip etc.zip' --tag-name-filter cat -- --all
GitHub has a good guide: https://help.github.com/articles/remove-sensitive-data
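Note that after git filter-branch finishes, the old history is still reachable through the backup refs under refs/original and through the reflog, so the repository will not actually shrink until those are cleared; a common cleanup sequence looks like:

# Drop the backup refs that filter-branch leaves behind:
$ git for-each-ref --format='delete %(refname)' refs/original | git update-ref --stdin
# Expire the reflog and garbage-collect the now-unreferenced objects:
$ git reflog expire --expire=now --all
$ git gc --prune=now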