Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6358476/

Time: 2020-09-19 05:34:38  Source: igfitidea

How to remove old versions of media files from a git repository

Tags: git, git-rewrite-history

Asked by Ricardo Sanchez-Saez

I have a Git repository with several huge media files (images and audio files). Several versions of these media files have been successively committed to the repo. The files are successively refined versions of the same assets, and they have the same name.

I want to keep only the latest version in the Git repository, because it is becoming too big.
What is the simplest way to do this?
How can I propagate these changes correctly to the upstream repository?

Accepted answer by Kevin Wright

I have a script (GitHub gist here) to remove a selection of unwanted folders from the entire history of a git repo, or to delete all but the latest version of a folder.

It's hard-coded to assume that all git repositories are in ~/repos, but that's easy to change. It should also be easy to adapt to work with individual files.

Answer by lac.alan

Old thread but in case someone else stumbles along here…

GitHub & Bitbucket both recommend using BFG Repo-Cleaner.

See:
GitHub: Remove Sensitive Data
Bitbucket: Reduce Repository Size & Bitbucket: Maintaining a Git Repository

Example to remove files over 1 Megabyte, as well as jpgs, pngs and mp3s that are not in HEAD:

# First get the latest bfg.jar, then:
$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg.jar --strip-blobs-bigger-than 1M --delete-files '*.{jpg,png,mp3}' some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

Note: now that you've pushed the updated revs, the remote repository should also run its git gc, or else you won't see the size reduction (see e.g. https://stackoverflow.com/a/28782154/3419541).
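
To check whether the cleanup actually shrank a repository, one option is to compare the packfile size before and after running git gc. The following is only a minimal sketch: the function name pack_size_kib is made up for this illustration, and it reads the size-pack field (in KiB) that git count-objects -v reports.

```shell
# Print the total size of a repository's packfiles in KiB, taken from the
# `size-pack` field of `git count-objects -v`.
# pack_size_kib is a hypothetical helper name, not part of git or BFG.
# Usage: pack_size_kib <path-to-repo>
pack_size_kib() {
  git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'
}
```

For example, running pack_size_kib some-big-repo.git before the BFG step and again after the git gc step shows the local effect. The hosted copy only shrinks once the server has run its own garbage collection.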

Finally, re-clone the repository to be sure that you don't accidentally re-commit the old media file blobs.

Answer by sateesh

Check the section on 'Removing Objects' in the chapter Maintenance and Data Recovery in the Pro Git book. It provides steps for removing objects from the git repo. Be warned, though, that the process is destructive.
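
Before removing anything, it helps to know which objects take up the most space. The following is a minimal sketch, not the book's exact procedure: the function name largest_blobs is made up for this illustration, and it only reads the repository, so it is safe to run.

```shell
# List the largest blobs reachable from any ref, largest first.
# Each output line is: <size-in-bytes> <object-id> <path>
# largest_blobs is a hypothetical helper name, not a git command.
# Usage: largest_blobs <path-to-repo> [count]
largest_blobs() {
  repo=$1
  count=${2:-10}
  # rev-list prints "<object-id> <path>" for every reachable object;
  # cat-file adds the type and uncompressed size, and %(rest) keeps the path.
  git -C "$repo" rev-list --objects --all |
    git -C "$repo" cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |
    sort -k1,1nr |
    head -n "$count"
}
```

Calling largest_blobs . 20 in a working copy lists the twenty biggest blobs in history, which is usually enough to see which media files are worth stripping.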

Answer by sml

As mentioned already, you will be re-writing history here, so you will have to get collaborators (if any) to do git rebase.

As for stripping a particular file from history, GitHub has a nice walkthrough.

For a solution going forward, you should look at putting the binary files in a sub-module.

Git's submodule support allows a repository to contain, as a subdirectory, a checkout of an external project. Submodules maintain their own identity; the submodule support just stores the submodule repository location and commit ID, so other developers who clone the containing project ("superproject") can easily clone all the submodules at the same revision. Partial checkouts of the superproject are possible: you can tell Git to clone none, some or all of the submodules.

https://git-scm.com/docs/git-submodule

https://git-scm.com/book/en/v2/Git-Tools-Submodules
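
A minimal sketch of the submodule approach, assuming the media files already live in a repository of their own. The function name add_media_submodule, the default directory name media, and the URL passed in are placeholders for illustration and do not come from the answer.

```shell
# Register a separate media repository as a submodule of the current
# repository and commit the result.
# add_media_submodule is a hypothetical helper name, not a git command.
# Usage: add_media_submodule <media-repo-url> [path]
add_media_submodule() {
  media_url=$1            # assumption: URL of the repository holding the binaries
  media_path=${2:-media}  # assumption: directory name inside the superproject
  # `git submodule add` clones the media repo into $media_path and stages
  # both .gitmodules and the pinned submodule commit.
  git submodule add "$media_url" "$media_path" &&
    git commit -q -m "Add $media_path as a submodule"
}
```

Collaborators can then fetch the assets with git clone --recurse-submodules, or with git submodule update --init in an existing clone. The superproject only records the submodule's URL and commit ID, so the large blobs stay out of its own history.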

Answer by Aasmund Eldhuset

As far as I know, this can't be done, because in git, every commit depends on the contents of the entire history up to that point. So the only way to get rid of the old, big files would be to "replay" the entire commit history (preferably with the same commit timestamps and authors), omitting the big files. Note that this will produce an entirely separate commit history.

This is obviously not a very viable approach, so the lesson is probably "don't use git to version huge binary files". Instead, you could perhaps have a separate (ignored) folder for the files and use a separate system to version control them.
