Remove folder and its contents from git/GitHub's history
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/10067848/
Asked by Kartik
I was working on a repository on my GitHub account and this is a problem I stumbled upon.
- Node.js project with a folder with a few npm packages installed
- The packages were in the node_modules folder
- Added that folder to the git repository and pushed the code to GitHub (wasn't thinking about the npm part at that time)
- Realized that you don't really need that folder to be a part of the code
- Deleted that folder, pushed it
At that point, the total size of the git repo was around 6 MB, whereas the actual code (everything except that folder) was only around 300 KB.
What I am ultimately looking for is a way to remove that package folder from git's history, so that someone who clones the repo doesn't have to download 6 MB worth of history when the only actual files they get as of the last commit amount to 300 KB.
I looked up possible solutions for this and tried these methods:
- Remove file from git repository (history)
- http://help.github.com/remove-sensitive-data/
- https://gist.github.com/1588371
The Gist seemed to work: after running the script, it showed that it got rid of that folder, and then it showed that 50 different commits were modified. But it didn't let me push that code. When I tried to push it, it said Branch up to date, but git status showed that 50 commits were modified. The other 2 methods didn't help either.
Now even though it showed that it got rid of that folder's history, when I checked the size of that repo on my localhost, it was still around 6 MB. (I also deleted the refs/original folder but didn't see any change in the size of the repo.)
What I am looking to clarify is whether there's a way to get rid of not only the commit history (which is the only thing I think happened) but also the files git keeps around in case one wants to roll back.
Let's say a solution is presented for this and is applied on my localhost but can't be reproduced on that GitHub repo: is it possible to clone that repo, roll back to the first commit, perform the trick and push it? (Or does that mean that git will still have a history of all those commits, i.e. 6 MB?)
My end goal here is basically to find the best way to get rid of the folder contents from git, so that a user doesn't have to download 6 MB worth of stuff, while still keeping the other commits that never touched the modules folder (that's pretty much all of them) in git's history.
How can I do this?
Answered by Mohsen
If you are here to copy-paste code:
This is an example which removes node_modules from history:
git filter-branch --tree-filter "rm -rf node_modules" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo node_modules/ >> .gitignore
git add .gitignore
git commit -m 'Removing node_modules from git history'
git gc
git push origin master --force
What git actually does:
The first line rewrites every commit reachable from HEAD (your current branch). For each commit, --tree-filter checks out that commit's tree and runs the command rm -rf node_modules in it. This command deletes the node_modules folder (-r; without -r, rm won't delete folders), with no prompt given to the user (-f). The added --prune-empty drops commits that no longer change anything once the folder is gone.
The second line deletes the backup references to the old, unrewritten branch that filter-branch leaves under refs/original/.
The rest of the commands are relatively straightforward.
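One detail of the .gitignore step: echo ... >> appends a new line every time the script is re-run, so repeated runs leave duplicate entries. A minimal sketch of an idempotent variant follows. The helper name ignore_path is hypothetical (it is not a git command), and it assumes a POSIX shell with grep and tail available.

```shell
# Append an entry to an ignore file only if that exact line is not already present.
# ignore_path is a hypothetical helper, not part of git.
ignore_path() {
  entry="$1"
  file="${2:-.gitignore}"
  touch "$file"
  # -q: quiet, -x: match the whole line, -F: treat the entry as a fixed string
  if ! grep -qxF -- "$entry" "$file"; then
    # If the file does not end with a newline, add one so the entry gets its own line.
    [ -z "$(tail -c 1 "$file")" ] || echo >> "$file"
    printf '%s\n' "$entry" >> "$file"
  fi
}

# Usage (commented out so the block has no side effects):
#   ignore_path node_modules/
#   git add .gitignore
```

Calling ignore_path node_modules/ twice leaves a single node_modules/ line in .gitignore.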
Answered by Lee Netherton
I find that the --tree-filter option used in other answers can be very slow, especially on larger repositories with lots of commits.
Here is the method I use to completely remove a directory from the git history using the --index-filter option, which runs much quicker:
# Make a fresh clone of YOUR_REPO
git clone YOUR_REPO
cd YOUR_REPO
# Create tracking branches of all branches
for remote in `git branch -r | grep -v /HEAD`; do git checkout --track $remote ; done
# Remove DIRECTORY_NAME from all commits, then remove the refs to the old commits
# (repeat these two commands for as many directories that you want to remove)
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch DIRECTORY_NAME/' --prune-empty --tag-name-filter cat -- --all
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
# Ensure all old refs are fully removed
rm -Rf .git/logs .git/refs/original
# Perform a garbage collection to remove commits with no refs
git gc --prune=all --aggressive
# Force push all branches to overwrite their history
# (use with caution!)
git push origin --all --force
git push origin --tags --force
You can check the size of the repository before and after the gc with:
git count-objects -vH
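To compare the numbers without reading them by eye, the size-pack field can be pulled out of the plain -v output (without -H, that field is reported in KiB). A minimal sketch, assuming the usual "size-pack: <number>" line format; pack_size_kib is a hypothetical helper name, not a git command.

```shell
# Read `git count-objects -v` output on stdin and print the size-pack value (KiB).
# pack_size_kib is a hypothetical helper, not part of git.
pack_size_kib() {
  awk '$1 == "size-pack:" { print $2 }'
}

# Usage inside a repository (commented out so the block has no side effects):
#   before=$(git count-objects -v | pack_size_kib)
#   git gc --prune=now --aggressive
#   after=$(git count-objects -v | pack_size_kib)
#   echo "packed size: ${before} KiB -> ${after} KiB"
```

If the second number is not smaller, something (for example a leftover ref or reflog entry) is probably still keeping the old commits alive.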
Answered by participant
In addition to the popular answer above, I would like to add a few notes for Windows systems. The command
git filter-branch --tree-filter 'rm -rf node_modules' --prune-empty HEAD
works perfectly without any modification! Therefore, you must not use Remove-Item, del or anything else instead of rm -rf. If you need to specify a path to a file or directory, use forward slashes, like ./path/to/node_modules
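If a path was copied from Windows Explorer or a Windows prompt, it will contain backslashes. A minimal sketch of converting it to the forward-slash form before putting it into the filter command; to_slashes is a hypothetical helper name, and the sketch assumes a POSIX shell such as Git Bash.

```shell
# Convert backslashes in a path to forward slashes.
# to_slashes is a hypothetical helper, not part of git.
to_slashes() {
  printf '%s\n' "$1" | tr '\\' '/'
}

# Usage (single quotes keep the backslashes literal):
#   to_slashes 'path\to\node_modules'
```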
Answered by Kim T
The best and most accurate method I found was to download the bfg.jar file: https://rtyley.github.io/bfg-repo-cleaner/
Then run the commands:
git clone --bare https://project/repository project-repository
cd project-repository
java -jar bfg.jar --delete-folders DIRECTORY_NAME # i.e. 'node_modules' in other examples
git reflog expire --expire=now --all && git gc --prune=now --aggressive
git push --mirror https://project/new-repository
If you want to delete files then use the delete-files option instead:
java -jar bfg.jar --delete-files '*.pyc'
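One thing to watch with a glob pattern such as *.pyc: if it is left unquoted and matching files exist in the current directory, the shell expands it before the program starts, so the program receives file names instead of the pattern. A minimal sketch of the difference; count_args is a hypothetical helper that only reports how many arguments it was given, and the temporary directory is created just for the demonstration.

```shell
# Report how many arguments were received.
# count_args is a hypothetical helper used only for this demonstration.
count_args() { echo "$#"; }

demo_dir=$(mktemp -d)
touch "$demo_dir/a.pyc" "$demo_dir/b.pyc"

# Run in a subshell so the caller's working directory is left unchanged.
(
  cd "$demo_dir" || exit 1
  count_args --delete-files *.pyc     # glob expanded by the shell: prints 3
  count_args --delete-files '*.pyc'   # pattern passed through literally: prints 2
)
```

Quoting the pattern keeps the behaviour the same regardless of which files happen to sit in the current directory.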
Answered by jgbarah
Complete copy-and-paste recipe, just adding the commands from the comments (for the copy-paste solution), after testing them:
git filter-branch --tree-filter 'rm -rf node_modules' --prune-empty HEAD
echo node_modules/ >> .gitignore
git add .gitignore
git commit -m 'Removing node_modules from git history'
git gc
git push origin master --force
After this, you can remove the line "node_modules/" from .gitignore
Answered by kcode
For Windows users: please note to use " instead of '. I also added -f to force the command if another backup is already there.
git filter-branch -f --tree-filter "rm -rf FOLDERNAME" --prune-empty HEAD
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
echo FOLDERNAME/ >> .gitignore
git add .gitignore
git commit -m "Removing FOLDERNAME from git history"
git gc
git push origin master --force
Answered by André Anjos
It appears that the up-to-date answer to this is to not use filter-branch directly (at least git itself does not recommend it anymore), and to defer that work to an external tool. In particular, git-filter-repo is currently recommended. The author of that tool provides arguments on why using filter-branch directly can lead to issues.
Most of the multi-line scripts above to remove dir from the history could be re-written as:
git filter-repo --path dir --invert-paths
The tool is more powerful than just that, apparently. You can apply filters by author, email, refname and more (see the full manpage). Furthermore, it is fast. Installation is easy: it is distributed in a variety of formats.
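Whichever tool does the rewrite, it can be worth checking afterwards that no reachable commit still references the directory. A minimal sketch, assuming the "<sha> <path>" line format printed by git rev-list --all --objects and a directory at the top level of the repository; paths_under is a hypothetical helper name, not a git command.

```shell
# Read `git rev-list --all --objects` output on stdin ("<sha> <path>" per line)
# and print the paths that are the given directory or live underneath it.
# paths_under is a hypothetical helper, not part of git.
paths_under() {
  awk -v d="$1" '{
    sp = index($0, " ")            # commit lines have no space and are skipped
    if (sp == 0) next
    path = substr($0, sp + 1)      # keep paths that contain spaces intact
    if (path == d || index(path, d "/") == 1) print path
  }'
}

# Usage inside a repository (commented out so the block has no side effects):
#   git rev-list --all --objects | paths_under dir
# Empty output means no commit reachable from any ref still contains dir.
```

This only looks at objects reachable from refs; unreachable objects can still occupy disk space until they are pruned by a garbage collection.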
Answered by LordObi
I removed the bin and obj folders from old C# projects using git on Windows. Be careful with
git filter-branch --tree-filter "rm -rf bin" --prune-empty HEAD
It destroys the integrity of the git installation by deleting the usr/bin folder in the git install folder.