Remove file from git repository (history)
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/2164581/
Asked by Boris Churzin
(solved, see bottom of the question body)
I've been looking for a way to do this for a long time. What I have so far is:
- http://dound.com/2009/04/git-forever-remove-files-or-folders-from-history/ and
- http://progit.org/book/ch9-7.html
Pretty much the same method, but both of them leave objects in pack files... Stuck.
What I tried:
git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc
Still have files in the pack, and this is how I know it:
git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3
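For anyone unsure how to read that output: the third column of git verify-pack -v is the object size, so the last lines after sorting are the largest objects, and git rev-list --objects maps an object id back to a path. A minimal sketch in a throwaway repository follows; the file names big.bin and small.txt and the demo identity are made up for illustration.

```shell
#!/bin/sh
# Sketch: find the largest packed object and the path it belongs to.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# Hypothetical content: one large random file and one small file.
head -c 200000 /dev/urandom > big.bin
echo small > small.txt
git add big.bin small.txt
git commit -q -m "add files"

# verify-pack reads pack indexes, so make sure the objects are packed first.
git gc -q

# Keep only object lines, sort by size (column 3), take the largest id.
largest=$(git verify-pack -v .git/objects/pack/pack-*.idx \
  | grep -E '^[0-9a-f]{40,} ' | sort -k 3 -n | tail -1 | cut -d ' ' -f 1)

# Map the object id back to the path it was committed under.
name=$(git rev-list --objects --all | grep "$largest" | cut -d ' ' -f 2-)
echo "$name"
```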
And this:
git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all && git gc --aggressive --prune
The same...
Tried the git clone trick; it removed some of the files (~3000 of them), but the largest files are still there...
I have some large legacy files in the repository, ~200M, and I really don't want them there... And I don't want to reset the repository to 0 :(
SOLUTION: This is the shortest way to get rid of the files:
- check .git/packed-refs - my problem was that I had a refs/remotes/origin/master line there for a remote repository; delete it, otherwise git won't remove those files
- (optional) git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5 - to check for the largest files
- (optional) git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98 - to check what those files are
- git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names' - to remove a file from all revisions
- rm -rf .git/refs/original/ - to remove git's backup
- git reflog expire --all --expire='0 days' - to expire all reflog entries, so the old objects become unreachable
- git fsck --full --unreachable - to check if there are any unreachable objects
- git repack -A -d - repacking
- git prune - to finally remove those objects
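The whole sequence can be tried safely in a throwaway repository before touching a real one. This is only a sketch under assumptions: big.bin, keep.txt and the demo identity are made up, --expire=now is used in place of '0 days', and FILTER_BRANCH_SQUELCH_WARNING just silences the deprecation notice that newer Git versions print for filter-branch.

```shell
#!/bin/sh
# Sketch: purge a file from history, then verify its blob is really gone.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

# Commit a (hypothetical) large file, then delete it in a later commit.
head -c 200000 /dev/urandom > big.bin
echo hello > keep.txt
git add big.bin keep.txt
git commit -q -m "add files"
blob=$(git rev-parse HEAD:big.bin)
git rm -q big.bin
git commit -q -m "drop big file"

# Rewrite every commit so that big.bin never existed.
git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# Drop the backup refs and the reflog entries that still reach old commits.
rm -rf .git/refs/original
git reflog expire --all --expire=now

# Optional check: the blob should now be listed as unreachable.
git fsck --full --unreachable 2>/dev/null | grep "$blob" || true

# Repack, then prune the unreachable loose objects.
git repack -A -d -q
git prune

if git cat-file -e "$blob" 2>/dev/null; then
  result="still present"
else
  result="gone"
fi
echo "$result"
```

If the blob is still present at the end on a real repository, some ref is probably still pointing at the old history (packed refs, tags, remote-tracking refs).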
Accepted answer by Dan Moulding
I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.
Here's what I'd do (after git filter-branch and git gc have been done):
1) Make sure original refs are gone:
rm -rf .git/refs/original
2) Expire all reflog entries:
git reflog expire --all --expire='0 days'
3) Check for old packed refs
This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).
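As a possible alternative to hand-editing, git for-each-ref lists every ref whether it is loose or packed, and git update-ref -d deletes a ref in either form. The sketch below is not part of the original answer; it simulates a stale refs/remotes/origin/master in a throwaway repository (the demo identity and keep.txt are made up).

```shell
#!/bin/sh
# Sketch: list refs and delete a stale packed ref without editing packed-refs.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com
echo hello > keep.txt
git add keep.txt
git commit -q -m "initial"

# Simulate a stale remote-tracking ref, then pack all refs.
git update-ref refs/remotes/origin/master HEAD
git pack-refs --all

# Show every ref Git knows about, packed or loose.
git for-each-ref --format='%(refname)'

# Delete the stale ref; Git rewrites .git/packed-refs itself.
git update-ref -d refs/remotes/origin/master

remaining=$(git for-each-ref --format='%(refname)' refs/remotes | wc -l | tr -d ' ')
echo "$remaining"
```

Only delete a remote-tracking ref like this if you really do not need it; the next fetch from that remote would recreate it along with the objects it points to.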
After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:
git fsck --full --unreachable
If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.
4) Repack your packed archive(s)
git repack -A -d
This will ensure that the unreachable objects get unpacked and stay unpacked.
5) Prune loose (unreachable) objects
git prune
And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
Answer by Roberto Tyley
I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history. One way in which it makes your life easier here is that it actually handles all references by default (all tags, branches, stuff like refs/remotes/origin/master, etc) but it's also 10-50x faster.
You should carefully follow these steps here: http://rtyley.github.com/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 6 or above) and run this command:
$ java -jar bfg.jar --delete-files file_name my-repo.git
Any file named file_name (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:
$ git gc --prune=now --aggressive
The BFG is generally much simpler to use than git-filter-branch - the options are tailored around these two common use-cases:
- Removing Crazy Big Files
- Removing Passwords, Credentials & other Private data
Full disclosure: I'm the author of the BFG Repo-Cleaner.
Answer by Mike Averto
I found this to be quite helpful for removing a whole folder, as the above didn't really help me: https://help.github.com/articles/remove-sensitive-data.
I used:
git filter-branch -f --force \
--index-filter 'git rm -rf --cached --ignore-unmatch folder/sub-folder' \
--prune-empty --tag-name-filter cat -- --all
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
Answer by BHMulder
I was trying to get rid of a big file in the history, and the above answers worked, up to a point. The point is: they don't work if you have tags. If the commit containing the big file is reachable from a tag, then you would need to adjust the filter-branch command like this:
git filter-branch --tag-name-filter cat \
--index-filter 'git rm --cached --ignore-unmatch huge_file_name' -- \
--all --tags
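To see the effect on a tag, here is a small sketch in a throwaway repository. The tag name v1.0, the file contents and the demo identity are made up. It passes only --all, on the assumption that --all already covers everything under refs/, including refs/tags, so the extra --tags should be harmless but not required.

```shell
#!/bin/sh
# Sketch: rewrite a tagged commit so the tag no longer reaches the big file.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo data > huge_file_name
echo hello > keep.txt
git add huge_file_name keep.txt
git commit -q -m "add files"
git tag v1.0                      # hypothetical tag on the commit with the file
git rm -q huge_file_name
git commit -q -m "remove huge file"

# --tag-name-filter cat keeps tag names and repoints them at rewritten commits.
git filter-branch -f --tag-name-filter cat \
  --index-filter 'git rm --cached --ignore-unmatch huge_file_name' -- \
  --all >/dev/null 2>&1

# The rewritten tag should only list the file we kept.
files=$(git ls-tree -r --name-only v1.0)
echo "$files"
```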
Answer by Wayne Conrad
See: How do I remove sensitive files from git's history
The above will fail if the file does not exist in a rev. In that case, the '--ignore-unmatch' switch will fix it:
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch <filename>' HEAD
Then, to get all loose objects out of the repository:
git gc --prune='0 days ago'
Answer by Spain Train
This should be covered by the git obliterate command in Git Extras (https://github.com/visionmedia/git-extras).
git obliterate <filename>
Answer by VonC
There are various reasons why a git repo can still be large after git gc, since it does not remove all loose objects.
I detail those reasons in "reduce the git repository size"
But one trick to test in your case would be to clone your "cleaned" Git repo and see if the clone has the appropriate size.
(the "cleaned" repo being the one where you applied filter-branch, and then gc and prune)
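A sketch of that clone test, assuming a throwaway repository with a made-up big.bin. It clones through a file:// URL because a clone from a plain local path normally copies or hardlinks the whole object store, which would hide the difference.

```shell
#!/bin/sh
# Sketch: clone a filtered repo over file:// and check the old blob is absent.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name demo
git config user.email demo@example.com

head -c 200000 /dev/urandom > big.bin   # hypothetical large file
echo hello > keep.txt
git add big.bin keep.txt
git commit -q -m "add files"
blob=$(git rev-parse HEAD:big.bin)

git filter-branch -f --index-filter \
  'git rm --cached --ignore-unmatch big.bin' -- --all >/dev/null 2>&1

# file:// goes through the normal transport, so only objects reachable
# from the cloned branches and tags are transferred.
git clone -q "file://$work/repo" "$work/clone"

if git -C "$work/clone" cat-file -e "$blob" 2>/dev/null; then
  in_clone="present"
else
  in_clone="absent"
fi
echo "$in_clone"

# Compare on-disk sizes of the original and the clone.
git count-objects -vH
git -C "$work/clone" count-objects -vH
```

If the clone is small while the original stays large, the leftover data in the original is held by something the clone does not copy, such as refs/original, reflogs or stale packed refs.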
Answer by Cyril Leroux
I had the same problem and I found a great tutorial on github that explains step by step how to get rid of files you accidentally committed.
Here is a little summary of the procedure as Cupcake suggested.
If you have a file named file_to_remove to remove from the history:
cd path_to_parent_dir
git filter-branch --force --index-filter \
'git rm --cached --ignore-unmatch file_to_remove' \
--prune-empty --tag-name-filter cat -- --all
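One detail worth knowing about --prune-empty: a commit whose only change was adding the removed file becomes empty after the rewrite and is dropped from history. A small sketch in a throwaway repository (the demo identity and file contents are made up):

```shell
#!/bin/sh
# Sketch: --prune-empty drops commits that only touched the removed file.
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

echo hello > keep.txt
git add keep.txt
git commit -q -m "initial"
echo secret > file_to_remove
git add file_to_remove
git commit -q -m "add file by accident"
before=$(git rev-list --count HEAD)

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch file_to_remove' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

after=$(git rev-list --count HEAD)
echo "$before -> $after"
```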