Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA: cite the original address and author information, and attribute it to the original authors (not me): StackOverflow. Original address: http://stackoverflow.com/questions/2164581/

Date: 2020-09-10 07:46:04  Source: igfitidea

Remove file from git repository (history)

Tags: git, version-control, git-rewrite-history

Question by Boris Churzin

(Solved, see the bottom of the question body.)
I've been looking for this for a long time now; what I have found so far is:

Both are pretty much the same method, and both leave objects in the pack files... Stuck.
What I tried:

git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_name'
rm -Rf .git/refs/original
rm -Rf .git/logs/
git gc

The files are still in the pack, and this is how I know:

git verify-pack -v .git/objects/pack/pack-3f8c0...bb.idx | sort -k 3 -n | tail -3

And this:

git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch file_name" HEAD
rm -rf .git/refs/original/ && git reflog expire --all &&  git gc --aggressive --prune

The same...

I tried the git clone trick; it removed some of the files (~3000 of them), but the largest files are still there...

I have some large legacy files in the repository (~200M) and I really don't want them there... And I don't want to reset the repository to zero :(

SOLUTION: This is the shortest way to get rid of the files:

  1. Check .git/packed-refs: my problem was a refs/remotes/origin/master line in there for a remote repository; delete it, otherwise git won't remove those files
  2. (optional) Check for the largest objects: git verify-pack -v .git/objects/pack/#{pack-name}.idx | sort -k 3 -n | tail -5
  3. (optional) Find out which files those objects are: git rev-list --objects --all | grep a0d770a97ff0fac0be1d777b32cc67fe69eb9a98
  4. Remove the files from all revisions (with no revision arguments, this rewrites HEAD only): git filter-branch --index-filter 'git rm --cached --ignore-unmatch file_names'
  5. Remove git's backup refs: rm -rf .git/refs/original/
  6. Expire all reflog entries, so they no longer keep the old objects reachable: git reflog expire --all --expire='0 days'
  7. Check for unreachable objects: git fsck --full --unreachable
  8. Repack: git repack -A -d
  9. Finally remove those objects: git prune
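
Steps 2 and 3 can also be combined into one pipeline that prints the biggest blobs together with the path each was committed under. This is a minimal sketch, not the asker's method; it assumes a POSIX shell and a Git version whose git cat-file --batch-check supports %(rest). The throwaway repository and the files small.txt and legacy.bin are hypothetical; in a real repository, skip the setup and run only the pipeline.

```shell
#!/bin/sh
# List the largest blobs reachable from any ref, with the path of each.
set -eu

# Throwaway demo repository (hypothetical); in a real repo, cd into it instead.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "small" > small.txt
dd if=/dev/urandom of=legacy.bin bs=1024 count=512 2>/dev/null
git add small.txt legacy.bin
git commit -q -m "add files"

# rev-list prints "<sha> <path>" per reachable object; cat-file turns each
# line into "<size> <type> <sha> <path>". Keep blobs, largest first, top 5.
top=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' \
  | awk '$2 == "blob"' \
  | sort -k1,1nr \
  | head -n 5)
echo "$top"
```

Each output line reads size in bytes, object type, SHA and path, so there is no need to grep for individual SHAs afterwards.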

Accepted answer by Dan Moulding

I can't say for sure without access to your repository data, but I believe there are probably one or more packed refs still referencing old commits from before you ran git filter-branch. This would explain why git fsck --full --unreachable doesn't call the large blob an unreachable object, even though you've expired your reflog and removed the original (unpacked) refs.

Here's what I'd do (after git filter-branch and git gc have been run):

1) Make sure original refs are gone:

rm -rf .git/refs/original

2) Expire all reflog entries:

git reflog expire --all --expire='0 days'

3) Check for old packed refs

This could potentially be tricky, depending on how many packed refs you have. I don't know of any Git commands that automate this, so I think you'll have to do this manually. Make a backup of .git/packed-refs. Now edit .git/packed-refs. Check for old refs (in particular, see if it packed any of the refs from .git/refs/original). If you find any old ones that don't need to be there, delete them (remove the line for that ref).
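
You may not need to edit the file by hand: git for-each-ref lists refs whether they are loose or packed, and git update-ref -d deletes a ref and rewrites .git/packed-refs accordingly. The sketch below is an illustration under stated assumptions rather than part of the original answer: it builds a throwaway repository, simulates the asker's stale refs/remotes/origin/master entry (a hypothetical stand-in), and checks whether the blob of the hypothetical file legacy.bin is still reachable before and after that ref is deleted.

```shell
#!/bin/sh
# Sketch: a stale packed ref keeps a "removed" blob reachable until the ref
# itself is deleted; git update-ref -d handles packed refs, no hand editing.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
dd if=/dev/urandom of=legacy.bin bs=1024 count=256 2>/dev/null
echo "code" > main.c
git add legacy.bin main.c
git commit -q -m "initial"
blob=$(git rev-parse HEAD:legacy.bin)

# Simulate the stale remote-tracking ref, then move all refs into packed-refs.
git update-ref refs/remotes/origin/master HEAD
git pack-refs --all

# Rewrite the current branch only and drop filter-branch's backup refs.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch legacy.bin' HEAD >/dev/null
rm -rf .git/refs/original

# Prints 1 if the blob is still reachable from some ref, 0 otherwise.
reachable() { git rev-list --objects --all | grep -c "$blob" || true; }
before=$(reachable)

# Find the leftover ref (packed or loose) and delete it; update-ref -d also
# removes its line from .git/packed-refs.
git for-each-ref --format='%(refname)' refs/remotes
git update-ref -d refs/remotes/origin/master
after=$(reachable)
echo "reachable before: $before, after deleting the ref: $after"
```

On a real repository, inspect the for-each-ref output first and only delete refs you are sure you no longer need.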

After you finish cleaning up the packed-refs file, see if git fsck notices the unreachable objects:

git fsck --full --unreachable

If that worked, and git fsck now reports your large blob as unreachable, you can move on to the next step.

4) Repack your packed archive(s)

git repack -A -d

This will ensure that the unreachable objects get unpacked and stay unpacked.

5) Prune loose (unreachable) objects

git prune

And that should do it. Git really should have a better way to manage packed refs. Maybe there is a better way that I don't know about. In the absence of a better way, manual editing of the packed-refs file might be the only way to go.
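
Put together, the five steps look roughly like the sketch below, run on a throwaway repository. It assumes a POSIX shell with git filter-branch available; legacy.bin and main.c are hypothetical, --expire=now is used in place of '0 days' (for git reflog expire, "now" expires every entry), and FILTER_BRANCH_SQUELCH_WARNING only silences the deprecation notice printed by recent Git versions.

```shell
#!/bin/sh
# Sketch of the accepted answer's clean-up steps on a throwaway repository.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
dd if=/dev/urandom of=legacy.bin bs=1024 count=1024 2>/dev/null
echo "code" > main.c
git add legacy.bin main.c
git commit -q -m "initial"
blob=$(git rev-parse HEAD:legacy.bin)
git gc -q                                # pack everything, like the asker's repo

git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch legacy.bin' -- --all >/dev/null

# 1) Remove the backup refs written by filter-branch.
rm -rf .git/refs/original
# 2) Expire all reflog entries ("now" expires every entry).
git reflog expire --all --expire=now
# 3) Review the remaining refs; a stale one here would keep old commits alive.
git for-each-ref --format='%(refname)'
# The old blob should now be reported as unreachable.
unreachable=$(git fsck --full --unreachable 2>/dev/null | grep -c "$blob" || true)
# 4) Repack: reachable objects go into a new pack, unreachable ones stay loose.
git repack -A -d -q
# 5) Delete the loose unreachable objects.
git prune

if git cat-file -e "$blob" 2>/dev/null; then blob_present=yes; else blob_present=no; fi
echo "unreachable before repack: $unreachable, blob still present: $blob_present"
```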

Answer by Roberto Tyley

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for removing files from Git history. One way in which it makes your life easier here is that it handles all references by default (all tags, branches, stuff like refs/remotes/origin/master, etc.), and it's also 10-50x faster.

You should carefully follow the steps here: http://rtyley.github.com/bfg-repo-cleaner/#usage - but the core bit is just this: download the BFG's jar (requires Java 6 or above) and run this command:

$ java -jar bfg.jar --delete-files file_name my-repo.git

Any file named file_name (that isn't in your latest commit) will be totally removed from your repository's history. You can then use git gc to clean away the dead data:

$ git gc --prune=now --aggressive

The BFG is generally much simpler to use than git-filter-branch; its options are tailored around these two common use cases:

  • Removing Crazy Big Files
  • Removing Passwords, Credentials & other Private data

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Answer by Mike Averto

I found this quite helpful for removing a whole folder, as the answers above didn't really help me: https://help.github.com/articles/remove-sensitive-data

I used:

git filter-branch --force \
--index-filter 'git rm -rf --cached --ignore-unmatch folder/sub-folder' \
--prune-empty --tag-name-filter cat -- --all

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
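
These commands can be tried safely on a throwaway repository first. In the sketch below (hypothetical repository; folder/sub-folder, creds.txt and README are placeholders, and a POSIX shell is assumed), the commit that only added the folder disappears because of --prune-empty, and the lightweight tag v1 is rewritten by --tag-name-filter cat. The -f is left out because it is just the short form of --force.

```shell
#!/bin/sh
# Sketch: remove a folder from every commit and tag, on a throwaway repository.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "keep" > README
git add README
git commit -q -m "readme"
mkdir -p folder/sub-folder
echo "secret" > folder/sub-folder/creds.txt
git add folder
git commit -q -m "add folder"            # this commit only touches the folder
echo "more" >> README
git commit -q -am "update readme"
git tag v1                               # lightweight tag on the tip
commits_before=$(git rev-list --count HEAD)

git filter-branch --force \
  --index-filter 'git rm -rf --cached --ignore-unmatch folder/sub-folder' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc -q --prune=now

commits_after=$(git rev-list --count HEAD)   # "add folder" was pruned as empty
touched=$(git log --all --oneline -- folder/sub-folder | wc -l | tr -d ' ')
echo "commits: $commits_before -> $commits_after, commits touching folder: $touched"
```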

Answer by BHMulder

I was trying to get rid of a big file in the history, and the above answers worked, up to a point. The point is: they don't work if you have tags. If the commit containing the big file is reachable from a tag, you need to adjust the filter-branch command like this:

git filter-branch --tag-name-filter cat \
--index-filter 'git rm --cached --ignore-unmatch huge_file_name' -- \
--all --tags
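
To see why the tags matter, the sketch below copies one throwaway repository twice, rewrites only HEAD in the first copy and runs the command above in the second, then checks whether the big blob is still reachable from any ref. The repository, the annotated tag v1.0 and main.c are hypothetical; huge_file_name mirrors the answer, and a POSIX shell is assumed. Note that --all already includes refs/tags, so the essential part is --tag-name-filter cat; the extra --tags is harmless.

```shell
#!/bin/sh
# Sketch: compare rewriting only HEAD with rewriting all refs plus tags.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

base=$(mktemp -d)
git init -q "$base/orig"
cd "$base/orig"
git config user.name "Example"
git config user.email "example@example.com"
dd if=/dev/urandom of=huge_file_name bs=1024 count=256 2>/dev/null
echo "code" > main.c
git add huge_file_name main.c
git commit -q -m "initial"
git tag -a v1.0 -m "release"            # annotated tag on the commit with the big file
echo "more" >> main.c
git commit -q -am "edit"
blob=$(git rev-parse v1.0:huge_file_name)
cd "$base"
cp -R orig head-only
cp -R orig all-refs

# Variant A: rewrite only HEAD, so v1.0 still points at the old commit.
(cd head-only &&
  git filter-branch --index-filter \
    'git rm --cached --ignore-unmatch huge_file_name' HEAD >/dev/null &&
  rm -rf .git/refs/original)

# Variant B: the answer's command, which also rewrites the tags.
(cd all-refs &&
  git filter-branch --tag-name-filter cat \
    --index-filter 'git rm --cached --ignore-unmatch huge_file_name' -- \
    --all --tags >/dev/null &&
  rm -rf .git/refs/original)

# Prints 1 if the blob is still reachable from some ref in repo $1, else 0.
still_reachable() { git -C "$1" rev-list --objects --all | grep -c "$blob" || true; }
head_only=$(still_reachable head-only)
all_refs=$(still_reachable all-refs)
echo "HEAD only: $head_only, all refs and tags: $all_refs"
```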

Answer by Wayne Conrad

See: How do I remove sensitive files from git's history

The above will fail if the file is missing from any revision. In that case, the --ignore-unmatch switch fixes it:

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch <filename>' HEAD

Then, to get all loose objects out of the repository:

git gc --prune='0 days ago'
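
Here is a sketch of the failure described above, on a throwaway repository where the hypothetical file secret.txt only appears in the second commit: without --ignore-unmatch, git rm exits with an error on the first commit and git filter-branch aborts; with it, the rewrite succeeds. A POSIX shell is assumed, and --force on the second run lets it start even if the failed attempt left a temporary directory or backup refs behind.

```shell
#!/bin/sh
# Sketch: why --ignore-unmatch is needed when the file is missing from a commit.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "code" > main.c
git add main.c
git commit -q -m "first commit, no secret yet"
echo "password" > secret.txt             # hypothetical file to purge
git add secret.txt
git commit -q -m "add secret"

# Without --ignore-unmatch, git rm fails on the first commit and
# filter-branch aborts with "index filter failed".
if git filter-branch --index-filter 'git rm --cached secret.txt' HEAD \
    >/dev/null 2>&1; then
  without_flag=succeeded
else
  without_flag=failed
fi

# With it, every commit is rewritten.
git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch secret.txt' HEAD >/dev/null
in_history=$(git log --oneline HEAD -- secret.txt | wc -l | tr -d ' ')
echo "without --ignore-unmatch: $without_flag; commits still touching it: $in_history"
```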

Answer by Spain Train

This should be covered by the git obliterate command in Git Extras (https://github.com/visionmedia/git-extras).

git obliterate <filename>

Answer by VonC

There are various reasons why a git repo can still be large after git gc, since it does not remove all loose objects.

I detail those reasons in "reduce the git repository size"

But one trick to test in your case would be to clone your "cleaned" Git repo and see if the clone has the appropriate size.

("Cleaned" repo being the one where you did apply the filter-branch, and then gc and prune.)
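
A sketch of this check, assuming a POSIX shell and a throwaway repository with a hypothetical 1 MiB file legacy.bin. The cleanup after filter-branch is deliberately skipped here, so the backup under refs/original keeps the big blob in the original repository, while a clone, which only fetches branches and tags, stays small. The clone uses --no-local because a plain clone of a local path copies (or hardlinks) the whole objects directory, unreachable objects included, which would defeat the test.

```shell
#!/bin/sh
# Sketch: compare the packed size of a rewritten repo with a fresh clone of it.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

base=$(mktemp -d)
git init -q "$base/cleaned"
cd "$base/cleaned"
git config user.name "Example"
git config user.email "example@example.com"
dd if=/dev/urandom of=legacy.bin bs=1024 count=1024 2>/dev/null
echo "code" > main.c
git add legacy.bin main.c
git commit -q -m "initial"
git gc -q

# Rewrite history but skip the clean-up, so refs/original still holds the blob.
git filter-branch --index-filter \
  'git rm --cached --ignore-unmatch legacy.bin' HEAD >/dev/null

# --no-local goes through the normal transport, so only objects reachable
# from the fetched branches and tags end up in the clone.
git clone -q --no-local "$base/cleaned" "$base/clone"

# count-objects -v reports size-pack in KiB.
pack_kib() { git -C "$1" count-objects -v | awk '/^size-pack:/ { print $2 }'; }
cleaned_kib=$(pack_kib "$base/cleaned")
clone_kib=$(pack_kib "$base/clone")
echo "cleaned repo: ${cleaned_kib} KiB packed, clone: ${clone_kib} KiB packed"
```

If the clone is small but the original is not, the leftover data is only being kept alive by something local to the original (backup refs, reflogs, stale packed refs).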

Answer by Cyril Leroux

I had the same problem and found a great tutorial on GitHub that explains step by step how to get rid of files you accidentally committed.

Here is a short summary of the procedure, as Cupcake suggested.

If you have a file named file_to_remove to remove from the history:

cd path_to_parent_dir

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch file_to_remove' \
  --prune-empty --tag-name-filter cat -- --all
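
After running the command, a quick way to confirm that nothing in history still mentions the file is to look for commits touching the path and for the blob itself. A sketch on a throwaway repository (hypothetical main.c and tag v0.1, plus the answer's file_to_remove), assuming a POSIX shell; the backup refs under refs/original are removed first because they still point at the old commits.

```shell
#!/bin/sh
# Sketch: run the summarized command, then check that the file is really gone.
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # silence the deprecation notice

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
echo "code" > main.c
echo "oops" > file_to_remove
git add main.c file_to_remove
git commit -q -m "initial"
git tag v0.1
blob=$(git rev-parse HEAD:file_to_remove)

git filter-branch --force --index-filter \
  'git rm --cached --ignore-unmatch file_to_remove' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null

# refs/original/ still points at the old commits; drop it before checking.
rm -rf .git/refs/original/

commits_touching=$(git log --all --oneline -- file_to_remove | wc -l | tr -d ' ')
blob_refs=$(git rev-list --objects --all | grep -c "$blob" || true)
echo "commits touching the path: $commits_touching, blob reachable: $blob_refs"
```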