Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me), citing the original address: StackOverflow, http://stackoverflow.com/questions/4110652/

Date: 2020-09-19 04:43:22  Source: igfitidea

How to substitute text from files in git history?

Tags: git, substitution, git-filter-branch, git-rewrite-history, bfg-repo-cleaner

Asked by Tom

I've always used a GUI-based git client (SmartGit) and thus don't have much experience with the git command line.

However, I now face the need to substitute a string in all .txt files from history (so, not erasing the whole file but just substituting a string). I found the following command:

git filter-branch --tree-filter 'git ls-files -z "*.php" |xargs -0 perl -p -i -e "s#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g"' -- --all

I tried this, and unfortunately noticed that while the password did get changed, all binary files got corrupted. Images, etc. would all be corrupted.

Is there a better way to do this that won't corrupt my binary files?

Thanks.

EDIT:

I got mixed up with something. The actual code that caused binary files to get corrupted was:

$ git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"

Strangely enough, the code at the top actually removed all files containing my password.

Accepted answer by jweyrich

You can avoid touching undesired files by passing -name "pattern" to find.

This works for me:

git filter-branch --tree-filter "find . -name '*.php' -exec sed -i -e \
    's/originalpassword/newpassword/g' {} \;"
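
To see why the -name filter protects binary files, here is a minimal sketch that runs the same find/sed pair on a throwaway directory instead of a real revision. The directory and file names are made up for illustration, and -i.bak is used only because both GNU and BSD sed accept that form.

```shell
# Hypothetical demo directory standing in for one checked-out revision.
demo_dir=$(mktemp -d)
printf 'pass = originalpassword\n' > "$demo_dir/config.php"
printf 'originalpassword\000\001\002' > "$demo_dir/logo.png"
cp "$demo_dir/logo.png" "$demo_dir/logo.orig"

# Only *.php files are handed to sed; the binary file is never opened.
# -i.bak works with both GNU and BSD sed; the backups are removed afterwards.
find "$demo_dir" -type f -name '*.php' \
    -exec sed -i.bak -e 's/originalpassword/newpassword/g' {} \;
find "$demo_dir" -type f -name '*.php.bak' -exec rm -f {} \;

cat "$demo_dir/config.php"
cmp -s "$demo_dir/logo.png" "$demo_dir/logo.orig" && echo "binary file untouched"
```

Inside --tree-filter the same pattern applies, with "." in place of the demo directory.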

Answer by Roberto Tyley

I'd recommend using the BFG Repo-Cleaner, a simpler, faster alternative to git-filter-branch specifically designed for rewriting files from Git history.

You should carefully follow the steps at https://rtyley.github.io/bfg-repo-cleaner/#usage, but the core bit is just this: download the BFG's jar (requires Java 7 or above) and run this command:

$ java -jar bfg.jar --replace-text replacements.txt -fi '*.php' my-repo.git

The replacements.txt file should contain all the substitutions you want to do, in a format like this (one entry per line; note that the comments shouldn't be included):

PASSWORD1 # Replace literal string 'PASSWORD1' with '***REMOVED***' (default)
PASSWORD2==>examplePass         # replace with 'examplePass' instead
PASSWORD3==>                    # replace with the empty string
regex:password=\w+==>password=  # Replace, using a regex
regex:\r(\n)==>$1               # Replace Windows newlines with Unix newlines

Your entire repository history will be scanned, and .php files (under 1MB in size) will have the substitutions performed: any matching string (that isn't in your latest commit) will be replaced.
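
Since the comments must not end up in the real file, one way to create it is a quoted heredoc. This is only a sketch: the temporary directory is a stand-in for wherever you keep bfg.jar, and the entries are copied from the format shown above.

```shell
# Hypothetical working directory; in practice the file sits next to bfg.jar.
work_dir=$(mktemp -d)

# Quoted heredoc delimiter: the shell leaves backslashes and other
# special characters in the entries alone.
cat > "$work_dir/replacements.txt" <<'EOF'
PASSWORD1
PASSWORD2==>examplePass
PASSWORD3==>
regex:password=\w+==>password=
EOF

# One entry per line, with no trailing comments.
wc -l < "$work_dir/replacements.txt"
```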

Full disclosure: I'm the author of the BFG Repo-Cleaner.

Answer by Nay

I created a file at /usr/local/git/findsed.sh with the following contents:

find . -name 'githubDirToSubmodule.sh' -exec sed -i '' -e 's/What I want to remove//g' {} \;

I ran the command:

git filter-branch --tree-filter "sh /usr/local/git/findsed.sh"

Explanation of commands

When you run git filter-branch, it goes through each revision that you ever committed, one by one. --tree-filter runs the findsed.sh script on each committed revision, saves the result, then moves on to the next revision.

The find command finds a specific file or set of files and executes (-exec) the sed editor on each one. In the sed expression, the regex between s/ and the next / is replaced with the string between that / and /g (empty in my example). {} is a reference to the file's path as given by the find command. The file path is fed to sed, so that sed knows what to work on. \; just ends the -exec command.
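
The pieces described above can be tried separately on a scratch directory. This is a small sketch with made-up file contents: it first shows which path find substitutes for {}, then runs the same s///g expression without editing in place so the result is visible.

```shell
# Hypothetical scratch directory with one matching and one non-matching file.
demo_dir=$(mktemp -d)
printf 'keep What I want to remove this\n' > "$demo_dir/githubDirToSubmodule.sh"
printf 'unrelated\n' > "$demo_dir/other.sh"

# {} is replaced by each path that find matched; \; ends the -exec command.
matched=$(find "$demo_dir" -name 'githubDirToSubmodule.sh' -exec echo {} \;)
echo "find handed this path to -exec: $matched"

# s/REGEX/REPLACEMENT/g with an empty replacement deletes every match.
cleaned=$(sed -e 's/What I want to remove//g' "$matched")
echo "$cleaned"
```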

Separating the shell script and the command into separate pieces makes quoting with '' or "" less complicated.

Peculiarities

I successfully implemented this on a Mac, where sed is apparently a different (older?) version. This matters, as it sometimes behaves differently. Make sure to use sed -i ''; otherwise it created backup copies with "-e" appended to the file names, because it took -e as the suffix I wanted for my backup files. -i '' says: don't make backup files, just edit the files in place.
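
If the same script has to run on both Linux and a Mac, one option is a small wrapper that picks the right -i form. This is a sketch, not something from my original setup: sed_in_place is a hypothetical helper name, and it relies on the assumption that GNU sed accepts --version while BSD sed rejects it.

```shell
# Hypothetical helper: edit a file in place without leaving a backup,
# on either GNU sed (Linux) or BSD sed (macOS).
sed_in_place() {
    expr=$1
    file=$2
    if sed --version >/dev/null 2>&1; then
        # GNU sed: --version succeeds, and -i takes no separate argument.
        sed -i -e "$expr" "$file"
    else
        # BSD sed: --version is rejected, and -i needs an explicit (empty) suffix.
        sed -i '' -e "$expr" "$file"
    fi
}

demo_file=$(mktemp)
printf 'token=originalpassword\n' > "$demo_file"
sed_in_place 's/originalpassword/newpassword/g' "$demo_file"
cat "$demo_file"
```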

Specifying -name 'filename.sh' helped me avoid another issue that I could not solve. There was another .sh file that ended without a newline character. sed, for some reason, would add a newline character to the end, even though 's/blah/blah/g' matched nothing in that file. So instead of figuring out that issue, I just told find to ignore all other files.
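
Another way around the stray-newline problem, which I did not use at the time, is to let grep decide which files sed may touch. In this sketch a first -exec acts as a test, so files without a match are never rewritten; the file names and the REDACTED replacement are made up for the demo.

```shell
# Hypothetical scratch directory: a.sh contains the pattern, b.sh does not
# and has no trailing newline.
demo_dir=$(mktemp -d)
printf 'secret=blah\n' > "$demo_dir/a.sh"
printf 'no match, no trailing newline' > "$demo_dir/b.sh"
cp "$demo_dir/b.sh" "$demo_dir/b.orig"

# The first -exec is a test: sed only runs on files where grep found a match.
find "$demo_dir" -type f -name '*.sh' \
    -exec grep -q 'blah' {} \; \
    -exec sed -i.bak -e 's/blah/REDACTED/g' {} \;
find "$demo_dir" -type f -name '*.sh.bak' -exec rm -f {} \;

cat "$demo_dir/a.sh"
cmp -s "$demo_dir/b.sh" "$demo_dir/b.orig" && echo "b.sh is byte-for-byte unchanged"
```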

Additional commands that work

Additionally, I found these commands to work in the findsed.sh file (only one command at a time, not multiple, so comment the others out with #):

find . -name '.publishNewZenPackFromGithub.sh.swp' -exec rm -f {} \;
find . -name '*' -exec grep -H PassToRemove {} \;

Enjoy!

Answer by Ben Jackson

Could be a shell expansion issue. If filter-branch is losing the quotes around "*.php" by the time it evaluates the command, it may be expanding to nothing, thus git ls-files -z listing all files.

You could check the filter-branch source or try different quoting tricks, but what I'd do is just make a one-line shell script that does your tree-filter and pass that script instead.
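
A rough sketch of both points: the first half shows how differently a glob behaves with and without quotes, and the second half writes the filter from the question into a script file so that only one level of quoting is involved. The directory and the script name tree-filter.sh are made up, and the filter-branch call is left as a comment because it assumes the script is referenced by an absolute path.

```shell
# Hypothetical scratch directory with two PHP files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.php" "$demo_dir/b.php"

# Unquoted glob: the shell expands it before the command sees it.
set -- "$demo_dir"/*.php
unquoted_count=$#

# Quoted glob: passed through as one literal pattern.
set -- "$demo_dir/*.php"
quoted_count=$#

echo "unquoted: $unquoted_count args, quoted: $quoted_count arg"

# Hypothetical script file holding the tree filter from the question.
cat > "$demo_dir/tree-filter.sh" <<'EOF'
git ls-files -z '*.php' | xargs -0 perl -p -i -e 's#(PASSWORD1|PASSWORD2|PASSWORD3)#xXxXxXxXxXx#g'
EOF

# It would then be passed by absolute path, for example:
#   git filter-branch --tree-filter "sh $demo_dir/tree-filter.sh" -- --all
```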

Answer by VonC

With Git 2.24 (Q4 2019), git filter-branch (and BFG) is deprecated.

The equivalent, using newren/git-filter-repo (see its examples section), would be:

cd repo
git filter-repo --path-glob '*.txt' --replace-text expressions.txt

with expressions.txt:

literal:originalpassword==>newpassword
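
As a sketch, the expressions file can be written with a quoted heredoc so the shell does not touch the entries. The regex: line is an extra illustration that is not part of the command above, and the temporary directory is a placeholder. One caveat, as far as I understand the tool: --path-glob also restricts which paths are kept in the rewritten history, so check on a fresh clone whether you want it combined with --replace-text.

```shell
# Hypothetical working directory; in practice this is wherever you run git filter-repo.
work_dir=$(mktemp -d)

# literal: matches the text exactly; regex: treats it as a regular expression.
# Text after ==> is the replacement.
cat > "$work_dir/expressions.txt" <<'EOF'
literal:originalpassword==>newpassword
regex:password=\w+==>password=
EOF

cat "$work_dir/expressions.txt"

# It would then be used from inside a fresh clone, for example:
#   git filter-repo --replace-text "$work_dir/expressions.txt"
```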