Filtering a diff with a regular expression in Git
Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/8219900/
Asked by Casebash
It seems that it would be extremely handy to be able to filter a diff so that trivial changes are not displayed. I would like to write a regular expression which would be run on the line and then pass it another string that uses the captured arguments to generate a canonical form. If the lines before and after produce the same output, then they would be removed from the diff.
For example, I am working on a PHP code base where a significant number of array accesses are written as my_array[my_key] when they should be my_array["my_key"] to prevent issues if a my_key constant is defined. It would be useful to generate a diff that omits lines whose only change is the added quotes.
I can't change them all at once, as we don't have the resources to test the entire code base, so I am fixing this whenever I make a change to a function. How can I achieve this? Is there anything else similar that I could use to achieve a comparable result? For example, a simpler method might be to skip the canonical form and just see whether the input is transformed into the output. BTW, I am using Git.
Accepted answer by Dan Cruz
There do not seem to be any options to Git's diff command to support what you want to do. However, you could use the GIT_EXTERNAL_DIFF environment variable and a custom script (or any executable created using your preferred scripting or programming language) to manipulate a patch.
I'll assume you are on Linux; if not, you could tweak this concept to suit your environment. Let's say you have a Git repo where HEAD has a file file05 that contains:
line 26662: $my_array[my_key]
And a file file06 that contains:
line 19768: $my_array[my_key]
line 19769: $my_array[my_key]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key]
line 19776: $my_array[my_key]
You change file05 to:
line 26662: $my_array["my_key"]
And you change file06 to:
line 19768: $my_array[my_key]
line 19769: $my_array["my_key"]
line 19770: $my_array[my_key]
line 19771: $my_array[my_key]
line 19772: $my_array[my_key]
line 19773: $my_array[my_key]
line 19775: $my_array[my_key2]
line 19776: $my_array[my_key]
Using the following shell script, let's call it mydiff.sh and place it somewhere that's in our PATH:
#!/bin/bash
# Print the arguments Git passes to an external diff program:
# path old-file old-hex old-mode new-file new-hex new-mode
echo "$@"
# Regenerate the patch for this path as a porcelain word diff, then drop
# -/+ pairs that differ only by the added quotes.
# NOTE: the path argument is blank in the source; "$1" (the path) is assumed.
git diff-files --patch --word-diff=porcelain "$1" | awk '
# Remember a removed line and the record number where it occurred.
/^-./ {rec = FNR; prev = substr($0, 2);}
# An added line directly after it: strip [" and "] and compare.
FNR == rec + 1 && /^\+./ {
    ln = substr($0, 2);
    gsub("\\[\"", "[", ln);
    gsub("\"\\]", "]", ln);
    if (prev == ln) {
        print " " ln;
    } else {
        print "-" prev;
        print "+" ln;
    }
}
# Every other line passes through unchanged.
FNR != rec && FNR != rec + 1 {print;}
'
Executing the command:
GIT_EXTERNAL_DIFF=mydiff.sh git --no-pager diff
Will output:
file05 /tmp/r2aBca_file05 d86525edcf5ec0157366ea6c41bc6e4965b3be1e 100644 file05 0000000000000000000000000000000000000000 100644
index d86525e..c2180dc 100644
--- a/file05
+++ b/file05
@@ -1 +1 @@
 line 26662: 
 $my_array[my_key]
~
file06 /tmp/2lgz7J_file06 d84a44f9a9aac6fb82e6ffb94db0eec5c575787d 100644 file06 0000000000000000000000000000000000000000 100644
index d84a44f..bc27446 100644
--- a/file06
+++ b/file06
@@ -1,8 +1,8 @@
 line 19768: $my_array[my_key]
~
 line 19769: 
 $my_array[my_key]
~
 line 19770: $my_array[my_key]
~
 line 19771: $my_array[my_key]
~
 line 19772: $my_array[my_key]
~
 line 19773: $my_array[my_key]
~
 line 19775: 
-$my_array[my_key]
+$my_array[my_key2]
~
 line 19776: $my_array[my_key]
~
This output does not show changes for the added quotes in file05 and file06. The external diff script basically uses the Git diff-files command to create the patch and filters the output through a GNU awk script to manipulate it. This sample script does not handle all the different combinations of old and new files mentioned for GIT_EXTERNAL_DIFF, nor does it output a valid patch, but it should be enough to get you started.
You could use Perl regular expressions, Python difflib or whatever you're comfortable with to implement an external diff tool that suits your needs.
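As a rough illustration of that last suggestion, here is a minimal sketch of an external diff written in Python with difflib. It is not the answer's script: the names canonical and filtered_diff, the QUOTED_KEY pattern, and the bare -/+ output format are all assumptions made for this example, and like the awk version it does not emit a valid patch. It pairs replaced lines by position, which is only reasonable when a changed block has the same number of lines before and after.

```python
#!/usr/bin/env python3
"""Hypothetical external diff for GIT_EXTERNAL_DIFF; hides quote-only changes."""
import difflib
import re
import sys

# Assumption: quoting a bare array key is the only change treated as trivial.
QUOTED_KEY = re.compile(r'\["(\w+)"\]')


def canonical(line):
    """Return the line with ["key"] rewritten to [key]."""
    return QUOTED_KEY.sub(r"[\1]", line)


def filtered_diff(old_lines, new_lines):
    """Yield -/+ lines for the changes that survive canonicalization."""
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        old, new = old_lines[i1:i2], new_lines[j1:j2]
        if tag == "replace" and len(old) == len(new):
            # Pair lines by position and keep only pairs that really differ.
            pairs = [(o, n) for o, n in zip(old, new) if canonical(o) != canonical(n)]
            old = [o for o, _ in pairs]
            new = [n for _, n in pairs]
        yield from ("-" + line for line in old)
        yield from ("+" + line for line in new)


def main(argv):
    # Git passes: path old-file old-hex old-mode new-file new-hex new-mode
    path, old_file, new_file = argv[1], argv[2], argv[5]
    with open(old_file) as f:
        old_lines = f.read().splitlines()
    with open(new_file) as f:
        new_lines = f.read().splitlines()
    changes = list(filtered_diff(old_lines, new_lines))
    if changes:
        print(path)
        print("\n".join(changes))


if __name__ == "__main__" and len(sys.argv) == 8:
    main(sys.argv)
```

Saved as an executable on PATH under a hypothetical name such as mydiff.py, it would be invoked the same way as the shell version: GIT_EXTERNAL_DIFF=mydiff.py git --no-pager diff.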
Answer by Hauleth
$ git diff --help
-G<regex>
Look for differences whose added or removed line matches the given <regex>.
EDIT:
After some tests I've got something like
git diff -b -w --word-diff-regex='.*\[[^"]*\]'
Then I've got output like:
diff --git a/test.php b/test.php
index 62a2de0..b76891f 100644
--- a/test.php
+++ b/test.php
@@ -1,3 +1,5 @@
<?php
{+$my_array[my_key]+} = "test";
?>
diff --git a/test1.php b/test1.php
index 62a2de0..6102fed 100644
--- a/test1.php
+++ b/test1.php
@@ -1,3 +1,5 @@
<?php
some_other_stuff();
?>
Maybe it will help you. I found it here http://www.rhinocerus.net/forum/lang-lisp/659593-git-word-diff-regex-lisp-source.html and there is more information in this thread
EDIT2:
git diff -G'\[[A-Za-z_]*\]' --pickaxe-regex
Answer by Naga Kiran
$ git diff -U1 | grepdiff 'console' --output-matching=hunk
Answer by Paul Nikonowicz
From my own git --help:
--word-diff-regex=<regex>
Use <regex> to decide what a word is, instead of considering runs of non-whitespace to be a word. Also implies --word-diff unless it was already enabled. Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences. You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline. The regex can also be set via a diff driver or configuration option, see gitattributes(1) or git-config(1). Giving it explicitly overrides any diff driver or configuration setting. Diff drivers override configuration settings.
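The rule that anything between matches is ignored is what makes this option relevant to the quoting problem. The following Python sketch only models that rule with re.findall; it is not how Git implements word diff. The WORD pattern and the words helper are assumptions for illustration, and \s stands in for [[:space:]] because Python's re module has no POSIX character classes.

```python
import re

# Assumption: an illustrative word pattern. Double quotes are deliberately
# matched by neither alternative, so they fall "between words" and are ignored.
WORD = re.compile(r'\w+|[^\s"\w]')


def words(line):
    """Split a line into words the way a word regex would: matches only."""
    return WORD.findall(line)


before = '$my_array[my_key]'
after = '$my_array["my_key"]'

# Both lines produce the same word sequence, so a word-level comparison
# sees no difference between them.
print(words(before) == words(after))
```

A pattern that did match the quote character would report the quotes as added words, which is why the man page suggests thinking carefully about what the regex leaves unmatched.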
Answer by Has QUIT--Anony-Mousse
Normalize the input files in a first step, then compare the normalized files. This gives you the most control over the process. E.g. you might want to only apply the regexp to non-HTML parts of the code, not inside of strings, not inside of comments (or ignore comments altogether). Computing a diff on the normalized code is the proper way to do such things; working with regexps on single lines is much more error-prone and at most a hack.
Some diff utilities, such as meld, allow hiding "insignificant" differences and come with a set of default patterns to, e.g., hide whitespace-only changes. This is pretty much what you want, I guess.
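The normalize-then-compare idea can be sketched in a few lines of Python. This is a deliberately naive version under stated assumptions: normalize is a hypothetical helper that uses a single regex and therefore does not skip strings, comments, or HTML, which is exactly the kind of refinement this answer says a real normalizer should add.

```python
import difflib
import re


def normalize(source):
    """Rewrite $name["key"] as $name[key] across the whole text (naive)."""
    return re.sub(r'(\$\w+)\["(\w+)"\]', r"\1[\2]", source)


def normalized_diff(old_source, new_source, name="file"):
    """Unified diff of the normalized texts; empty if only quoting changed."""
    old = normalize(old_source).splitlines(keepends=True)
    new = normalize(new_source).splitlines(keepends=True)
    return "".join(difflib.unified_diff(old, new, "a/" + name, "b/" + name))


old_text = "line 1: $a[key]\nline 2: $a[key]\n"
new_text = 'line 1: $a["key"]\nline 2: $a[other]\n'
print(normalized_diff(old_text, new_text))
```

Because the diff is computed on the normalized text, the reported lines show the normalized form rather than what is actually in the file, so this suits review more than patch generation.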
Answer by eckes
I use an approach that combines git diff and applying a regular expression matching on the results. In some testing code (Perl), I know that testing is successful when the OutputFingerprint stored in the resulting files of the tests has not changed.
First, I do a
my $matches = `git diff -- mytestfile`
and then evaluate the result.
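The evaluation code itself is not shown here. As a hypothetical sketch of that kind of check, written in Python rather than the answer's Perl, the function below scans diff text for added or removed lines that mention OutputFingerprint. The function name and the assumption that the fingerprint sits on a line containing that literal name are illustrative only.

```python
import re

# Assumption: the fingerprint is stored on a line containing this literal name.
FINGERPRINT = re.compile(r"OutputFingerprint")


def fingerprint_changed(diff_text):
    """Return True if any added or removed line mentions the fingerprint."""
    for line in diff_text.splitlines():
        if line.startswith(("+++", "---")):
            continue  # file headers, not content changes
        if line.startswith(("+", "-")) and FINGERPRINT.search(line):
            return True
    return False


sample = """\
--- a/mytestfile
+++ b/mytestfile
@@ -1,2 +1,2 @@
-Timestamp: 10:00
+Timestamp: 10:05
 OutputFingerprint: abc123
"""
print(fingerprint_changed(sample))
```

The diff text could come from subprocess.run(["git", "diff", "--", "mytestfile"], capture_output=True, text=True).stdout, mirroring the backtick call above.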
Answer by Ira Baxter
If the goal is to minimize trivial differences, you might consider our SmartDifferencer tool.
These tools compare the language syntax, not the layout, so many trivial changes (layout, modified comments, even changed radix on numbers) are ignored and not reported. Each tool has a full language parser; there's a version for many languages, including PHP.
It won't handle the example $FOO[abc] as being "semantically identical" to $FOO["abc"], because they are not. If abc actually has a definition as a constant, then $FOO["abc"] is not semantically equivalent.