Linux: How do you "debug" a regular expression with sed?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4052253/

Time: 2020-08-03 23:47:12  Source: igfitidea

How do you "debug" a regular expression with sed?

Tags: regex, linux, debugging, sed

Asked by Somebody still uses you MS-DOS

I'm trying to use a regexp with sed. I've tested my regex with kiki, a GNOME application for testing regexps, and it works in kiki.

date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;  lines: +5 -2;  commitid: bvEcb00aPyqal6Uu;

I want to replace author: 00000000000; with nothing. So I created this regexp, which works when I test it in kiki:

author:\s[0-9]{11};

But it doesn't work when I test it in sed:

sed -i "s/author:\s[0-9]{11};//g" /tmp/test_regex.txt

I know regex engines have different implementations, and this could be the issue. My question is: how do I at least try to "debug" what's happening with sed? Why is it not working?

Accepted answer by paxdiablo

My version of sed doesn't like the {11} bit. Processing the line with:

sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g'

works fine.

And the way I debug it is exactly what I did here. I just constructed a command:

echo 'X author: 00000000000; X' | sed ...

and removed the more advanced regex things one at a time:

  • used <space> instead of \s, which didn't fix it.
  • replaced [0-9]{11} with 11 copies of [0-9], which worked.
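
The elimination steps above can be sketched as a small script. This is only an illustration, not the answerer's exact commands: the sample line and the variable names are made up here, and a literal space is used in place of \s so the result does not depend on that extension.

```shell
#!/bin/sh
# Hypothetical sample line, shaped like the one in the question.
line='X author: 00000000000; X'

# Original idea (with a literal space instead of \s). In sed's default
# basic regex syntax, { and } are literal characters, so nothing matches
# and the line comes back unchanged.
unescaped=$(printf '%s\n' "$line" | sed 's/author: [0-9]{11};//g')

# Simplification: replace the {11} quantifier with 11 copies of [0-9].
expanded=$(printf '%s\n' "$line" |
    sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g')

# Keep the quantifier, but escape the braces.
escaped=$(printf '%s\n' "$line" | sed 's/author: [0-9]\{11\};//g')

printf 'unescaped: [%s]\n' "$unescaped"
printf 'expanded:  [%s]\n' "$expanded"
printf 'escaped:   [%s]\n' "$escaped"
```

Whichever variant first changes the line points at the regex feature that sed was interpreting differently.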

It pretty much had to be one of those, since I've used every other feature of your regex successfully with sed before.

But, in fact, this will actually work without the hideous 11 copies of [0-9]; you just have to escape the braces: [0-9]\{11\}. I have to admit I didn't get around to trying that, since it worked okay with the multiples, and I generally don't concern myself too much with brevity in sed since I tend to use it more for quick'n'dirty jobs :-)

But the brace method is a lot more concise and adaptable, and it's good to know how to do it.

Answer by Alberto Zaccagni

The fact that you are substituting author: 00000000000 is already expressed to sed when you add the s before the first /.

Answer by Brian Clements

You are using the -i flag incorrectly. You need to give it a suffix string, which is appended to the file name to form the backup copy (some sed implementations, such as BSD sed, require this; GNU sed also accepts -i with no suffix). You also need to escape your curly braces.

sed -ibak -e "s/author:\s[0-9]\{11\};//g" /tmp/test_regex.txt
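
As a rough sketch of what the suffix does, the following runs an in-place edit on a throwaway file and then looks at both the edited file and the backup. The temporary directory, the shortened sample line, and the .bak suffix are assumptions made for illustration; a literal space stands in for \s.

```shell
#!/bin/sh
# Work in a scratch directory so no real file is touched.
workdir=$(mktemp -d)
target="$workdir/test_regex.txt"
printf '%s\n' 'date: 2010-10-29;  author: 00000000000;  state: Exp;' > "$target"

# -i.bak edits the file in place and keeps the original as test_regex.txt.bak.
sed -i.bak -e 's/author: [0-9]\{11\};//g' "$target"

edited=$(cat "$target")
backup=$(cat "$target.bak")
printf 'edited: %s\n' "$edited"
printf 'backup: %s\n' "$backup"

rm -rf "$workdir"
```

Comparing the two files afterwards is a cheap way to confirm whether the substitution changed anything at all.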

I usually debug my statement by starting with a regex I know will work (like 's/author//g' in this case). When that works I know that I have the right arguments. Then I expand the regex incrementally.
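
One way to picture this incremental approach is a loop that tries progressively longer patterns against a sample line and reports the first one that stops matching. The pattern list and variable names below are hypothetical, chosen only to mirror the question's regex (with a literal space instead of \s).

```shell
#!/bin/sh
# Sample line modelled on the one in the question.
line='date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;'

report=''
# Patterns grow step by step; the last one is the unescaped-brace form.
for pattern in 'author' 'author:' 'author: [0-9]' \
               'author: [0-9]\{11\}' 'author: [0-9]\{11\};' \
               'author: [0-9]{11};'
do
    # sed -n '/re/p' prints the line only when the regex matches it.
    if [ -n "$(printf '%s\n' "$line" | sed -n "/$pattern/p")" ]; then
        verdict='match'
    else
        verdict='NO MATCH'
    fi
    report="$report$verdict: $pattern
"
done
printf '%s' "$report"
```

The first NO MATCH line shows which addition to the regex broke it.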

Answer by verisimilidude

In sed you need to escape the curly braces. "s/author:\s[0-9]\{11\};//g" should work.

Sed has no debug capability. To test, simplify the expression at the command line iteratively until you get something that works, then build back up.

command line input:

$ echo 'xx a: 00123 b: 5432' | sed -e 's/a:\s[0-9]\{5\}//'

command line output:

xx  b: 5432

Answer by tchrist

That looks more like a Perl regex than a sed regex. Perhaps you would prefer using:

perl -pi.orig -e 's/author:\s[0-9]{11};//g' file1 file2 file3

At least that way you could always add -Mre=debug to debug the regex.

Answer by Paused until further notice.

There is a Python script called sedsed by Aurelio Jargas which will show the stepwise execution of a sed script. A debugger like this isn't going to help much in the case of characters being taken literally (e.g. {) versus having special meaning (e.g. \{), especially for a simple substitution, but it will help when a more complex script is being debugged.

The latest SVN version.
The most recent stable release.
Disclaimer: I am a minor contributor to sedsed.

[Image: sedsed example]

Another sed debugger is sd by Brian Hiles, written as a Bourne shell script (I haven't used this one).

Answer by Ray

You have to use the -r flag for extended regex:

sed -r 's/author:\s[0-9]{11};//g'

or you have to escape the {} characters:

sed 's/author:\s[0-9]\{11\};//g'
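
To see that the two spellings are equivalent, a quick side-by-side check can help. This sketch uses -E rather than -r, since -E is accepted by both GNU and BSD sed while -r is the GNU spelling; the sample line and variable names are assumptions, and a literal space replaces \s.

```shell
#!/bin/sh
# Shortened sample line, assumed for illustration.
line='author: 00000000000; state: Exp;'

# Basic regex (the default): the braces must be escaped to act as a quantifier.
bre=$(printf '%s\n' "$line" | sed 's/author: [0-9]\{11\};//g')

# Extended regex: the braces are a quantifier without any escaping.
ere=$(printf '%s\n' "$line" | sed -E 's/author: [0-9]{11};//g')

printf 'basic:    [%s]\n' "$bre"
printf 'extended: [%s]\n' "$ere"
```

If the two results differ, the pattern is probably being read in a different syntax than intended.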

Answer by gagallo7

If you want to debug a sed command, you can use the w (write) command to dump the lines that sed has matched to a file.

From sed manpages:

Commands which accept address ranges

(...)

w filename

Write the current pattern space to filename.

Applying to your question

Let's use a file named sed_dump.txt as the sed dump file.

1) Generate the sed dump:

sed "/author:\s[0-9]{11};/w sed_dump.txt" /tmp/test_regex.txt

2) Check the contents of sed_dump.txt:

cat sed_dump.txt

Output:

It's empty...

3) Trying to escape '{' regex control character:

sed "/author:\s[0-9]\{11\};/w sed_dump.txt" /tmp/test_regex.txt

4) Check the contents of sed_dump.txt:

cat sed_dump.txt

Output:

date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp; lines: +5 -2; commitid: bvEcb00aPyqal6Uu;

Conclusion

In step 4), a line has been written to the dump file, which means that sed matched your pattern in that line. This does not guarantee that the regex is correct, but it is a way of debugging using sed itself.
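
The whole procedure can also be run end to end in a scratch directory. This is a minimal sketch under a few assumptions: the temporary paths, the two-line sample file, and the dump file names are invented for illustration; -n is added so sed only writes to the dump files; and a literal space replaces \s.

```shell
#!/bin/sh
# Scratch directory with a small, made-up input file.
tmpdir=$(mktemp -d)
input="$tmpdir/test_regex.txt"
printf '%s\n' \
    'date: 2010-10-29 14:46:33 -0200;  author: 00000000000;  state: Exp;' \
    'a line without the field' > "$input"

# Unescaped braces: nothing matches, so the dump file is created but stays empty.
sed -n "/author: [0-9]{11};/w $tmpdir/dump_unescaped.txt" "$input"

# Escaped braces: the matching line is written to the dump file.
sed -n "/author: [0-9]\{11\};/w $tmpdir/dump_escaped.txt" "$input"

# Count the dumped lines (tr strips padding that some wc versions add).
unescaped_count=$(wc -l < "$tmpdir/dump_unescaped.txt" | tr -d ' ')
escaped_count=$(wc -l < "$tmpdir/dump_escaped.txt" | tr -d ' ')
printf 'lines dumped, unescaped braces: %s\n' "$unescaped_count"
printf 'lines dumped, escaped braces:   %s\n' "$escaped_count"

rm -rf "$tmpdir"
```

An empty dump file means the address never matched, which narrows the problem down to the regex rather than the substitution.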
