Original question: http://stackoverflow.com/questions/4052253/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How do you "debug" a regular expression with sed?
Asked by Somebody still uses you MS-DOS
I'm trying to use a regexp with sed. I've tested my regex with kiki, a GNOME application for testing regexps, and it works in kiki.
date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp; lines: +5 -2; commitid: bvEcb00aPyqal6Uu;
I want to replace author: 00000000000; with nothing. So I created the regexp, which works when I test it in kiki:
author:\s[0-9]{11};
But it doesn't work when I test it in sed:
sed -i "s/author:\s[0-9]{11};//g" /tmp/test_regex.txt
I know regexes have different implementations, and this could be the issue. My question is: how do I at least try to "debug" what's happening with sed? Why is it not working?
Accepted answer by paxdiablo
My version of sed doesn't like the {11} bit. Processing the line with:
sed 's/author: [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9];//g'
works fine.
And the way I debug it is exactly what I did here. I just constructed a command:
echo 'X author: 00000000000; X' | sed ...
and removed the more advanced regex things one at a time:
- used <space> instead of \s, didn't fix it.
- replaced [0-9]{11} with 11 copies of [0-9], that worked.
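The elimination steps above can be scripted so that each variant runs against the same sample line. What follows is a minimal sketch rather than the answerer's own script: the helper name try is an assumption made for illustration, and a literal space stands in for \s so the commands do not depend on GNU extensions.

```shell
#!/bin/sh
# Sample line from the answer above.
sample='X author: 00000000000; X'

# try (hypothetical helper): run one sed script against the sample line
# and print the result, so variants can be compared side by side.
try() {
    printf '%s\n' "$sample" | sed "$1"
}

# Unescaped braces are literal characters in a basic regex,
# so the sample line comes back unchanged.
try 's/author: [0-9]{11};//g'

# Escaped braces form an interval expression, so the author field is removed.
try 's/author: [0-9]\{11\};//g'
```

A line that comes back unchanged means the pattern did not match, which narrows the problem down to whatever was altered between the two variants.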
It pretty much had to be one of those, since I've used every other feature of your regex before with sed successfully.
But, in fact, this will actually work without the hideous 11 copies of [0-9]; you just have to escape the braces: [0-9]\{11\}. I have to admit I didn't get around to trying that since it worked okay with the multiples, and I generally don't concern myself too much with brevity in sed since I tend to use it more for quick'n'dirty jobs :-)
But the brace method is a lot more concise and adaptable, and it's good to know how to do it.
Answer by Alberto Zaccagni
The fact that you are substituting author: 00000000000 is already expressed in sed by the s you add before the first /.
Answer by Brian Clements
You may be using the -i flag incorrectly. Some sed implementations require a suffix string after -i, which is used to name the backup file (GNU sed also accepts a bare -i). You also need to escape your curly braces.
sed -ibak -e "s/author:\s[0-9]\{11\};//g" /tmp/test_regex.txt
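To see what -i with a suffix does without risking a real file, the command can be tried on a throwaway copy. This is only a sketch under assumptions: the temporary directory, the .bak suffix, and the shortened sample line are illustrative choices, and a literal space replaces \s because \s is a GNU extension.

```shell
#!/bin/sh
# Work in a throwaway directory so no real file is touched.
workdir=$(mktemp -d)
file="$workdir/test_regex.txt"

# Shortened version of the sample line from the question.
printf '%s\n' 'date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp;' > "$file"

# Edit in place; the original content is kept as test_regex.txt.bak.
sed -i.bak -e 's/author: [0-9]\{11\};//g' "$file"

# Show what the substitution changed.
# diff exits with status 1 when the files differ, hence the "|| true".
diff "$file.bak" "$file" || true
```

If diff prints nothing, the substitution did not change the file, and the backup makes it easy to start over.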
I usually debug my statement by starting with a regex I know will work (like 's/author//g' in this case). When that works I know that I have the right arguments. Then I expand the regex incrementally.
Answer by verisimilidude
In sed you need to escape the curly braces. "s/author:\s[0-9]\{11\};//g" should work.
Sed has no debug capability. To test, simplify the command iteratively at the command line until you get something that works, and then build back up.
command line input:
$ echo 'xx a: 00123 b: 5432' | sed -e 's/a:\s[0-9]\{5\}//'
command line output:
xx  b: 5432
Answer by tchrist
That looks more like a perl regex than it does a sed regex. Perhaps you would prefer using:
perl -pi.orig -e 's/author:\s[0-9]{11};//g' file1 file2 file3
At least that way you could always add -Mre=debug to debug the regex.
Answer by Paused until further notice.
There is a Python script called sedsed by Aurelio Jargas which will show the stepwise execution of a sed script. A debugger like this isn't going to help much in the case of characters being taken literally (e.g. {) versus having special meaning (e.g. \{), especially for a simple substitution, but it will help when a more complex script is being debugged.
The latest SVN version.
The most recent stable release.
Disclaimer: I am a minor contributor to sedsed.
Another sed debugger, sd by Brian Hiles, is written as a Bourne shell script (I haven't used this one).
Answer by Ray
You have to use the -r flag for extended regex:
sed -r 's/author:\s[0-9]{11};//g'
or you have to escape the {} characters:
sed 's/author:\s[0-9]\{11\};//g'
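The two forms can be checked side by side on one line of input. The sketch below rests on a few assumptions: it uses -E, which GNU sed accepts with the same meaning as -r, it replaces \s with a literal space to stay clear of GNU extensions, and the variable names are purely illustrative.

```shell
#!/bin/sh
# Shortened version of the sample line from the question.
line='date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp;'

# Basic regex (the default): braces must be escaped to form an interval.
basic=$(printf '%s\n' "$line" | sed 's/author: [0-9]\{11\};//g')

# Extended regex: braces form an interval without escaping.
extended=$(printf '%s\n' "$line" | sed -E 's/author: [0-9]{11};//g')

printf 'basic:    %s\n' "$basic"
printf 'extended: %s\n' "$extended"
```

Both commands should print the same line with the author field removed; if they differ, the regex flavor is the likely culprit.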
Answer by gagallo7
If you want to debug a sed command, you can use the w (write) command to dump the lines that sed has matched to a file.
From the sed man page:
Commands which accept address ranges
(...)
w filename
Write the current pattern space to filename.
Applying to your question
Let's use a file named sed_dump.txt as the sed dump file.
1) Generate the sed dump:
sed "/author:\s[0-9]{11};/w sed_dump.txt" /tmp/test_regex.txt
2) Check the contents of sed_dump.txt:
cat sed_dump.txt
Output:
It's empty...
3) Trying to escape '{' regex control character:
sed "/author:\s[0-9]\{11\};/w sed_dump.txt" /tmp/test_regex.txt
4) Check the contents of sed_dump.txt:
cat sed_dump.txt
Output:
date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp; lines: +5 -2; commitid: bvEcb00aPyqal6Uu;
Conclusion
In step 4), a line was written to the dump file, which means that sed matched your pattern in that line. It does not guarantee the correct answer, but it's a way of debugging using sed itself.
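The steps above can be condensed into one script that also counts the matched lines. This is a hedged sketch, not the answerer's exact procedure: the temporary paths, the extra non-matching input line, the -n option (added so only the dump file receives output), and the literal space in place of \s are all assumptions made for illustration.

```shell
#!/bin/sh
# Throwaway input and dump files.
workdir=$(mktemp -d)
input="$workdir/test_regex.txt"
dump="$workdir/sed_dump.txt"

# One line that should not match and one shortened sample line that should.
printf '%s\n' 'no author field on this line' \
    'date: 2010-10-29 14:46:33 -0200; author: 00000000000; state: Exp;' > "$input"

# -n suppresses normal output; w writes only the lines the address matched.
sed -n "/author: [0-9]\{11\};/w $dump" "$input"

# A count of 0 would mean the pattern never matched.
matched=$(wc -l < "$dump")
echo "matched lines: $matched"
cat "$dump"
```

Swapping in the unescaped {11} form and rerunning should leave the dump file empty, mirroring steps 1) and 2).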