Original source: http://stackoverflow.com/questions/11792967/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Perl one liner to extract a multi-line pattern
Asked by Gil
I have a pattern in a file, shown below, which may or may not span multiple lines:
 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
What I have tried:
perl -nle 'print while m/^\s*(\w+)\s+(\w+?)\s*(([\w-0-9,* \s]))\s{/gm'
I don't know what the flags mean here; all I did was write a regex for the pattern and insert it in the pattern space. This matches well if the pattern is on a single line, as in:
abcd25 ef_gh ( fg*_h hj_b* hj ) {
But it fails in the multi-line case!
I started with Perl yesterday, but the syntax is way too confusing. So, as suggested by a fellow SO member, I wrote a regex and inserted it into the code he provided.
I hope a Perl monk can help me in this case. Alternative solutions are welcome.
Input file:
 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
 abcd25
 ef_gh
 fg*_h
 hj_b*
 hj ) {
 jhijdsiokdù ()lmolmlxjk;
 abcd25 ef_gh ( fg*_h hj_b* hj ) {
Expected output:
 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
 abcd25 ef_gh ( fg*_h hj_b* hj ) {
The input file can contain several blocks whose start and end match the start and end of the required pattern. Thanks in advance for the replies.
Accepted answer by choroba
The regex does not match even the single line. What do you think the double parentheses do?
You probably wanted
m/^\s*(\w+)\s+(\w+?)\s*\([\w0-9,*\s]+\)\s{/gm
Update: The specification has changed. The regex has (almost) not, but you have to change the code slightly:
perl -0777 -nle 'print "$1\n" while m/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/gm'
Another update:
Explanation:
- The switches are described in perlrun: -0 (zero), -n, -l, -e
- The regex can be auto-explained by YAPE::Regex::Explain:
perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/)->explain'
The regular expression:
(?-imsx:^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{))
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
    \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or more times (matching the least amount possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or more times (matching the most amount possible))
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    [\w0-9,*\s]+             any character of: word characters (a-z, A-Z, 0-9, _), '0' to '9', ',', '*', whitespace (\n, \r, \t, \f, and " ") (1 or more times (matching the most amount possible))
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    {                        '{'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
- The /gm switches are explained in perlre
 
 
Answer by Todd A. Jacobs
Use the Flip-Flop Operator for a One-Liner
Perl makes this really easy with the flip-flop operator, which will allow you to print out all the lines between two regular expressions. For example:
$ perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
However, a simple one-liner like this cannot reject a block based on what appears between the delimiting patterns; it prints every range that starts and ends with them. That calls for a more complex approach.
More Complicated Comparisons Benefit from Conditional Branching
One-liners aren't always the best choice, and regular expressions can get out of hand quickly if they become too complex. In such situations, you're better off writing an actual program that can use conditional branching rather than trying to use an over-clever regular expression match.
One way to do this is to build up your match with a simple pattern, and then reject any match that doesn't match some other simple pattern. For example:
#!/usr/bin/perl -nw
# Use flip-flop operator to select matches.
if (/^abcd25/ ... /\bhj \) {/) {
    push @string, $_
};
# Reject multi-line patterns that don't include a particular expression
# between flip-flop delimiters. For example, "( fg" will match, while
# "^fg" won't.
if (/\bhj \) {/) {
    $string = join("", @string);
    undef @string;
    push(@matches, $string) if $string =~ /\( fg/;
};
END {print @matches}
When run against the OP's updated corpus, this correctly yields:
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {

