bash: Perl one-liner to extract a multi-line pattern

Question

Asked by Gil

I have a pattern in a file, shown below, which may or may not span multiple lines:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

What I have tried:

perl -nle 'print while m/^\s*(\w+)\s+(\w+?)\s*(([\w-0-9,* \s]))\s{/gm'

I don't know what the flags mean here; all I did was write a regex for the pattern and insert it into the pattern space. This matches well if the pattern is on a single line, as in:

abcd25 ef_gh ( fg*_h hj_b* hj ) {

But it fails in the multi-line case!

I started with Perl yesterday, but the syntax is way too confusing. So, as suggested by a fellow SO user, I wrote a regex and inserted it into the code he provided.

I hope a Perl monk can help me with this. Alternative solutions are welcome.

Input file:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {

 abcd25
 ef_gh
 fg*_h
 hj_b*
 hj ) {

 jhijdsiokdù ()lmolmlxjk;
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

Expected output:

 abcd25
 ef_gh
 ( fg*_h
 hj_b*
 hj ) {
 abcd25 ef_gh ( fg*_h hj_b* hj ) {

The input file can contain multiple blocks that share the start and end of the required pattern without being the required pattern. Thanks in advance for the replies.

Answer 1

Accepted answer by choroba

The regex does not match even the single line. What do you think the double parentheses do?

You probably wanted

m/^\s*(\w+)\s+(\w+?)\s*\([\w0-9,*\s]+\)\s{/gm

Update: The specification has changed. The regex (almost) has not, but you have to change the code slightly:

perl -0777 -nle 'print "$1\n" while m/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/gm'

Another update:

Explanation:

The switches are described in perlrun: -0 (used here as -0777), -n, -l, and -e.

The regex can be auto-explained by YAPE::Regex::Explain

perl -MYAPE::Regex::Explain -e 'print YAPE::Regex::Explain->new(qr/^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{)/)->explain'
The regular expression:

(?-imsx:^\s*(\w+\s+\w+?\s*\([\w0-9,*\s]+\)\s{))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \w+?                     word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the least amount
                             possible))
----------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \(                       '('
----------------------------------------------------------------------
    [\w0-9,*\s]+             any character of: word characters (a-z,
                             A-Z, 0-9, _), '0' to '9', ',', '*',
                             whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
    \)                       ')'
----------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
    {                        '{'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

The /gm switches are explained in perlre

Answer 2

Answer by Todd A. Jacobs

Use the Flip-Flop Operator for a One-Liner

Perl makes this really easy with the flip-flop operator, which will allow you to print out all the lines between two regular expressions. For example:

$ perl -ne 'print if /^abcd25/ ... /\bhj \) {/' /tmp/foo
abcd25
ef_gh
( fg*_h
hj_b*
hj ) {

However, a simple one-liner like this cannot reject unwanted blocks that merely share the same start and end delimiters; it prints everything between them. That calls for a more complex approach.

More Complicated Comparisons Benefit from Conditional Branching

One-liners aren't always the best choice, and regular expressions can get out of hand quickly if they become too complex. In such situations, you're better off writing an actual program that can use conditional branching rather than trying to use an over-clever regular expression match.

One way to do this is to build up your match with a simple pattern, and then reject any match that doesn't match some other simple pattern. For example:

#!/usr/bin/perl -nw

# Use the flip-flop operator to collect the lines between the delimiters.
# The literal brace is escaped so newer Perl versions do not warn under -w.
if (/^abcd25/ ... /\bhj \) \{/) {
    push @string, $_;
}

# Reject multi-line patterns that don't include a particular expression
# between flip-flop delimiters. For example, "( fg" will match, while
# "^fg" won't.
if (/\bhj \) \{/) {
    $string = join("", @string);
    undef @string;
    push(@matches, $string) if $string =~ /\( fg/;
}

END { print @matches }

When run against the OP's updated corpus, this correctly yields:

abcd25
ef_gh
( fg*_h
hj_b*
hj ) {
abcd25 ef_gh ( fg*_h hj_b* hj ) {
