Bash script to extract all matches of a regex pattern

Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/3643436/

Time: 2020-09-17 22:34:41  Source: igfitidea

Tags: bash, shell

Asked by Neeladri Vishweswaran

I found this, but it assumes the words are space-separated.

result="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

for word in $result
do
    if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
    then
        match="$match $word"
    fi
done

POST EDITED

Renaming for clarity:

data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data
do
    if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
    then
        match="$match $word"
    fi
done
echo $match

Original left in place so that comments asking about the variable result continue to make sense.

Answered by Paused until further notice.

Edit: answer to the edited question:

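# grep -P (PCRE, GNU grep) supports the non-greedy .*?; -o prints each match on its own line.
# The command substitution is left unquoted so that the loop sees one match per iteration
# (this relies on the matches themselves containing no whitespace).
for string in $(echo "$result" | grep -Po "ADDNAME[0-9]{2}.*?HELLO")
do
    match="${match:+$match }$string"
done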
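If the matches should be kept as separate items rather than as one space-joined string, a Bash array is one option. The following is a minimal sketch, assuming Bash 4 or later (for mapfile) and a GNU grep built with PCRE support (for -P); the variable names data and matches are illustrative.

```shell
#!/usr/bin/env bash
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# -o prints each match on its own line; -P enables the non-greedy .*?
# mapfile -t reads those lines into an array, one element per match.
mapfile -t matches < <(grep -Po 'ADDNAME[0-9]{2}.*?HELLO' <<< "$data")

for m in "${matches[@]}"
do
    printf '%s\n' "$m"
done
```

Because each match is its own array element, this variant also copes with matches that contain spaces.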

Original answer:

If you're using Bash version 3.2 or higher, you can use its regex matching.

string="string to search 99 with 88 some 42 numbers"
pattern="[0-9]{2}"
for word in $string
do
    # =~ performs an extended-regex match; the matched text lands in BASH_REMATCH[0]
    if [[ $word =~ $pattern ]]
    then
        match="${match:+$match }${BASH_REMATCH[0]}"
    fi
done

The result will be "99 88 42".
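The loop above relies on the input being split into words. For a string without separators, one possible approach is to match repeatedly and strip everything up to the end of each match before trying again. This is a minimal sketch and not part of the original answer: the names data, rest, and matches are illustrative, and [[:lower:]]* stands in for a non-greedy .*? because Bash's extended regexes have no non-greedy quantifier.

```shell
#!/usr/bin/env bash
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
pattern='ADDNAME[0-9]{2}[[:lower:]]*HELLO'

matches=()
rest=$data
while [[ $rest =~ $pattern ]]
do
    matches+=("${BASH_REMATCH[0]}")
    # Drop the shortest prefix that ends with the text just matched,
    # so the next iteration searches only the remainder.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

printf '%s\n' "${matches[@]}"
```

The sketch assumes the text between ADDNAME and HELLO is lower-case, as in the sample data; unconstrained data would need a tool with non-greedy matching.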

Answered by Daenyth

Use grep -o:

-o, --only-matching  show only the part of a line matching PATTERN
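Applied to the data from the question, this could look like the sketch below. It assumes a grep that supports -o (GNU and BSD grep both do). [[:lower:]]* is used instead of .* so that the greedy match cannot run past the first HELLO, which in turn assumes the text between the markers is lower-case, as in the sample.

```shell
#!/usr/bin/env bash
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# -o prints each match on its own line; -E selects extended regexes.
matches=$(printf '%s\n' "$data" | grep -oE 'ADDNAME[0-9]{2}[[:lower:]]*HELLO')

printf '%s\n' "$matches"
```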

Answered by Jonathan Leffler

Not very elegant - and there are problems because of greedy matching - but this more or less works:

data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO"
do
    echo $word
done |
sed -e '/ADDNAME[0-9][0-9][a-z]*HELLO/{
        s/\(ADDNAME[0-9][0-9][a-z]*HELLO\)/ \1 /g
        }' |
while read line
do
    set -- $line
    for arg in "$@"
    do echo $arg
    done
done |
grep "ADDNAME[0-9][0-9][a-z]*HELLO"

The first loop echoes three lines of data; you'd probably replace that with cat or I/O redirection. The sed script uses a modified regex to put spaces around the patterns. The last loop breaks up the 'space separated words' into one 'word' per line. The final grep selects the lines you want.
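The same idea (put spaces around each match, split on spaces, then filter) can be condensed. The following is only a sketch of that technique, not the author's script: it feeds the single sample string with printf, uses tr to do the splitting instead of the while loop, and swaps in [[:lower:]] for [a-z] to avoid locale-dependent ranges. The names data, pattern, and matches are illustrative.

```shell
#!/usr/bin/env bash
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
pattern='ADDNAME[0-9][0-9][[:lower:]]*HELLO'

# 1. sed surrounds every match with spaces (\1 is the captured match).
# 2. tr turns each run of spaces into a newline: one word per line.
# 3. grep keeps only the lines that are matches.
matches=$(printf '%s\n' "$data" |
    sed -e "s/\($pattern\)/ \1 /g" |
    tr -s ' ' '\n' |
    grep "$pattern")

printf '%s\n' "$matches"
```

Like the original pipeline, this breaks down if the input itself contains spaces inside a match.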

The regex is modified to use [a-z]* in place of the original .* because pattern matching is greedy. If the data between ADDNAME and HELLO is unconstrained, you need to think about using non-greedy regexes, which are available in Perl and in Python and other modern scripting languages:

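#!/bin/perl -w
# Print every non-greedy match of the pattern, one per line.
while (<>)
{
    # With /g, the inner loop steps through successive matches on the line.
    while (/(ADDNAME\d\d.*?HELLO)/g)
    {
        print "$1\n";
    }
}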

This is a good demonstration of using the right tool for the job.