Source: http://stackoverflow.com/questions/3643436/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, link to the original, and attribute it to the original authors on Stack Overflow.
bash script to extract ALL matches of a regex pattern
Asked by Neeladri Vishweswaran
I found this but it assumes the words are space separated.
result="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $result
do
    if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
    then
        match="$match $word"
    fi
done
Post edited
Renaming for clarity:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data
do
    if echo $word | grep -qi '(ADDNAME\d\d.*HELLO)'
    then
        match="$match $word"
    fi
done
echo $match
The original is left in place so that comments asking about result continue to make sense.
Answered by Paused until further notice.
Edit: answer to edited question:
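# Requires GNU grep: -P enables the non-greedy .*? and -o prints each match on its own line.
# The command substitution is left unquoted so that each match becomes one loop item
# (this assumes the matches contain no whitespace or glob characters).
for string in $(echo "$result" | grep -Po "ADDNAME[0-9]{2}.*?HELLO")
do
    match="${match:+$match }$string"
done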
Original answer:
If you're using Bash version 3.2 or higher, you can use its built-in regex matching (the =~ operator inside [[ ]]).
string="string to search 99 with 88 some 42 numbers"
pattern="[0-9]{2}"
for word in $string
do
    # After a successful =~ test, BASH_REMATCH[0] holds the matched text
    if [[ $word =~ $pattern ]]
    then
        match="${match:+$match }${BASH_REMATCH[0]}"
    fi
done
The result will be "99 88 42".
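The loop above still splits the input on whitespace, which the data in the question does not contain. As a rough sketch of how the same =~ operator might be applied to one unbroken string, the loop below takes the leftmost match, records it, and then discards everything up to and including that match before trying again. The variable names rest and found are illustrative and not from the original answer. Bash regexes have no non-greedy quantifier, so this assumes the text between ADDNAME and HELLO is lowercase letters only.

```shell
#!/usr/bin/env bash
# Sketch only: repeatedly match against the remaining text (bash 3.2+).
LC_ALL=C   # keep [a-z] restricted to ASCII lowercase letters

data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
pattern='ADDNAME[0-9]{2}[a-z]*HELLO'

rest=$data    # hypothetical name: the part of the input not yet scanned
match=""
while [[ $rest =~ $pattern ]]
do
    found=${BASH_REMATCH[0]}            # text of the leftmost match
    match="${match:+$match }$found"     # append, space separated
    rest=${rest#*"$found"}              # drop everything through that match
done
echo "$match"
```

The pattern must never match the empty string, otherwise rest would stop shrinking and the loop would not terminate.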
Answered by Daenyth
Use grep -o
-o, --only-matching show only the part of a line matching PATTERN
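As a minimal sketch of how grep -o might be applied to the data from the question: each match is printed on its own line, and the lines are then joined with spaces. This assumes a grep that supports -o (GNU and BSD grep do, although POSIX does not require it) and assumes the text between ADDNAME and HELLO is lowercase letters only, because -E has no non-greedy quantifier. The variable name matches is illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: collect every match from one unbroken string with grep -o.
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# -o prints each match on its own line; -E enables the {2} interval syntax.
# LC_ALL=C keeps [a-z] restricted to ASCII lowercase letters.
matches=$(printf '%s\n' "$data" | LC_ALL=C grep -oE 'ADDNAME[0-9]{2}[a-z]*HELLO')

# Join the newline-separated matches into one space-separated string.
match=$(printf '%s\n' "$matches" | tr '\n' ' ')
match=${match% }    # remove the trailing space left by the final newline
echo "$match"
```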
Answered by Jonathan Leffler
Not very elegant - and there are problems because of greedy matching - but this more or less works:
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"
for word in $data \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg" \
    "ADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLO"
do
    echo $word
done |
sed -e '/ADDNAME[0-9][0-9][a-z]*HELLO/{
    s/\(ADDNAME[0-9][0-9][a-z]*HELLO\)/ \1 /g
}' |
while read line
do
    set -- $line
    for arg in "$@"
    do echo $arg
    done
done |
grep "ADDNAME[0-9][0-9][a-z]*HELLO"
The first loop echoes three lines of data; you'd probably replace that with cat or I/O redirection. The sed script uses a modified regex to put spaces around the patterns. The last loop breaks up the 'space separated words' into one 'word' per line. The final grep selects the lines you want.
The regex is modified with [a-z]* in place of the original .* because the pattern matching is greedy. If the data between ADDNAME and HELLO is unconstrained, then you need to think about using non-greedy regexes, which are available in Perl, Python, and other modern scripting languages:
#!/bin/perl -w
while (<>)
{
    # .*? is non-greedy, so each match stops at the first HELLO after ADDNAME
    while (/(ADDNAME\d\d.*?HELLO)/g)
    {
        print "$1\n";
    }
}
This is a good demonstration of using the right tool for the job.
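To make the greedy versus non-greedy difference concrete from the shell, the sketch below runs both forms of the pattern over the question's data. It assumes GNU grep built with PCRE support, since -P is a GNU extension and is not available in every grep. The variable names greedy and lazy are illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: compare greedy .* with non-greedy .*? using grep -o.
data="abcdefADDNAME25abcdefgHELLOabcdefgADDNAME25abcdefgHELLOabcdefg"

# Greedy: .* runs on to the last HELLO, so both records come back as one match.
greedy=$(printf '%s\n' "$data" | grep -oE 'ADDNAME[0-9]{2}.*HELLO')

# Non-greedy: .*? stops at the first HELLO after each ADDNAME (needs -P).
lazy=$(printf '%s\n' "$data" | grep -oP 'ADDNAME[0-9]{2}.*?HELLO')

echo "greedy:"
echo "$greedy"
echo "non-greedy:"
echo "$lazy"
```

With the greedy pattern a single long match is printed; with the non-greedy pattern each ADDNAME...HELLO record is printed on its own line.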

