Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/11565489/

Time: 2020-09-18 02:48:57  Source: igfitidea

Multiple matches in a string using regex in bash

bash bash4

Asked by pn1 dude

I've been looking for more advanced information on using regex in bash and have not found much.

Here's the concept, with a simple string:

myString="DO-BATCH BATCH-DO"

if [[ $myString =~ ([[:alpha:]]*)-([[:alpha:]]*) ]]; then
  echo "${BASH_REMATCH[1]}" # first parens
  echo "${BASH_REMATCH[2]}" # second parens
  echo "${BASH_REMATCH[0]}" # full match
fi

outputs:
DO
BATCH
DO-BATCH

So it does find the first match (BATCH-DO), but how do I pull out a second match (DO-BATCH)? I'm drawing a blank here and cannot find much info on bash regex.

Answered by pn1 dude

OK, so one way I did this was to put it in a for loop:

myString="DO-BATCH BATCH-DO"
for aString in ${myString[@]}; do
    if [[ ${aString} =~ ([[:alpha:]]*)-([[:alpha:]]*) ]]; then
        echo "${BASH_REMATCH[1]}" # first parens
        echo "${BASH_REMATCH[2]}" # second parens
        echo "${BASH_REMATCH[0]}" # full match
    fi
done

which outputs:
DO
BATCH
DO-BATCH
BATCH
DO
BATCH-DO

This works, but I was hoping to pull it all from one regex if possible.

Answered by Paused until further notice.

In your answer, myString is not an array, but you use an array reference to access it. This works in Bash because the 0th element of an array can be referred to by just the variable name and vice versa. What that means is that you could use:

for aString in $myString; do

to get the same result in this case.
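
A minimal sketch of that equivalence, in case it helps. The array names words_scalar and words_array are made up for illustration; the point is only that the unquoted scalar and the unquoted array reference split into the same two words:

```bash
#!/usr/bin/env bash
# Sketch: a scalar variable behaves like a one-element array in Bash.
myString="DO-BATCH BATCH-DO"

# The scalar is treated as element 0 of an array with one element.
echo "${myString[0]}"      # DO-BATCH BATCH-DO
echo "${#myString[@]}"     # 1

# Unquoted, both forms undergo word splitting on whitespace,
# which is what makes the for loop see two separate words.
words_scalar=()   # hypothetical helper array
for aString in $myString; do words_scalar+=("$aString"); done

words_array=()    # hypothetical helper array
for aString in ${myString[@]}; do words_array+=("$aString"); done

echo "${#words_scalar[@]} ${#words_array[@]}"   # 2 2
```

Note that quoting either form ("$myString" or "${myString[@]}") would suppress the splitting and the loop would run once over the whole string.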

In your question, you say the output includes "BATCH-DO". I get "DO-BATCH" so I presume this was a typo.

The only way to get the extra strings without using a for loop is to use a longer regex. By the way, I recommend putting Bash regexes in a variable. It makes certain types much easier to use (those that contain whitespace or special characters, for example).

pattern='(([[:alpha:]]*)-([[:alpha:]]*)) +(([[:alpha:]]*)-([[:alpha:]]*))'
[[ $myString =~ $pattern ]]
declare -p BASH_REMATCH    #dump the array

Outputs:

declare -ar BASH_REMATCH='([0]="DO-BATCH BATCH-DO" [1]="DO-BATCH" [2]="DO" [3]="BATCH" [4]="BATCH-DO" [5]="BATCH" [6]="DO")'

The extra set of parentheses is needed if you want to capture the individual substrings as well as the hyphenated phrases. If you don't need the individual words, you can eliminate the inner sets of parentheses.
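
For comparison, here is a sketch of the variant without the inner sets of parentheses. The variable names first and second are just for illustration; with only the two outer groups, the hyphenated phrases land in indexes 1 and 2:

```bash
#!/usr/bin/env bash
# Sketch: capture only the two hyphenated phrases, not the individual words.
myString="DO-BATCH BATCH-DO"
pattern='([[:alpha:]]*-[[:alpha:]]*) +([[:alpha:]]*-[[:alpha:]]*)'

if [[ $myString =~ $pattern ]]; then
  first=${BASH_REMATCH[1]}    # first hyphenated phrase
  second=${BASH_REMATCH[2]}   # second hyphenated phrase
  echo "$first"               # DO-BATCH
  echo "$second"              # BATCH-DO
  echo "${#BASH_REMATCH[@]}"  # 3 (whole match plus two groups)
fi
```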

Notice that you don't need to use if if you only need to extract substrings. You only need if to take conditional action based on a match.
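
A small sketch of both cases, assuming the simple two-group pattern from the question. The names left, right and status are hypothetical; the second test uses the exit status of [[ ]] directly instead of an if block:

```bash
#!/usr/bin/env bash
# Sketch: extract substrings without if, and act on the exit status with ||.
myString="DO-BATCH BATCH-DO"
pattern='([[:alpha:]]*)-([[:alpha:]]*)'

# Extraction only: run the match, then read the groups.
[[ $myString =~ $pattern ]]
left=${BASH_REMATCH[1]}
right=${BASH_REMATCH[2]}
echo "$left $right"        # DO BATCH

# Conditional action: the exit status tells whether the match succeeded.
status="matched"
[[ "NOHYPHEN" =~ $pattern ]] || status="no match"
echo "$status"             # no match
```

If the input is not guaranteed to match, it is probably safer to check the exit status before reading BASH_REMATCH.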

Also notice that ${BASH_REMATCH[0]} will be quite different with the longer regex since it contains the whole match.

Answered by pn1 dude

Per @Dennis Williamson's post I messed around and ended up with the following:

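myString="DO-BATCH BATCH-DO"
pattern='(([[:alpha:]]*)-([[:alpha:]]*)) +(([[:alpha:]]*)-([[:alpha:]]*))'

# Copy the match array element by element; read -a on an unquoted
# here-string would re-split element 0, which contains a space.
[[ $myString =~ $pattern ]] && myREMatch=("${BASH_REMATCH[@]}")

# The escaped \$ prints the expression itself as a label.
echo "\${myString} -> ${myString}"
echo "\${#myREMatch[@]} -> ${#myREMatch[@]}"

for (( i = 0; i < ${#myREMatch[@]}; i++ )); do
  echo "\${myREMatch[$i]} -> ${myREMatch[$i]}"
done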

This works fine, except that myString must contain exactly the 2 values. I'm posting it because it is kind of interesting and I had fun messing with it. But to make this more generic and handle any number of paired groups (e.g. DO-BATCH), I'm going to go with a modified version of my original answer:

myString="DO-BATCH BATCH-DO"
myRE="([[:alpha:]]*)-([[:alpha:]]*)"

# Split the string into an array of whitespace-separated words.
read -r -a myString <<< "$myString"

for aString in "${myString[@]}"; do
  # The escaped \$ prints the expression itself as a label.
  echo "\${aString} -> ${aString}"
  if [[ ${aString} =~ ${myRE} ]]; then
    echo "\${BASH_REMATCH[@]} -> ${BASH_REMATCH[@]}"
    echo "\${#BASH_REMATCH[@]} -> ${#BASH_REMATCH[@]}"
    for (( i = 0; i < ${#BASH_REMATCH[@]}; i++ )); do
      echo "\${BASH_REMATCH[$i]} -> ${BASH_REMATCH[$i]}"
    done
  fi
done

I would have liked a Perl-regex-style multiple match, but this works fine.
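
For what it's worth, one way to get closer to a global match with a single regex is to match repeatedly and strip the matched part off the front of the string each time. This is only a sketch, not something from the answers above: the names rest and matches are made up, and it uses + instead of * so that each side of the hyphen must be non-empty.

```bash
#!/usr/bin/env bash
# Sketch: emulate a global match by consuming the string match by match.
myString="DO-BATCH BATCH-DO"
myRE='([[:alpha:]]+)-([[:alpha:]]+)'

rest=$myString   # hypothetical working copy that shrinks on each pass
matches=()       # hypothetical array collecting every full match

while [[ $rest =~ $myRE ]]; do
  matches+=("${BASH_REMATCH[0]}")
  echo "${BASH_REMATCH[0]}: ${BASH_REMATCH[1]} / ${BASH_REMATCH[2]}"
  # Drop everything up to and including this match, then try again.
  rest=${rest#*"${BASH_REMATCH[0]}"}
done

echo "${#matches[@]}"   # 2
```

With the sample string this prints one line per pair (DO-BATCH, then BATCH-DO) and does not depend on how many pairs the string contains or on how they are separated.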

Answered by David

Although this is a year-old question (without an accepted answer), could the regex pattern be simplified to:

myRE="([[:alpha:]]*-[[:alpha:]]*)"

by removing the inner parentheses to find a smaller (more concise) set of the words DO-BATCH and BATCH-DO?

It works for me in your 18:10 answer. ${BASH_REMATCH[0]} and ${BASH_REMATCH[1]} result in the 2 words being found.
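
A quick sketch of how this looks inside the word-by-word loop, assuming the same sample string; the array name found is made up. As far as I can tell, with one group wrapping the whole pattern, indexes 0 and 1 hold the same text within each pass, and the two words come from the two passes of the loop:

```bash
#!/usr/bin/env bash
# Sketch: the simplified single-group pattern inside the for loop.
myString="DO-BATCH BATCH-DO"
myRE='([[:alpha:]]*-[[:alpha:]]*)'

found=()   # hypothetical array collecting the captured words
for aString in $myString; do
  if [[ $aString =~ $myRE ]]; then
    # One group wraps the whole pattern, so [0] and [1] are identical here.
    echo "${BASH_REMATCH[0]} ${BASH_REMATCH[1]}"
    found+=("${BASH_REMATCH[1]}")
  fi
done

echo "${found[@]}"   # DO-BATCH BATCH-DO
```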
