Bash regex capture groups

Question

Asked by mhaken

I have a single string in this format:

"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"

If I were writing a normal regex in JS, C#, etc., I'd do this:

(?:"(.+?)"|'(.+?)'|(\S+))

and iterate over the match groups to grab each string, ideally without the quotes. Ultimately I want to add each value to an array, so for the example I'd end up with three items in the array:

Mike H<[email protected]>
[email protected] 
Mike H<[email protected]>

I can't figure out how to replicate this with grep, sed, or Bash regexes. I've tried things like:

echo "$email" | grep -oP "\"\K(.+?)(?=\")|'\K(.+?)(?=')|(\S+)"

The problem is that, while it roughly mimics capture groups, it doesn't work with multiple matches, so I get captures like:

"Mike
H<[email protected]>"
 [email protected]

If I remove the lookahead/lookbehind logic, I at least get the 3 strings, but the first and last are still wrapped in quotes. In that approach, I pipe the output to read so I can add each string to the array individually, but I'm open to other options.

EDIT:

I think my input example may have been confusing; it's just one possible input. The real input could be double-quoted, single-quoted, or unquoted (space-free) strings, in any order and in any quantity. The JavaScript/C# regex I provided is the real behavior I'm trying to achieve.

Answer 1

Accepted answer by mhaken

Here is what I got working, though it isn't as concise as I wanted:

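arr=()
while IFS= read -r line; do             # -r keeps backslashes, IFS= keeps edge spaces
  line="${line//\"/}"                   # remove every double quote
  arr+=("${line//\'/}")                 # remove every single quote, then append
done < <(echo "$email" | grep -oP "\"(.+?)\"|'(.+?)'|(\S+)")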

This gives me an array of the captured strings and handles input in any order, whether wrapped in double quotes, single quotes, or (for values without spaces) no quotes at all. The array elements also come out without the wrapping quotes. I appreciate all of the suggestions.
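
One caveat with the loop above: ${line//\"/} and ${line//\'/} delete every quote character, not just the wrapping pair, so a value such as 'Mike "MJ" H' would lose its inner double quotes. A minimal variant that strips a quote pair only when it wraps the whole token might look like the sketch below. The helper name strip_wrapping_quotes and the user@example.com addresses are hypothetical placeholders, and the tokenizing grep -oP call assumes GNU grep built with PCRE support.

```shell
#!/usr/bin/env bash
# Sketch: same loop shape as the accepted answer, but a quote pair is removed
# only when it wraps the whole token, so interior quotes are kept.
# strip_wrapping_quotes is a hypothetical helper, not part of the original answer.
strip_wrapping_quotes() {
  local s=$1
  # The patterns \"*\" and \'*\' need at least two characters,
  # so a lone quote character is left untouched.
  if [[ $s == \"*\" || $s == \'*\' ]]; then
    s=${s:1:${#s}-2}   # drop the first and last character
  fi
  printf '%s\n' "$s"
}

# Placeholder input; user@example.com stands in for the real addresses.
email=$'"Mike H<user@example.com>" user@example.com \'Mike "MJ" H\''

arr=()
while IFS= read -r line; do
  arr+=("$(strip_wrapping_quotes "$line")")
done < <(grep -oP "\"(.+?)\"|'(.+?)'|(\S+)" <<< "$email")

printf '%s\n' "${arr[@]}"
```

With this sample input the third element should keep its inner quotes as Mike "MJ" H, where the original loop would produce Mike MJ H.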

Answer 2

Answer by dawg

You can use Perl:

$ email='"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"'
$ echo "$email" | perl -lne 'while (/"([^"]+)"|(\S+)/g) {print defined $1 ? $1 : $2}'
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>

Or in pure Bash, it gets kinda wordy:

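re='"([^"]+)"[[:space:]]*|([^[:space:]]+)[[:space:]]*'
while [[ $email =~ $re ]]; do
    echo "${BASH_REMATCH[1]}${BASH_REMATCH[2]}"   # only one of the two groups is set
    i=${#BASH_REMATCH}                            # length of the whole match, ${BASH_REMATCH[0]}
    email=${email:i}                              # drop the consumed prefix (assumes no leading spaces)
done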
# same output
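
The loop above only knows about double quotes. A sketch of a closer port of the question's JS/C# regex, with separate capture groups for double-quoted, single-quoted, and bare tokens, could look like the block below. It assumes Bash's [[ =~ ]] with POSIX ERE, which has no lazy quantifiers, so (.+?) is replaced by a negated bracket expression; anchoring at ^[[:space:]]* keeps each match at offset 0, so chopping the matched prefix is always safe. The sample addresses are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: pure-Bash take on (?:"(.+?)"|'(.+?)'|(\S+)) using [[ =~ ]].
# Placeholder input; user@example.com stands in for the real addresses.
email=$'"Mike H<user@example.com>" user@example.com \'Mike "MJ" H\''

dq='"([^"]*)"'           # group 2: body of a double-quoted token
sq="'([^']*)'"           # group 3: body of a single-quoted token
bare='([^[:space:]]+)'   # group 4: unquoted token
re="^[[:space:]]*($dq|$sq|$bare)"   # group 1 wraps the alternation

arr=()
rest=$email
while [[ $rest =~ $re ]]; do
  # Exactly one of groups 2-4 took part in the match; the others expand to "".
  arr+=("${BASH_REMATCH[2]}${BASH_REMATCH[3]}${BASH_REMATCH[4]}")
  rest=${rest:${#BASH_REMATCH[0]}}   # the match starts at offset 0 thanks to ^
done

printf '%s\n' "${arr[@]}"
```

One difference from the JS regex: ERE alternation picks the longest overall match rather than the first alternative that succeeds. That is what makes "Mike H<...>" win over the shorter bare match "Mike, but it also means a token such as "a"b should come back whole, quotes included, instead of as a and b.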

Answer 3

Answer by JJoao

Your first expression is fine; just be careful with the quotes (use single quotes when \ characters are present). At the end, trim the " characters with sed.

$ echo "$email" | grep -Po '".*?"|\S+' | sed -r 's/"$|^"//g'
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
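
The sed expression s/"$|^"//g trims a leading or a trailing double quote independently, and the pattern ignores single-quoted values. A sketch that extends the same grep/sed pipeline to both quote styles, removes the outer quotes only when the same character opens and closes the token (via a \1 backreference, which GNU sed accepts in -E mode), and loads the result into an array with mapfile (Bash 4+) might look like this. GNU grep with -P is assumed, and the addresses are placeholders.

```shell
#!/usr/bin/env bash
# Placeholder input; user@example.com stands in for the real addresses.
email=$'"Mike H<user@example.com>" user@example.com \'Mike "MJ" H\''

# grep -Po prints one token per line: a double-quoted run, a single-quoted run,
# or a bare non-space run. sed -E then removes the outer quotes only when the
# same quote character opens and closes the line (the \1 backreference).
mapfile -t arr < <(
  grep -Po "\"[^\"]*\"|'[^']*'|\S+" <<< "$email" |
    sed -E "s/^([\"'])(.*)\1\$/\2/"
)

printf '%s\n' "${arr[@]}"
```

With the sample input this should leave three elements, the last one being Mike "MJ" H with its inner quotes intact.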

Answer 4

Answer by RomanPerekhrest

gawk + bash solution (adding each item to an array):

email_str='"Mike H<[email protected]>" [email protected] "Mike H<[email protected]>"'

readarray -t email_arr < <(gawk -v FPAT="[^\"'[:space:]]+[^\"']+[^\"'[:space:]]+" \
                         '{ for(i=1;i<=NF;i++) print $i }' <<<"$email_str")

Now all items are in email_arr. Bash arrays are zero-indexed, so the first item is ${email_arr[0]}.

Accessing the 2nd item:

echo "${email_arr[1]}"
[email protected]

Accessing the 3rd item:

echo "${email_arr[2]}"
Mike H<[email protected]>
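
Because indices start at 0, the three items readarray produces sit at indices 0, 1, and 2, and ${email_arr[3]} is simply empty. A small sketch for checking what ended up in the array, using a hypothetical literal array in place of the readarray output:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the array that readarray builds above.
email_arr=('Mike H<user@example.com>' 'user@example.com' 'Mike H<user@example.com>')

echo "count: ${#email_arr[@]}"      # number of elements
echo "last:  ${email_arr[-1]}"      # negative indices need Bash 4.3 or later
for i in "${!email_arr[@]}"; do     # "${!email_arr[@]}" expands to the indices
  printf '%d: %s\n' "$i" "${email_arr[i]}"
done
```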

Answer 5

Answer by James Brown

Using GNU awk and FPAT to define fields by their content:

$ awk '
BEGIN { FPAT="([^ ]*)|(\"[^\"]*\")" }  # define a field to be space-separated or in quotes
{
    for(i=1;i<=NF;i++) {               # iterate every field
        gsub(/^\"|\"$/,"",$i)          # remove leading and trailing quotes
        print $i                       # output
    }
}' file
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>

Answer 6

Answer by CWLiu

You may use sed to achieve that:

$ sed -r 's/"(.*)" (.*)"(.*)"/\1\n\2\n\3/g' <<< "$email"
Mike H<[email protected]>
[email protected] 
Mike H<[email protected]>

Answer 7

Answer by P....

Using gawk, where RS can be a multi-character regular expression:

awk -v RS='"|" ' 'NF' inputfile
Mike H<[email protected]>
[email protected]
Mike H<[email protected]>

Answer 8

Answer by Rahul Verma

Modify your regex like this:

grep -oP '("?\s*)\K.*?(?=")' file

Output:

Mike H<[email protected]>
[email protected]
Mike H<[email protected]>
