Optional parameters in Bash regular expression
Original question: http://stackoverflow.com/questions/8718851/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Florian Feldhaus
I want to parse strings similar to the following into separate variables using regular expressions from within Bash:
Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";
or
Category: resource;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Resource";rel="http://schemas.ogf.org/occi/core#entity";attributes="occi.core.summary";
The first part before "title" is common to all strings; the title and attributes parts are optional.
I managed to extract the mandatory parameters common to all strings, but I have trouble with the optional parameters, which are not necessarily present in every string. As far as I can tell, Bash doesn't support non-capturing parentheses, which I would use for this purpose.
Here is what I achieved thus far:
CATEGORY_REGEX='Category:\s*([^;]*);scheme="([^"]*)";class="([^"]*)";'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo ${BASH_REMATCH[0]}
echo ${BASH_REMATCH[1]}
echo ${BASH_REMATCH[2]}
echo ${BASH_REMATCH[3]}
The regular expression I would like to use (and which is working for me in Ruby) would be:
CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(?:title="([^"]*)";)?\s*(?:rel="([^"]*)";)?\s*(?:location="([^"]*)";)?\s*(?:attributes="([^"]*)";)?\s*(?:actions="([^"]*)";)?'
Is there any other solution to parse the string with command line tools without having to fall back on perl, python or ruby?
Answered by Andrew Clark
I don't think non-capturing groups exist in bash regex, so your options are to use a scripting language or to remove the ?: from all of the (?:...) groups and just be careful about which groups you reference, for example:
CATEGORY_REGEX='Category:\s*([^;]*);\s*scheme="([^"]*)";\s*class="([^"]*)";\s*(title="([^"]*)";)?\s*(rel="([^"]*)";)?\s*(location="([^"]*)";)?\s*(attributes="([^"]*)";)?\s*(actions="([^"]*)";)?'
category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
[[ $category_string =~ $CATEGORY_REGEX ]]
echo "full: ${BASH_REMATCH[0]}"
echo "category: ${BASH_REMATCH[1]}"
echo "scheme: ${BASH_REMATCH[2]}"
echo "class: ${BASH_REMATCH[3]}"
echo "title: ${BASH_REMATCH[5]}"
echo "rel: ${BASH_REMATCH[7]}"
echo "location: ${BASH_REMATCH[9]}"
echo "attributes: ${BASH_REMATCH[11]}"
echo "actions: ${BASH_REMATCH[13]}"
Note that starting with the optional parameters we need to skip a group each time, because the even numbered groups from 4 on contain the parameter name as well as the value (if the parameter is present).
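The group numbering described above can be wrapped in a small function so the skipped wrapper groups stay in one place. This is only a sketch: parse_category is a hypothetical helper name, the result variables are plain globals, and [[:space:]] is swapped in for \s on the assumption that \s is an extension of some regex libraries rather than part of POSIX extended regular expressions.

```shell
#!/usr/bin/env bash
# Sketch: optional fields via ordinary capturing groups, since bash has no (?:...).
# Assumption: [[:space:]] is used instead of \s, which is not POSIX ERE syntax.

s='[[:space:]]*'
CATEGORY_REGEX="Category:${s}([^;]*);${s}scheme=\"([^\"]*)\";${s}class=\"([^\"]*)\";"

# Each optional field adds two groups: the wrapper (name plus value) and the value.
for name in title rel location attributes actions; do
  CATEGORY_REGEX+="${s}(${name}=\"([^\"]*)\";)?"
done

# parse_category (hypothetical helper): sets global variables, returns 1 on no match.
parse_category() {
  [[ $1 =~ $CATEGORY_REGEX ]] || return 1
  category=${BASH_REMATCH[1]}
  scheme=${BASH_REMATCH[2]}
  class=${BASH_REMATCH[3]}
  # Odd-numbered groups from 5 on hold the values; groups 4, 6, 8, ... are the wrappers.
  title=${BASH_REMATCH[5]}
  rel=${BASH_REMATCH[7]}
  location=${BASH_REMATCH[9]}
  attributes=${BASH_REMATCH[11]}
  actions=${BASH_REMATCH[13]}
}

category_string='Category: entity;scheme="http://schemas.ogf.org/occi/core#";class="kind";title="Entity";attributes="occi.core.id occi.core.title";'
if parse_category "$category_string"; then
  printf 'category=%s title=%s rel=%s\n' "$category" "${title:-<none>}" "${rel:-<none>}"
fi
```

A parameter that is absent from the input leaves its variable empty, so the caller can test it with [[ -n $rel ]] or substitute a default as in the printf line.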
Answered by user123
You can emulate optional non-capturing groups in bash using a little bit of regexp magic:
#             _2__    _4__   _5__
[[ "fu@k" =~ ((.+)@|)((.+)/|)(.+) ]];
echo "${BASH_REMATCH[2]:--} ${BASH_REMATCH[4]:--} ${BASH_REMATCH[5]:--}"
# Output: fu - k
The characters @ and / are parts of the string we parse.
The regexp pipe | is used to match either the left part or the right (empty) part.
For the curious, ${VAR:-<default value>} is a parameter expansion that yields the default value when $VAR is unset or empty.
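To make the default-value expansion concrete, here is a minimal sketch. The variable names missing, empty and present are illustrative only and do not come from the answer; the last line contrasts the colon-less form, which substitutes the default only when the variable is unset.

```shell
#!/usr/bin/env bash
# Sketch of ${VAR:-default} versus ${VAR-default}; variable names are illustrative.

unset missing      # not set at all
empty=''           # set, but empty
present='k'        # set and non-empty

r_missing=${missing:-fallback}    # unset         -> default is used
r_empty=${empty:-fallback}        # empty         -> default is used
r_present=${present:-fallback}    # non-empty     -> original value is kept
r_nocolon=${empty-fallback}       # without colon -> only unset triggers the default

printf '[%s] [%s] [%s] [%s]\n' "$r_missing" "$r_empty" "$r_present" "$r_nocolon"
# prints: [fallback] [fallback] [k] []
```

In the answer's echo line the default is a single dash, so ${BASH_REMATCH[4]:--} prints - when group 4 did not take part in the match.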

