Bash Regular Expression Condition

Notice: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license and attribute it to the original authors (not me), citing the original URL: http://stackoverflow.com/questions/5186292/

Time: 2020-09-17 23:33:21  Source: igfitidea

Tags: regex, bash, conditional

Asked by jayem

I have a regular expression that I need to verify. The regular expression has double quotes in it, but I can't seem to figure out how to properly escape them.

First attempt, doesn't work as the quotes are not escaped.

while read line
do
  if [[ $line =~ "<a href="(.+)">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {"link.html"} (with double quotes)

How can I properly run this so the output is link.html without double quotes?

I have tried...

while read line
do
  if [[ $line =~ "<a href=/"(.+)/">HTTP</a>" ]]; then
    SOURCE=${BASH_REMATCH[1]}
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {}

Without luck. Can someone please help me so I can stop beating my head on my desk? I am not great with Bash. Thank you!

Answered by Paused until further notice.

It's always best to put your regex in a variable.

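# Single quotes keep the inner double quotes literal
pattern='<a href="(.+)">HTTP</a>'
# IFS= and -r preserve leading whitespace and backslashes in each line
while IFS= read -r line
do
  # $pattern is left unquoted so that =~ treats it as a regex
  if [[ $line =~ $pattern ]]; then
    SOURCE=${BASH_REMATCH[1]}  # text matched by the first capture group
    break
  fi
done < tmp/source.html

echo "{$SOURCE}" #output = {link.html} (without double quotes)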

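If you quote the right-hand side (the pattern), the match changes from a regex match to a literal string match: the quoted text must appear verbatim in the left-hand string, so regex operators such as (.+) lose their special meaning (=~ effectively becomes a literal string comparison).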

As a side note, escaping is done with backslashes (\) rather than slashes (/), but that would not help your situation because of the outer quotes as mentioned in my previous paragraph.
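
To make the quoted versus unquoted difference concrete, here is a minimal sketch, assuming Bash 3.2 or later. The sample line stands in for one line of tmp/source.html, and the names unquoted_result and quoted_result are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample line; the answer above reads lines from tmp/source.html.
line='<a href="link.html">HTTP</a>'
pattern='<a href="(.+)">HTTP</a>'

# Unquoted variable: its contents are used as a regex.
if [[ $line =~ $pattern ]]; then
  unquoted_result=${BASH_REMATCH[1]}
else
  unquoted_result='no match'
fi

# Quoted variable: its contents are matched as a literal string,
# so "(.+)" would have to appear verbatim in $line.
if [[ $line =~ "$pattern" ]]; then
  quoted_result='match'
else
  quoted_result='no match'
fi

echo "unquoted: $unquoted_result"
echo "quoted:   $quoted_result"
```

The first test captures link.html; the second fails because the line does not contain the literal characters (.+).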

Answered by Satyajit

$line =~ "<a href=\"(.+)\">HTTP</a>" 
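
On Bash 3.2 and later, a fully quoted right-hand side like the one above is matched literally, so the capture group never fires. One possible way to keep the pattern inline is to quote only the literal parts and leave the regex part unquoted. This is a sketch, not necessarily what this answer intended; the sample line and the name inline_result are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample line for illustration.
line='<a href="link.html">HTTP</a>'

# The single-quoted pieces are literal text; (.+) sits outside the quotes,
# so it is still interpreted as a regex capture group.
if [[ $line =~ '<a href="'(.+)'">HTTP</a>' ]]; then
  inline_result=${BASH_REMATCH[1]}
fi

echo "{$inline_result}"
```

The trade-off is readability: the variable-based form in the other answers is usually easier to scan than a pattern assembled from quoted and unquoted fragments.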

Answered by zhigang

I recommend always using a variable when specifying the regex:

#!/bin/bash

SOURCE=
url_re='<a href="(.+)">HTTP</a>'
while IFS= read -r line
do
    if [[ "$line" =~ $url_re ]]; then
        SOURCE=${BASH_REMATCH[1]}
        break
    fi
done < test.txt

echo "$SOURCE" # http://example.com/

# test.txt contents:
# <a href="http://example.com/">HTTP</a>

Answered by yong321

Without an intermediate variable (i.e. with the regex written directly after =~), it works only if you remove the quotes around the regex and the pattern contains no characters that are special to the shell (space, <, >, etc.), or if the regex is a plain alphanumeric string:

$ x='Hello'
$ [[ $x =~ ^H ]] && echo OK
OK
$ [[ $x =~ 'H' ]] && echo OK
OK
$ [[ $x =~ H ]] && echo OK
OK
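
The patterns above contain no shell-special characters. As a sketch of what an inline pattern looks like once a space is involved (the sample string and the names escaped_space and posix_class are made up for illustration), the space can be backslash-escaped or replaced by a POSIX character class:

```bash
#!/usr/bin/env bash
s='Hello World'

# A bare space would end the pattern word, so escape it with a backslash.
[[ $s =~ o\ W ]] && escaped_space='OK' || escaped_space='no match'

# Alternatively, avoid the space entirely with a POSIX character class.
[[ $s =~ ^Hello[[:space:]]World$ ]] && posix_class='OK' || posix_class='no match'

echo "escaped space: $escaped_space"
echo "posix class:   $posix_class"
```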

I stumbled across this page while looking for an explanation of why bash is designed so that you generally can't use a regex directly after =~. For example,

$ re='^H'
$ [[ $x =~ $re ]] && echo OK
OK

works as expected, while

$ [[ $x =~ '^H' ]] && echo OK

does not. I personally always put the regex in a variable first. But I still wonder why bash is designed this way. You can argue assigning the regex to a variable first would overall make the code look neater. Any other reason? If a regex is not supposed to be interpreted as a string, bash could use other ways to represent it. For example, Perl uses slashes, /regex/, or more explicitly m/regex/.
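
A small sketch of what happens in the failing case, assuming Bash 3.2 or later: inside quotes the ^ is no longer an anchor but an ordinary character, so the quoted form only matches a string that really contains the two characters ^H. The variable y and the result names are hypothetical, added for illustration.

```bash
#!/usr/bin/env bash
x='Hello'

# Unquoted: ^ anchors the match at the start of the string.
[[ $x =~ ^H ]] && anchored='OK' || anchored='no match'

# Quoted: ^ is a literal caret, and 'Hello' contains no caret.
[[ $x =~ '^H' ]] && quoted='OK' || quoted='no match'

# The quoted form does match a string that contains a literal "^H".
y='a^Hb'
[[ $y =~ '^H' ]] && literal='OK' || literal='no match'

echo "anchored: $anchored, quoted: $quoted, literal: $literal"
```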

Answered by yong321

Try this "<a href="""(.+)""">HTTP</a>"

Edit, well try this

"<a href="\""(.+)"\"">HTTP</a>"

or

'<a href="(.+)">HTTP</a>'

or

'<a href='\"'(.+)'\"'>HTTP</a>' <-- this gives the right syntax in Bash; as for the regex (.+), I don't know how that will behave

Edit: what do you get when you use this regex, "<a href=(.+)>HTTP</a>"?
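
As a quick check of the last quoting variant above: the shell concatenates the adjacent quoted pieces into a single string, which can be inspected and then used unquoted as a regex. This is a minimal sketch, assuming Bash 3.2 or later; the sample line and the names re and concat_result are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sample line for illustration.
line='<a href="link.html">HTTP</a>'

# Adjacent quoted pieces and the escaped \" are joined into one string.
re='<a href='\"'(.+)'\"'>HTTP</a>'
printf '%s\n' "$re"

# Stored in a variable and expanded unquoted, the string acts as a regex.
if [[ $line =~ $re ]]; then
  concat_result=${BASH_REMATCH[1]}
fi

echo "{$concat_result}"
```

Used directly after =~ instead, the same expression would be matched literally, because the (.+) part sits inside single quotes.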