Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/13946066/

Time: 2020-09-18 04:02:51  Source: igfitidea

Get string between strings in bash

bash sed

Asked by Robby75

I want to get the string between <sometag param=' and '>.

I tried to use the method from "Get any string between 2 string and assign a variable in bash" to get the "x":

 echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'

The problem (apart from the inefficiency of the extra tr step, which is only there because I cannot manage to escape the apostrophe correctly for sed) is that sed matches as much as possible, i.e. the output is:

 x_><irrelevant stuff=_nonsense

But the correct output would be the minimal match, which in this example is just "x".

Thanks for your help

Answered by Steve

You are probably looking for something like this:

sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Results:

x

Explanation:

  • Instead of the greedy .* capture, use a negated character class such as [^']*, which matches any character except ' any number of times. It is still greedy, but it cannot run past the closing quote, which gives the minimal match here. To anchor the pattern, it is followed by '>.
  • You can also wrap the sed expression in double quotes so that you don't need to escape the single quotes. If you wanted to keep single quotes around the expression, you'd do this:

... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'

Notice that the single quotes aren't really escaped. The sed expression is closed, an escaped single quote is inserted, and the sed expression is re-opened. Think of '\'' as a four-character escape sequence.
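
To make the difference concrete, here is a minimal sketch that runs the greedy pattern, the [^']* pattern, and the single-quoted '\'' variant side by side on the question's sample input. The variable names (input, greedy, bounded, requoted) are made up for illustration and are not part of the original answer.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
input="<sometag param='x'><irrelevant stuff='nonsense'>"

# Greedy: \(.*\) runs on to the LAST '> on the line.
greedy=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\(.*\)'>.*/\1/p")

# Bounded: \([^']*\) cannot cross a single quote, so it stops at the first one.
bounded=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p")

# Same bounded pattern, but single-quoted: each '\'' closes the string,
# adds a literal ', and re-opens the string.
requoted=$(printf '%s\n' "$input" | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p')

printf 'greedy:   %s\n' "$greedy"
printf 'bounded:  %s\n' "$bounded"
printf 'requoted: %s\n' "$requoted"
```

The greedy line should print x'><irrelevant stuff='nonsense, while the other two should print just x.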



Personally, I'd use GNU grep, whose -P option enables Perl-compatible regular expressions with lookarounds and a lazy quantifier. It makes for a slightly shorter solution. Run it like this:

... | grep -oP "(?<=<sometag param=').*?(?='>)"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"

Results:

x
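
As a rough sketch of what the lazy quantifier buys you, the snippet below compares .*? with a greedy .* inside the same lookarounds. It assumes GNU grep built with PCRE support (the -P option); the variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
input="<sometag param='x'><irrelevant stuff='nonsense'>"

# Lazy: .*? grows only until the first position where (?='>) succeeds.
lazy=$(printf '%s\n' "$input" | grep -oP "(?<=<sometag param=').*?(?='>)")

# Greedy: .* grows as far as it can, so the match ends at the last '> instead.
greedy=$(printf '%s\n' "$input" | grep -oP "(?<=<sometag param=').*(?='>)")

printf 'lazy:   %s\n' "$lazy"
printf 'greedy: %s\n' "$greedy"
```

Because the lookbehind and lookahead are zero-width, -o prints only the text between the two delimiters.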

Answered by aktivb

You don't have to assemble regexes in cases like this; you can just use ' as the field separator:

in="<sometag param='x'><irrelevant stuff='nonsense'>"

IFS="'" read -r x whatiwant y <<< "$in"         # bash
echo "$whatiwant"

awk -F\' '{print $2}' <<< "$in"               # awk
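
The field-splitting idea can be wrapped into a small helper. The sketch below assumes the value you want is always the first single-quoted field on the line and that quotes are balanced; the function name first_quoted and the variable all_values are hypothetical, not from the original answer.

```shell
#!/usr/bin/env bash
# first_quoted LINE: print the text between the first pair of single quotes.
# Hypothetical helper; assumes LINE contains at least one quoted value.
first_quoted() {
    local before value rest
    # Split on ': text before the first quote, the quoted value, everything after.
    IFS="'" read -r before value rest <<< "$1"
    printf '%s\n' "$value"
}

in="<sometag param='x'><irrelevant stuff='nonsense'>"
first_quoted "$in"

# awk splits the same way, so with balanced quotes every even-numbered
# field is a quoted value.
all_values=$(awk -F\' '{ for (i = 2; i <= NF; i += 2) print $i }' <<< "$in")
printf '%s\n' "$all_values"
```

On the sample input the helper should print x, and the awk loop should print x and nonsense on separate lines.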