Get string between strings in bash
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/13946066/
Asked by Robby75
I want to get the string between <sometag param=' and '>
I tried to use the method from "Get any string between 2 string and assign a variable in bash" to get the "x":
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'
The problem (apart from the inefficiency of the tr step, which is only there because I cannot manage to escape the apostrophe correctly for sed) is that sed matches greedily, i.e. the output is:
x_><irrelevant stuff=_nonsense
but the correct output would be the minimal match, in this example just "x".
Thanks for your help
Answered by Steve
You are probably looking for something like this:
sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"
Results:
x
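For comparison, here is a minimal sketch that runs the greedy capture and the bounded capture on the same input. The variable names (input, greedy, bounded) are only illustrative, and the sample line is the one from the question.

```shell
# Sample input taken from the question.
input="<sometag param='x'><irrelevant stuff='nonsense'>"

# Greedy: \(.*\) runs on to the LAST '> on the line.
greedy=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\(.*\)'>.*/\1/p")

# Bounded: \([^']*\) cannot cross a quote, so it stops at the first one.
bounded=$(printf '%s\n' "$input" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p")

printf 'greedy:  %s\n' "$greedy"    # greedy:  x'><irrelevant stuff='nonsense
printf 'bounded: %s\n' "$bounded"   # bounded: x
```

The only difference between the two commands is the capture group, which is what turns the over-long match into the wanted one.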
Explanation:
- Instead of the greedy capture \(.*\), use a negated character class: [^']* matches any character except ' any number of times, so the capture stops at the first closing quote. (sed has no true non-greedy quantifier; this gives the same effect.) To anchor the pattern, the capture is followed by '>.
- You can also wrap the sed expression in double quotes so that you don't need to escape the single quotes. If you wanted to keep the expression in single quotes, you'd do this:
... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'
Notice that the single quotes aren't really escaped: the single-quoted sed expression is closed, an escaped single quote is inserted, and the sed expression is re-opened. Think of '\'' as a four-character escape sequence.
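A small sketch of how the shell assembles such a string may help. The variable names (joined, input, result) are made up for illustration; the sed expression is the single-quoted one discussed above.

```shell
# Three adjacent pieces form one word:  'it'  \'  's fine'
joined='it'\''s fine'
printf '%s\n' "$joined"    # it's fine

# The same idea applied to the single-quoted sed expression.
input="<sometag param='x'><irrelevant stuff='nonsense'>"
result=$(printf '%s\n' "$input" |
  sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p')
printf '%s\n' "$result"    # x
```

sed itself never sees the backslashes that precede the quotes; the shell has already joined the pieces into one argument before sed starts.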
Personally, I'd use GNU grep. It would make for a slightly shorter solution. Run like:
... | grep -oP "(?<=<sometag param=').*?(?='>)"
Test:
echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"
Results:
x
Answered by aktivb
You don't have to assemble regexes in cases like this; you can just use ' as the field separator:
in="<sometag param='x'><irrelevant stuff='nonsense'>"
IFS="'" read x whatiwant y <<< "$in" # bash
echo "$whatiwant"
awk -F\' '{print $2}' <<< "$in" # awk
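If the delimiters are longer than a single character, plain parameter expansion is another option. This is not from either answer, just a sketch under the assumption that only the first start marker and the first end marker after it matter; between is a hypothetical helper name.

```shell
# between STRING START END (hypothetical helper)
# Prints the text between the first START and the next END in STRING.
# Returns 1 if either marker is missing.
between() {
  local s="$1" start="$2" end="$3"
  case $s in *"$start"*) ;; *) return 1 ;; esac   # start marker missing
  s=${s#*"$start"}                                 # drop everything through START
  case $s in *"$end"*) ;; *) return 1 ;; esac     # end marker missing
  printf '%s\n' "${s%%"$end"*}"                    # drop everything from END on
}

in="<sometag param='x'><irrelevant stuff='nonsense'>"
whatiwant=$(between "$in" "<sometag param='" "'>")
echo "$whatiwant"    # x
```

Quoting "$start" and "$end" inside the expansions makes the shell treat them as literal text rather than glob patterns, so markers such as [ or * are safe.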

