bash: Get the string between two strings in bash

Question

Asked by Robby75

I want to get the string between <sometag param=' and '>

I tried to use the method from "Get any string between 2 string and assign a variable in bash" to get the "x":

 echo "<sometag param='x'><irrelevant stuff='nonsense'>" | tr "'" _ | sed -n 's/.*<sometag param=_\(.*\)_>.*/\1/p'

The problem (apart from low efficiency because I just cannot manage to escape the apostrophe correctly for sed) is that sed matches the maximum, i.e. the output is:

 x_><irrelevant stuff=_nonsense

But the correct output would be the minimal match, in this example just "x".

Thanks for your help

Answer 1

Answered by Steve

You are probably looking for something like this:

sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p"

Results:

x

Explanation:

Instead of a greedy capture such as \(.*\), use a negated character class: [^']* means "match any character except ', any number of times". sed has no non-greedy quantifier, but this gives the same effect here because the capture cannot run past the closing quote. To anchor the pattern, it is followed by '>.
You can also wrap the expression in double quotes so that you don't need to escape the single quotes. If you wanted to keep single quotes around the expression, you'd do this:

... | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p'

Notice that the single quotes aren't really escaped. The single-quoted sed expression is closed, an escaped single quote is inserted, and the expression is re-opened. Think of '\'' as a four-character escape sequence.
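
As a rough illustration of the points above, the following bash sketch runs the greedy pattern, the negated-class pattern, and the single-quoted '\'' variant against the sample input from the question. The variable names (in, greedy, minimal, quoted) are made up for this example and do not come from the answer.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
in="<sometag param='x'><irrelevant stuff='nonsense'>"

# Greedy capture: \(.*\) extends to the LAST '> on the line.
greedy=$(printf '%s\n' "$in" | sed -n "s/.*<sometag param='\(.*\)'>.*/\1/p")

# Negated class: \([^']*\) cannot cross a quote, so it stops at the first one.
minimal=$(printf '%s\n' "$in" | sed -n "s/.*<sometag param='\([^']*\)'>.*/\1/p")

# Same expression inside single quotes; each '\'' produces one literal quote.
quoted=$(printf '%s\n' "$in" | sed -n 's/.*<sometag param='\''\([^'\'']*\)'\''>.*/\1/p')

printf 'greedy:  %s\n' "$greedy"
printf 'minimal: %s\n' "$minimal"
printf 'quoted:  %s\n' "$quoted"
```

With this input, the greedy variant should capture everything up to the final '> while the other two capture only x, which is the difference the question ran into.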

Personally, I'd use GNU grep. It would make for a slightly shorter solution. Run like:

... | grep -oP "(?<=<sometag param=').*?(?='>)"

Test:

echo "<sometag param='x'><irrelevant stuff='nonsense'>" | grep -oP "(?<=<sometag param=').*?(?='>)"

Results:

x
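
Since the question is about getting the value into a variable, here is a minimal sketch of capturing the grep output with command substitution. It assumes a GNU grep built with PCRE support (the -P option); the variable names are illustrative only. The second form uses PCRE's \K, which discards what was matched so far, as a possible alternative to the lookbehind.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
in="<sometag param='x'><irrelevant stuff='nonsense'>"

# -o prints only the matched text; -P enables Perl-compatible regexes
# (assumption: GNU grep with PCRE support is installed).
value=$(grep -oP "(?<=<sometag param=').*?(?='>)" <<< "$in")

# Alternative: \K drops the prefix from the reported match,
# and [^']* stops at the closing quote.
value_k=$(grep -oP "<sometag param='\K[^']*" <<< "$in")

printf '%s\n' "$value" "$value_k"
```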

Answer 2

Answered by aktivb

You don't have to assemble regexes in cases like this; you can just use ' as the field separator:

in="<sometag param='x'><irrelevant stuff='nonsense'>"

IFS="'" read x whatiwant y <<< "$in"            # bash
echo "$whatiwant"

awk -F\' '{print $2}' <<< "$in"               # awk
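
A slightly more defensive sketch of the same field-splitting idea follows. It assumes the wanted value is always the second quote-delimited field of the line, as in the sample input. The names value, rest, tmp and value2 are hypothetical, and the parameter-expansion variant is a different technique added here for comparison, not part of the answer.

```shell
#!/usr/bin/env bash
# Sample input taken from the question.
in="<sometag param='x'><irrelevant stuff='nonsense'>"

# Split on single quotes; the second field is the first quoted value.
# -r keeps backslashes literal; "_" and "rest" just absorb the other fields.
IFS="'" read -r _ value rest <<< "$in"

# Alternative using only parameter expansion (not from the answer):
tmp=${in#*"<sometag param='"}   # strip everything through the opening marker
value2=${tmp%%"'"*}              # strip from the first remaining quote onward

printf '%s\n' "$value" "$value2"
```

The parameter-expansion form avoids spawning any external process, while the read form is shorter when the input layout is fixed.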
