Warning: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/29606527/

Time: 2020-09-18 12:47:31  Source: igfitidea

Replace multiple patterns, but not with the same string

Tags: bash, sed

Asked by ornit

Is it possible to replace multiple patterns with different values in a single command? Let's say I have

A B C D ABC

and I want to change every A to 1, every B to 2, and every C to 3

so the output will be

1 2 3 D 123

Since I have 3 patterns to change, I would like to avoid substituting them separately. I thought there would be something like

sed -r s/'(A|B|C)'/(1|2|3)/ 

but of course this just replaces A, B, or C with the literal string (1|2|3). I should mention that my real patterns are more complicated than that...

Thank you!

Accepted answer by choroba

Easy in Perl:

perl -pe '%h = (A => 1, B => 2, C => 3); s/(A|B|C)/$h{$1}/g'

If you use more complex patterns, put the more specific ones before the more general ones in the list of alternatives. Sorting by length might be enough:

perl -pe 'BEGIN { %h = (A => 1, AA => 2, AAA => 3);
              $re = join "|", sort { length $b <=> length $a } keys %h; }
          s/($re)/$h{$1}/g'

To add word or line boundaries, just change the pattern to

/\b($re)\b/   # word boundaries
# or
/^($re)$/     # line boundaries

Answer by hek2mgl

Easy in sed:

sed 's/WORD1/NEW_WORD1/g;s/WORD2/NEW_WORD2/g;s/WORD3/NEW_WORD3/g'

You can separate multiple commands on the same line with a semicolon (;).
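
As a quick check, the chained form can be applied to the sample input from the question. This is only a sketch; the function name abc_to_123 is made up for illustration.

```shell
# Hypothetical wrapper around the chained substitutions, for illustration.
abc_to_123() {
    sed 's/A/1/g;s/B/2/g;s/C/3/g'
}

printf 'A B C D ABC\n' | abc_to_123   # prints: 1 2 3 D 123
```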



Update

Probably this was too easy. NeronLeVelu pointed out that the above command can lead to unwanted results, because the second substitution may also modify the results of the first substitution (and so on).

If you care about this, you can avoid the side effect with the t command. The t command branches to the end of the script, but only if a substitution did happen:

sed 's/WORD1/NEW_WORD1/g;t;s/WORD2/NEW_WORD2/g;t;s/WORD3/NEW_WORD3/g'  
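
The difference can be seen with a replacement that itself looks like a later pattern. The sketch below uses made-up function names and spells the script with separate -e options, which should be equivalent to the semicolon form. Note the trade-off: once one substitution succeeds on a line, the remaining substitutions are skipped for that whole line.

```shell
# Hypothetical helpers, for illustration only.
# Chained: the second substitution rewrites the output of the first.
chained() { sed -e 's/A/B/g' -e 's/B/C/g'; }
# With t: after a successful substitution, sed jumps to the end of the script.
with_t()  { sed -e 's/A/B/g' -e 't' -e 's/B/C/g'; }

printf 'A B\n' | chained   # prints: C C
printf 'A B\n' | with_t    # prints: B B
```

In the second case the original B is left untouched as well, because the line already had a successful substitution.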

Answer by Ed Morton

This will work if your "words" don't contain RE metachars (. * ? etc.):

$ cat file
there is the problem when the foo is closed

$ cat tst.awk
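BEGIN {
    # Alternating old/new pairs: "the" -> "a", "foo" -> "bar".
    split("the a foo bar",tmp)
    for (i=1;i in tmp;i+=2) {
        # Build the alternation \<(the|foo)\> one word at a time.
        old = (i>1 ? old "|" : "\\<(") tmp[i]
        map[tmp[i]] = tmp[i+1]
    }
    old = old ")\\>"
}
{
    head = ""
    tail = $0
    # Replace matches left to right; replaced text moves to head and is never rescanned.
    while ( match(tail,old) ) {
        head = head substr(tail,1,RSTART-1) map[substr(tail,RSTART,RLENGTH)]
        tail = substr(tail,RSTART+RLENGTH)
    }
    print head tail
}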

$ awk -f tst.awk file
there is a problem when a bar is closed

The above obviously maps "the" to "a" and "foo" to "bar" and uses GNU awk for word boundaries.

If your "words" do contain RE metachars etc., then you need a string-based solution using index() instead of an RE-based one using match() (note that sed ONLY supports REs, not strings).
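
The string-based approach mentioned above could look roughly like the following sketch. It is not the answerer's code: the function name literal_replace and the convention of passing old/new pairs as arguments are assumptions made for illustration. It uses index() to find each word literally, always takes the earliest match (the longest one on ties), and never rescans text it has already replaced.

```shell
# Hypothetical helper: literal (non-regex) multi-replace in a single pass.
# Usage: literal_replace OLD1 NEW1 OLD2 NEW2 ... (an even number of arguments)
literal_replace() {
    awk '
    BEGIN {
        # Pairs arrive as command-line arguments; blank them out afterwards
        # so awk reads standard input instead of treating them as file names.
        n = 0
        for (i = 1; i + 1 < ARGC; i += 2) {
            if (ARGV[i] != "") { n++; old[n] = ARGV[i]; rep[n] = ARGV[i+1] }
            ARGV[i] = ""; ARGV[i+1] = ""
        }
    }
    {
        head = ""; tail = $0
        while (1) {
            # Find the earliest literal match; on ties prefer the longest word.
            pos = 0; best = 0
            for (k = 1; k <= n; k++) {
                p = index(tail, old[k])
                if (!p) continue
                if (!pos || p < pos || (p == pos && length(old[k]) > length(old[best]))) {
                    pos = p; best = k
                }
            }
            if (!pos) break
            head = head substr(tail, 1, pos - 1) rep[best]
            tail = substr(tail, pos + length(old[best]))
        }
        print head tail
    }' "$@"
}

printf 'a.b axb * c\n' | literal_replace 'a.b' X '*' Y   # prints: X axb Y c
printf 'A B\n' | literal_replace A B B A                  # prints: B A
```

Because . and * are compared as plain characters here, axb is left alone, and swapping A and B works since replaced text is moved to head and never matched again.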