bash SED: Deleting text between two strings, repeated across the line

Warning: this page is a Chinese/English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow original: http://stackoverflow.com/questions/28022041/

Date: 2020-09-18 12:12:44  Source: igfitidea

bash sed

Question by Hanna

The issue is that I wish to remove all text between two strings on a line using SED. I understand the use of sed -i 's/str1.*str2//' file.dat to remove the text between str1 and str2, inclusive of str1 and str2, but my line has str1 and str2 repeated many times, and I would like to remove the text between each pair. My attempt above removes all text between the first instance of str1 and the last instance of str2. I would appreciate some help in understanding how to do this.
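
A small demonstration of why the attempt above over-deletes; the sample line and the str1/str2 placeholders are made up for illustration. sed picks the leftmost-longest match, so .* keeps extending until the last str2 on the line.

```shell
#!/bin/sh
# Made-up sample line holding two str1...str2 pairs.
line='a str1 x str2 b str1 y str2 c'

# '.*' is greedy: the match starts at the FIRST str1 and runs to the
# LAST str2, so the " b " between the two pairs is deleted as well.
result=$(printf '%s\n' "$line" | sed 's/str1.*str2//')
printf '%s\n' "$result"    # prints: a  c
```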

In addition, I would like to repeat this across all lines in the file, and I do not know how many times the str1, str2 pair appears on each line. It varies.

Kind Regards

Additional Edit - hope not into a flame-wall!

An example may be of use; I am having trouble understanding the answers so far, sorry guys.

For a single line in a file example.dat:

bla.bla.TextOfUnknownLength.bla.bla 1023=3 290=1 336=17 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla 1023=20 290=2 336=7 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla ...

I wish to remove everything from 1023= to 278= inclusive (but not the 0 after 278=) in all instances. This text between 1023= and 278= can occur many times in a line and is of unknown length.

There are also many lines in the file, and I would like to run this across all lines.

HS

Answer by Marc Bredt

sed -ri 's/(foo)(.*)(bar)/\1\3/g' between.file

Explanation: use extended regular expressions (-r) to match the parts before, between and after in your line as three groups. Then just replace the match with the prefix \1 and the suffix \3, using sed's back-references (a backslash followed by the group number).
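
To illustrate the group numbering, here is the same command run on one line of the sample file below and on a made-up line holding two pairs (this assumes GNU sed for -r). Because the middle group (.*) is greedy, a line with several pairs keeps only the outermost foo and bar, even with g.

```shell
#!/bin/sh
# With -r each (...) group is numbered from the left:
# \1 = foo, \2 = the text in between, \3 = bar.
one=$(printf '%s\n' 'foo---1---bar' | sed -r 's/(foo)(.*)(bar)/\1\3/g')
printf '%s\n' "$one"    # prints: foobar

# (.*) is greedy, so on a line with two pairs the match spans from the
# first foo to the last bar and only those two survive.
two=$(printf '%s\n' 'A foo-1-bar B foo-2-bar C' | sed -r 's/(foo)(.*)(bar)/\1\3/g')
printf '%s\n' "$two"    # prints: A foobar C
```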

UPDATE: Suppose between.file contains the following:

foo---1---bar
foo---2---bar
foo---3---bar

Then the command above removes the contents between foo and bar, so the output looks like this:

foobar
foobar
foobar

Wasn't that your desired output/change in your file?

UPDATE: I think awk fits your needs better.

Assume between.file contains the following lines:

A foo---1---bar B foo---10--bar C 
A foo---2---bar D foo---20--bar E 
A foo---3---bar B foo---30---bar C 

this script

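#!/bin/bash
# Rebuild each line from the fields that do NOT match foo.*bar.
awk '{
       all = "";
       for (i = 1; i <= NF; i++) {      # $1..$NF are the fields ($0 is the whole line)
         if (!($i ~ /foo.*bar/)) { all = all " " $i; }
       }
       print all;
     }' between.file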

will produce the following output

 A B C
 A D E
 A B C

You could use this to implement a simple DFA (state machine) that switches into a "skipping" state when it reads 1023= and leaves that state when it reads 278=.
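
A minimal sketch of that state-machine idea, run on the sample line from the question (with the trailing ... left out). It assumes that 1023= always starts a whitespace-separated field and that a block ends at the next field starting with 278=; the helper name strip_blocks is hypothetical. Because fields are rejoined with single spaces, runs of whitespace are collapsed.

```shell
#!/bin/sh
# Hypothetical helper: drop every field from one starting with "1023="
# up to and including the "278=" prefix of the next field starting with "278=".
strip_blocks() {
  awk '{
    out = ""; skipping = 0
    for (i = 1; i <= NF; i++) {
      f = $i
      # State change 1: a field starting with 1023= opens a block.
      if (!skipping && index(f, "1023=") == 1) skipping = 1
      if (skipping) {
        if (index(f, "278=") != 1) continue   # still inside the block: drop field
        skipping = 0                          # state change 2: 278= closes it
        f = substr(f, 5)                      # keep what follows 278=, e.g. 0
        if (f == "") continue
      }
      out = (out == "") ? f : out " " f
    }
    print out
  }' "$@"
}

sample='bla.bla.TextOfUnknownLength.bla.bla 1023=3 290=1 336=17 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla 1023=20 290=2 336=7 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla'
result=$(printf '%s\n' "$sample" | strip_blocks)
printf '%s\n' "$result"
```

To process a whole file, something like strip_blocks example.dat > new.dat should handle every line, however many blocks each one contains.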

Redirect the output to a new file, or check the awk man page for a way to process the file directly. Hope this helps.

Answer by NeronLeVelu

Just add the g at the end of your sed command.

sed -i 's/str1.*str2//g' file.dat 
  • g means: replace every occurrence in the current buffer (by default, the current line).
  • By default sed works on one line at a time; when the commands finish, it continues with the next line.
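
A quick check of what g changes, using made-up sample strings. g makes sed replace every non-overlapping match on the line, but each match is still as long as possible, so with .* the first match already reaches the last str2 and nothing is left for g to find. When the pattern cannot run past the closing delimiter, as with [^>]* between the single-character delimiters < and >, g does remove each pair separately.

```shell
#!/bin/sh
line='a str1 x str2 b str1 y str2 c'

# With .* the first match already ends at the LAST str2, so g finds no more.
greedy=$(printf '%s\n' "$line" | sed 's/str1.*str2//g')
printf '%s\n' "$greedy"     # prints: a  c

# A negated class stops at the first '>', so g removes every <...> pair.
bounded=$(printf '%s\n' 'a <x> b <y> c' | sed 's/<[^>]*>//g')
printf '%s\n' "$bounded"    # prints: a  b  c
```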

Remarks:

  • if str1 and str2 are not on the same line, nothing between them is changed
  • str1 and str2 are part of the pattern, so some special characters sometimes need to be escaped (such as (, {, [, \, &, ^, . and so on, depending on the wanted behaviour).
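
For example, the question's data contains dots (bla.bla, 07:59:57.833), and an unescaped . in the pattern matches any character. The sample string below is made up to show the difference.

```shell
#!/bin/sh
line='blaXbla bla.bla'

# Unescaped, '.' matches any character, so both words are removed.
loose=$(printf '%s\n' "$line" | sed 's/bla.bla//g')

# Escaped as '\.', it matches only a literal dot.
strict=$(printf '%s\n' "$line" | sed 's/bla\.bla//g')

printf '[%s] [%s]\n' "$loose" "$strict"    # prints: [ ] [blaXbla ]
```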

Answer by potong

This might work for you (GNU sed):

sed -r ':a;s/([^\n]*)foo[^\n]+bar/\1\n/;ta;s/\n//g' file

Use greed, a unique delimiter and a loop to remove the characters between foo and bar. The greed works backwards through the line, and the delimiter stops the part of the line that has already been processed from being processed again. The loop removes one or more occurrences of foo through bar.
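
The same loop-and-marker technique, written out with the strings from the question and its sample line (trailing ... omitted). This sketch assumes GNU sed, which accepts -r and treats \n as a newline inside bracket expressions and replacements. The replacement keeps the greedy prefix \1 and leaves a newline marker where the pair was, so each pass deletes the rightmost remaining 1023= ... 278= pair, inclusive of both strings; the final s/\n//g removes the markers.

```shell
#!/bin/sh
# Assumes GNU sed: -r for extended regexes, \n usable in [^\n] and in replacements.
line='bla.bla.TextOfUnknownLength.bla.bla 1023=3 290=1 336=17 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla 1023=20 290=2 336=7 273=07:59:57.833 276=K 278=0 bla.bla.TextOfUnknownLength.bla.bla'

# :a       loop label
# s/.../   greedy \1 reaches the last 1023= that has a 278= after it (before any
#          marker); that pair becomes a newline marker
# ta       repeat while the substitution succeeded
# s/\n//g  drop the markers at the end
result=$(printf '%s\n' "$line" | sed -r ':a;s/([^\n]*)1023=[^\n]+278=/\1\n/;ta;s/\n//g')
printf '%s\n' "$result"
```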
