Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/13242469/

Date: 2020-09-09 01:40:40  Source: igfitidea

How to use sed/grep to extract text between two words?

Tags: string, bash, sed, grep

Asked by user1190650

I am trying to output a string that contains everything between two words of a string:

input:

"Here is a String"

output:

"is a"

Using:

sed -n '/Here/,/String/p'

includes the endpoints, but I don't want to include them.

Accepted answer by Brian Campbell

sed -e 's/Here\(.*\)String/\1/'
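
A hedged variant, not part of the accepted answer: with -n and the p flag, sed prints only the lines where the substitution succeeded, and moving the spaces outside the group keeps them out of \1. Because both .* are greedy, a line with several markers is captured from the last "Here " to the last " String". The sample input lines are made up for illustration.

```shell
# Print only lines containing both markers; the spaces next to the markers are not captured.
printf '%s\n' "Here is a String" "no markers here" | sed -n 's/.*Here \(.*\) String.*/\1/p'
```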

Answer by anishsane

GNU grep also supports positive and negative look-ahead and look-behind when run with -P (Perl-compatible regular expressions). For your case, the command would be:

echo "Here is a string" | grep -o -P '(?<=Here).*(?=string)'

If there are multiple occurrences of Here and string, you can choose whether to match from the first Here to the last string, or to match each pair individually. In regex terms, the first is called a greedy match and the second a non-greedy match:

$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*(?=string)' # Greedy match
 is a string, and Here is another 
$ echo 'Here is a string, and Here is another string.' | grep -oP '(?<=Here).*?(?=string)' # Non-greedy match (Notice the '?' after '*' in .*)
 is a 
 is another 

Answer by wheeler

The accepted answer does not remove text that could be before Here or after String. This will:

sed -e 's/.*Here\(.*\)String.*/\1/'

The main difference is the addition of .* immediately before Here and after String.
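
To see the difference, here is a small comparison on a made-up input that has extra text on both sides of the markers:

```shell
s="xx Here is a String yy"
# Accepted answer: only the matched part is replaced, so "xx " and " yy" survive.
echo "$s" | sed -e 's/Here\(.*\)String/\1/'      # prints "xx  is a  yy"
# With .* on both sides the whole line matches, so only the captured text remains.
echo "$s" | sed -e 's/.*Here\(.*\)String.*/\1/'  # prints " is a "
```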

Answer by ghoti

You can strip strings in Bash alone:

$ foo="Here is a String"
$ foo=${foo##*Here }
$ echo "$foo"
is a String
$ foo=${foo%% String*}
$ echo "$foo"
is a
$
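
The doubled operators matter when a marker repeats: # and % remove the shortest matching prefix or suffix, while ## and %% remove the longest. So the commands above keep the text between the last "Here " and the first " String" after it. The sample strings below are made up to show this:

```shell
# Sample strings with a repeated marker.
foo="Here is a String and another String"
bar="Here is Here a String"
echo "${foo% String*}"    # shortest suffix removed: Here is a String and another
echo "${foo%% String*}"   # longest suffix removed:  Here is a
echo "${bar#*Here }"      # shortest prefix removed: is Here a String
echo "${bar##*Here }"     # longest prefix removed:  a String
```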

And if you have a GNU grep that includes PCRE, you can use a zero-width assertion:

$ echo "Here is a String" | grep -Po '(?<=(Here )).*(?= String)'
is a

Answer by Avinash Raj

Through GNU awk,

$ echo "Here is a string" | awk -v FS="(Here|string)" '{print $2}'
 is a 
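
The same field-separator idea extends to several occurrences on one line. With FS set to (Here|string), every even-numbered field lies between a Here and the following string, so a loop can print all of them. This sketch assumes the markers strictly alternate; a regular-expression FS should also work in other POSIX awks, though only GNU awk is mentioned above.

```shell
# Fields 2, 4, ... are the pieces between a "Here" and the next "string".
echo 'Here is a string, and Here is another string.' |
  awk -v FS='(Here|string)' '{ for (i = 2; i <= NF; i += 2) print $i }'
```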

grep with the -P (perl-regexp) option supports \K, which discards the characters matched so far from the reported match. In our case, the previously matched string was Here, so it is dropped from the final output.

$ echo "Here is a string" | grep -oP 'Here\K.*(?=string)'
 is a 
$ echo "Here is a string" | grep -oP 'Here\K(?:(?!string).)*'
 is a 

If you want the output to be is a (without the surrounding spaces), you could try the following:

$ echo "Here is a string" | grep -oP 'Here\s*\K.*(?=\s+string)'
is a
$ echo "Here is a string" | grep -oP 'Here\s*\K(?:(?!\s+string).)*'
is a

Answer by alemol

If you have a long file with many multi-line occurrences, it is useful to print line numbers first:

cat -n file | sed -n '/Here/,/String/p'
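
The same can be sketched as a single awk command: an awk range pattern behaves like sed's /Here/,/String/ address, and NR holds the current line number. The sample input here is made up and piped in instead of read from a file:

```shell
# Print every line of each Here...String range, prefixed with its line number.
printf '%s\n' intro Here 'is a' String outro |
  awk '/Here/,/String/ { print NR ": " $0 }'
```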

Answer by potong

This might work for you (GNU sed):

sed '/Here/!d;s//&\n/;s/.*\n//;:a;/String/bb;$!{n;ba};:b;s//\n&/;P;D' file 

This prints each piece of text between the two markers (in this instance Here and String) on its own line and preserves newlines within the text.
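
If the markers sit on their own lines and you only want the lines strictly between them (closer to what the question's sed -n '/Here/,/String/p' attempt was after), a flag in awk is a simpler alternative. This is a sketch under that assumption: unlike potong's command, it drops the whole marker lines rather than just the marker words. The sample input is made up.

```shell
# f is set after a Here line and cleared on a String line, so neither marker line is printed.
printf '%s\n' before Here 'line one' 'line two' String after |
  awk '/String/ { f = 0 } f; /Here/ { f = 1 }'
```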

Answer by Gary Dean

All the solutions above have shortcomings when the last search string is repeated elsewhere in the string. I found it best to write a bash function.

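    function str_str {
      # Print the text in $1 between the first occurrence of $2
      # and the next occurrence of $3 after it.
      local str
      str="${1#*"$2"}"      # drop the shortest prefix ending in $2
      str="${str%%"$3"*}"   # drop the longest suffix starting with $3
      echo -n "$str"
    }

    # test it ...
    mystr="this is a string"
    str_str "$mystr" "this " " string"   # prints: is a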
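
The function above returns only the first match. A possible extension, sketched here with a hypothetical name str_all (not part of the answer), repeats the same two parameter expansions in a loop to print every piece of text between the markers, one per line:

```shell
# Hypothetical helper: print each piece of text in $1 that lies between
# a $2 marker and the next $3 marker, one per line, without external tools.
str_all() {
  local rest
  rest=$1
  while :; do
    case $rest in
      *"$2"*"$3"*) ;;                 # both markers still present, in order
      *) break ;;
    esac
    rest=${rest#*"$2"}                # drop everything up to the first start marker
    printf '%s\n' "${rest%%"$3"*}"    # print up to the next end marker
    rest=${rest#*"$3"}                # continue after that end marker
  done
}

str_all 'Here is a string, and Here is another string.' 'Here' 'string'
# prints " is a " and " is another " on separate lines
```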

Answer by mvairavan

You can use \1 (see http://www.grymoire.com/Unix/Sed.html#uh-4):

echo "Hello is a String" | sed 's/Hello\(.*\)String/\1/g'

The text matched inside the escaped parentheses \( \) is stored as \1.
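
GNU sed and BSD/macOS sed both accept -E for extended regular expressions, where the group is written with plain parentheses and \1 still refers to it. A small sketch that also keeps the spaces out of the captured text:

```shell
# With -E, ( ) forms the group without backslashes; \1 is still the captured text.
echo "Hello is a String" | sed -E 's/Hello (.*) String/\1/'
```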

Answer by Sabrina

To understand the sed command, we have to build it step by step.

Here is your original text

user@linux:~$ echo "Here is a String"
Here is a String
user@linux:~$ 

Let's try to remove Here with sed's substitution (s) command:

user@linux:~$ echo "Here is a String" | sed 's/Here //'
is a String
user@linux:~$ 

At this point, you should be able to remove String as well:

user@linux:~$ echo "Here is a String" | sed 's/String//'
Here is a
user@linux:~$ 

But this is not your desired output.

To combine two sed commands, use the -e option:

user@linux:~$ echo "Here is a String" | sed -e 's/Here //' -e 's/String//'
is a
user@linux:~$ 
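
sed also accepts several commands in one script separated by semicolons, which is equivalent to using two -e options. Including the space before String in the second pattern avoids the trailing space left by s/String//:

```shell
# Two s commands in one script; " String" (with its leading space) is removed too.
echo "Here is a String" | sed 's/Here //; s/ String//'
```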

Hope this helps
