bash - Extract lines between 2 tokens in a text file using bash

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/4857424/

Date: 2020-09-09 20:09:04  Source: igfitidea

Extract lines between 2 tokens in a text file using bash

bash

Asked by tapan

I have a text file that looks like this:

random useless text 
<!-- this is token 1 --> 
para1 
para2 
para3 
<!-- this is token 2 --> 
random useless text again

I want to extract the text between the tokens (excluding the tokens themselves, of course). I tried using ## and %% to extract the data in between, but it didn't work; I don't think they are meant for manipulating text files this large. Any suggestions on how I can do it? Maybe awk or sed?

Answered by Paused until further notice.

No need for head and tail or grep, or to read the file multiple times:

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile

Explanation:

  • -n - don't do an implicit print
  • /<!-- this is token 1 -->/{ - if the starting marker is found, then
    • :a - label "a"
    • n - read the next line
    • /<!-- this is token 2 -->/b - if it's the ending marker, branch to the end of the script, which leaves the loop without printing the marker
    • p - otherwise, print the line
    • ba - branch to label "a"
  • } - end if
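
To try the command, the sample from the question can be recreated in a temporary file. This is a minimal sketch, assuming GNU sed (other sed implementations may need the one-line `{:a;n;...}` form split across several lines); the `sample` and `out` variable names are only for illustration:

```shell
#!/bin/bash
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again
EOF

# Print only the lines strictly between the two markers.
out=$(sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

With this input, the script should print the three lines para1, para2 and para3, without either marker line.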

Answered by Peter Taylor

You can extract it, including the tokens, with sed. Then use head and tail to strip the tokens off.

... | sed -n "/this is token 1/,/this is token 2/p" | head -n-1 | tail -n+2
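
Negative line counts such as `head -n-1` are a GNU extension and may not be available everywhere. As a hedged alternative, the first and last lines of the extracted range can be dropped with a second sed instead. The sketch below feeds the question's sample through a helper function, `sample_input`, which is a name made up for this illustration:

```shell
#!/bin/bash
# Sample input from the question, produced by a helper for illustration.
sample_input() {
    printf '%s\n' \
        'random useless text' \
        '<!-- this is token 1 -->' \
        'para1' 'para2' 'para3' \
        '<!-- this is token 2 -->' \
        'random useless text again'
}

# First sed: print the range, markers included.
# Second sed: delete line 1 and the last line, which are the two markers.
out=$(sample_input | sed -n '/this is token 1/,/this is token 2/p' | sed '1d;$d')
printf '%s\n' "$out"
```

Like the head/tail version, this assumes both markers are present; if the closing marker is missing, the range runs to the end of the input and the last real line is dropped instead.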

Answered by CaptainChristo

Maybe sed and awk have more elegant solutions, but I have a "poor man's" approach using grep, cut, head, and tail.

#!/bin/bash

dataFile="/path/to/some/data.txt"
startToken="token 1"
stopToken="token 2"

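# Line number of the first line matching each token
startTokenLine=$( grep -n "${startToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )
stopTokenLine=$( grep -n "${stopToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )

# Last line to keep, and how many lines lie between the tokens
stopTokenLine=$(( stopTokenLine - 1 ))
tailLines=$(( stopTokenLine - startTokenLine ))

head -n "${stopTokenLine}" "${dataFile}" | tail -n "${tailLines}"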

Answered by realex

No need to call mighty sed / awk / perl. You could do it "bash-only":

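#!/bin/bash
# Print lines only while STARTFLAG is "true", i.e. after token 1 has been seen.
# read trims leading/trailing whitespace, so padded marker lines still match.
STARTFLAG="false"
while read -r LINE; do
    if [ "$STARTFLAG" == "true" ]; then
        if [ "$LINE" == '<!-- this is token 2 -->' ]; then
            break    # closing token reached: stop reading
        else
            printf '%s\n' "$LINE"
        fi
    elif [ "$LINE" == '<!-- this is token 1 -->' ]; then
        STARTFLAG="true"
    fi
done < t.txt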

Kind regards

realex

Answered by aioobe

Try the following:


sed -n '/<!-- this is token 1 -->/,/<!-- this is token 2 -->/p' your_input_file |
        grep -Ev '<!-- this is token . -->'
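
Since the question also mentions awk, the same filtering can be sketched with a flag variable in a single awk pass. This is an illustrative sketch rather than one of the posted answers; the flag name `f` and the temporary `sample` file are assumptions made for the example:

```shell
#!/bin/bash
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again
EOF

# f is a flag: cleared on the end marker, set on the start marker.
# The bare pattern "f" prints the current line while the flag is non-zero.
# Rule order matters: the end marker clears f before the print rule runs,
# and the start marker sets f only after it, so neither marker is printed.
out=$(awk '/<!-- this is token 2 -->/ { f = 0 }
           f
           /<!-- this is token 1 -->/ { f = 1 }' "$sample")
printf '%s\n' "$out"

rm -f "$sample"
```

Because the flag is simply toggled, this variant keeps working if the file contains several marker pairs, printing the contents of each block in turn.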

Answered by Brian Agnew

For anything like this, I'd reach for Perl, with its combination of (amongst others) sed and awk capabilities. Something like (beware - untested):

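use strict;
use warnings;

my $recording = 0;
my @results = ();
while (<STDIN>) {
   chomp;
   if (/token 1/) {
      $recording = 1;    # start marker: begin collecting
   }
   elsif (/token 2/) {
      $recording = 0;    # end marker: stop collecting
   }
   elsif ($recording) {
      push @results, $_;
   }
}
# Print the collected lines, one per line.
print "$_\n" for @results;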