bash: Extract lines between 2 tokens in a text file using bash

Question

Asked by tapan

I have a text file which looks like this:

random useless text 
<!-- this is token 1 --> 
para1 
para2 
para3 
<!-- this is token 2 --> 
random useless text again

I want to extract the text between the tokens (excluding the tokens, of course). I tried using the ## and %% parameter expansions to extract the data in between, but it didn't work. I think they are not meant for manipulating such large text files. Any suggestions on how I can do it? Maybe awk or sed?

Answer 1

Answered by Paused until further notice.

No need for head and tail or grep, or to read the file multiple times:

sed -n '/<!-- this is token 1 -->/{:a;n;/<!-- this is token 2 -->/b;p;ba}' inputfile

Explanation:

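-n - don't do an implicit print
/<!-- this is token 1 -->/{ - if the starting marker is found, then
- :a - label "a"
  - n - read the next line
  - /<!-- this is token 2 -->/b - if it's the ending marker, branch to the end of the script, which stops printing for this block (use q here instead to quit at the first ending marker)
  - p - otherwise, print the line
- ba - branch to label "a"
} - end if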
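The steps above can be tried end to end with the sketch below. It writes the sample from the question to a temporary file and runs the same sed program spelled out one command per line, which mirrors the explanation and should also suit sed implementations that are stricter than GNU sed about `;` after a label. The temporary file and the `result` variable are assumptions made for the demonstration.

```shell
#!/bin/bash
# Build a throwaway copy of the sample file from the question.
inputfile=$(mktemp)
cat > "$inputfile" <<'EOF'
random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again
EOF

# Same program as the one-liner, one sed command per line.
result=$(sed -n '/<!-- this is token 1 -->/{
:a
n
/<!-- this is token 2 -->/b
p
ba
}' "$inputfile")

# Show the lines found between the two markers.
printf '%s\n' "$result"

rm -f "$inputfile"
```

On the sample input this should print para1, para2 and para3, one per line.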

Answer 2

Answered by Peter Taylor

You can extract it, including the tokens, with sed. Then use head and tail to strip the tokens off.

... | sed -n "/this is token 1/,/this is token 2/p" | head -n-1 | tail -n+2
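A negative line count for head is not available in every head implementation, so here is a hedged variant of the same idea that strips the first and last line of the extracted block with a second sed instead. It assumes the input contains a single marker pair; the `sample` and `result` variables are only there to make the sketch self-contained.

```shell
#!/bin/bash
# Sample input from the question, held in a variable for the demonstration.
sample='random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again'

# Print the block including both markers, then delete the first
# and the last line of that output (the two marker lines).
result=$(printf '%s\n' "$sample" \
  | sed -n '/this is token 1/,/this is token 2/p' \
  | sed '1d;$d')

printf '%s\n' "$result"
```

If the file held several marker pairs, only the very first and very last marker lines would be removed, so this sketch fits the single-block case in the question.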

Answer 3

Answered by CaptainChristo

Maybe sed and awk have more elegant solutions, but I have a "poor man's" approach with grep, cut, head, and tail.

#!/bin/bash

dataFile="/path/to/some/data.txt"
startToken="token 1"
stopToken="token 2"

# Line number of the first line matching each token
startTokenLine=$( grep -n "${startToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )
stopTokenLine=$( grep -n "${stopToken}" "${dataFile}" | head -n 1 | cut -f 1 -d':' )

# Last line to keep, and the number of lines between the two tokens
stopTokenLine=$(( stopTokenLine - 1 ))
tailLines=$(( stopTokenLine - startTokenLine ))

head -n "${stopTokenLine}" "${dataFile}" | tail -n "${tailLines}"

Answer 4

Answered by realex

No need to call mighty sed / awk / perl. You could do it "bash-only":

#!/bin/bash
STARTFLAG="false"
# read -r keeps backslashes literal; leading/trailing whitespace is still trimmed
while read -r LINE; do
    if [ "$STARTFLAG" == "true" ]; then
        if [ "$LINE" == '<!-- this is token 2 -->' ]; then
            exit
        else
            echo "$LINE"
        fi
    elif [ "$LINE" == '<!-- this is token 1 -->' ]; then
        STARTFLAG="true"
        continue
    fi
done < t.txt

Kind regards

realex

Answer 5

Answered by aioobe

Try the following:

sed -n '/<!-- this is token 1 -->/,/<!-- this is token 2 -->/p' your_input_file |
        grep -Ev '<!-- this is token . -->'

Answer 6

Answered by Brian Agnew

For anything like this, I'd reach for Perl, with its combination of (amongst others) sed and awk capabilities. Something like (beware - untested):

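use strict;
use warnings;

my $recording = 0;
my @results = ();
while (<STDIN>) {
   chomp;
   if (/token 1/) {
      $recording = 1;    # start marker: begin recording
   }
   elsif (/token 2/) {
      $recording = 0;    # end marker: stop recording
   }
   elsif ($recording) {
      push @results, $_;
   }
}
# Print the collected lines, one per line
print "$_\n" for @results;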
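Since the question also mentions awk, the same flag-based idea can be sketched as a short awk program called from the shell. This is an illustrative alternative under the assumption that the markers appear exactly as in the question; it is not part of the Perl answer, and the `sample` and `result` variables exist only for the demonstration.

```shell
#!/bin/bash
# Sample input from the question, held in a variable for the demonstration.
sample='random useless text
<!-- this is token 1 -->
para1
para2
para3
<!-- this is token 2 -->
random useless text again'

# The rule order matters: the end marker clears the flag before the
# print rule runs, and the start marker sets it only after the print
# rule, so neither marker line is printed.
result=$(printf '%s\n' "$sample" | awk '
  /<!-- this is token 2 -->/ { recording = 0 }
  recording                  { print }
  /<!-- this is token 1 -->/ { recording = 1 }
')

printf '%s\n' "$result"
```

Unlike the Perl version, this prints lines as it reads them instead of collecting them in an array first.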
