bash: How to get the part of a file after the first line that matches a regular expression?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/7103531/

Time: 2020-09-09 20:55:10  Source: igfitidea

How to get the part of a file after the first line that matches a regular expression?

bash, shell, scripting, grep

Asked by Yugal Jindle

I have a file with about 1000 lines. I want the part of my file after the line which matches my grep statement.


That is:


$ cat file | grep 'TERMINATE'     # It is found on line 534

So, I want the file from line 535 to line 1000 for further processing.


How can I do that?


Answered by jfg956

The following will print from the line matching TERMINATE till the end of the file:

sed -n -e '/TERMINATE/,$p'

Explained: -n disables the default behavior of sed of printing each line after executing its script on it, -e indicates a script to sed, /TERMINATE/,$ is an address (line) range selection meaning from the first line matching the TERMINATE regular expression (like grep) to the end of the file ($), and p is the print command, which prints the current line.
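To see the range address in action, here is a minimal sketch on a throwaway file. The file name sample.txt and its contents are made up for illustration; only the sed command comes from the answer.

```shell
# Hypothetical sample file, created in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' 'delta' > "$tmpdir/sample.txt"

# Print from the first line matching TERMINATE through the last line ($).
from_match=$(sed -n -e '/TERMINATE/,$p' "$tmpdir/sample.txt")
printf '%s\n' "$from_match"
# Prints:
# TERMINATE
# gamma
# delta

rm -rf "$tmpdir"
```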

This will print from the line that follows the line matching TERMINATE till the end of the file:
(from AFTER the matching line to EOF, NOT including the matching line)

sed -e '1,/TERMINATE/d'

Explained: 1,/TERMINATE/ is an address (line) range selection meaning from the first line of the input to the first line matching the TERMINATE regular expression, and d is the delete command, which deletes the current line and skips to the next line. As sed's default behavior is to print the lines, it will print the lines after TERMINATE to the end of the input.
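A short sketch of the delete form, using made-up input. It also shows one edge case worth knowing: when the start address is line 1, sed only begins looking for /TERMINATE/ on line 2, so a match on the very first line does not end the range.

```shell
# Usual case: everything up to and including the first match is deleted.
after_match=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta' | sed -e '1,/TERMINATE/d')
printf '%s\n' "$after_match"
# Prints:
# gamma
# delta

# Edge case: TERMINATE is on line 1. The end of the range is searched
# for from line 2 onward, so the range runs through the second TERMINATE.
edge_case=$(printf '%s\n' 'TERMINATE' 'gamma' 'TERMINATE' 'delta' | sed -e '1,/TERMINATE/d')
printf '%s\n' "$edge_case"
# Prints:
# delta
```

GNU sed accepts 0,/TERMINATE/d so that a match on the first line can end the range; that form is a GNU extension and may not be available in other sed implementations.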

Edit:


If you want the lines before TERMINATE:


sed -e '/TERMINATE/,$d'

And if you want both the lines before and the lines after TERMINATE in 2 different files in a single pass:

sed -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file

The before and after files will both contain the line with TERMINATE, so to process each you need to use:

head -n -1 before
tail -n +2 after
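The single-pass split and the trimming step can be tried end to end as below. This is a sketch under assumptions: the input file and its contents are invented, -n is added so that sed writes only to the two files, and sed '$d' is swapped in for head -n -1 because negative counts for head are a GNU extension.

```shell
# Work in a scratch directory so the files named "before" and "after"
# do not clobber anything real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'one' 'two' 'TERMINATE' 'three' 'four' > file

# One pass over the input. -n (an addition to the answer's command)
# suppresses the normal output so only the two w commands write anything.
# The newline after each file name tells sed where the name ends.
sed -n -e '1,/TERMINATE/w before
/TERMINATE/,$w after' file

# Both files contain the TERMINATE line, so trim it from each.
lines_before=$(sed '$d' before)     # drop the last line
lines_after=$(tail -n +2 after)     # start at the second line
printf 'before:\n%s\nafter:\n%s\n' "$lines_before" "$lines_after"
```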

Edit2:


If you do not want to hard-code the filenames in the sed script, you can:

before=before.txt
after=after.txt
sed -e "1,/TERMINATE/w $before
/TERMINATE/,\$w $after" file

But then you have to escape the $ meaning the last line so the shell will not try to expand the $w variable (note that we now use double quotes around the script instead of single quotes).

I forgot to mention that the newline after each filename in the script is important, so that sed knows where the filename ends.


Edit: 2016-05-30

Sébastien Clément asked: "How would you replace the hardcoded TERMINATE by a variable?"

You would make a variable for the matching text and then do it the same way as the previous example:


matchtext=TERMINATE
before=before.txt
after=after.txt
sed -e "1,/$matchtext/w $before
/$matchtext/,\$w $after" file

To use a variable for the matching text with the previous examples:

## Print the line containing the matching text, till the end of the file:
## (from the matching line to EOF, including the matching line)
matchtext=TERMINATE
sed -n -e "/$matchtext/,\$p"
## Print from the line that follows the line containing the 
## matching text, till the end of the file:
## (from AFTER the matching line to EOF, NOT including the matching line)
matchtext=TERMINATE
sed -e "1,/$matchtext/d"
## Print all the lines before the line containing the matching text:
## (from line-1 to BEFORE the matching line, NOT including the matching line)
matchtext=TERMINATE
sed -e "/$matchtext/,\$d"

The important points about replacing text with variables in these cases are:


  1. Variables ($variablename) enclosed in single quotes ['] won't "expand", but variables inside double quotes ["] will. So you have to change all the single quotes to double quotes if they contain text you want to replace with a variable.
  2. The sed ranges also contain a $ immediately followed by a letter, like $p, $d, $w. These also look like variables to be expanded, so you have to escape those $ characters with a backslash [\], like \$p, \$d, \$w.
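A small sketch of both quoting points, on invented input. The variable name matchtext follows the answer; the sample lines are assumptions. Keep in mind that the variable's value is interpreted as a regular expression, so characters such as / or . in it would need escaping, which this sketch does not attempt.

```shell
matchtext=TERMINATE
input=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma')

# Double quotes let the shell expand $matchtext; \$ keeps the sed
# "last line" address from being read as a shell variable named p.
escaped=$(printf '%s\n' "$input" | sed -n -e "/$matchtext/,\$p")

# Alternative: keep the sed script in single quotes and step out of
# them only for the variable, so no backslash is needed.
mixed=$(printf '%s\n' "$input" | sed -n -e '/'"$matchtext"'/,$p')

printf '%s\n' "$escaped"
# Prints:
# TERMINATE
# gamma
```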

Answered by aioobe

As a simple approximation you could use


grep -A100000 TERMINATE file

which greps for TERMINATE and outputs up to 100000 lines following that line.

From the man page:

-A NUM, --after-context=NUM

Print NUM lines of trailing context after matching lines. Places a line containing a group separator (--) between contiguous groups of matches. With the -o or --only-matching option, this has no effect and a warning is given.
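As a sketch of how this behaves in practice (sample input invented): with a context count larger than the file, grep prints from the first match to the end of the file, matching line included. Later matches fall inside that context, so no -- separator appears. Piping through tail -n +2 is one way to drop the matching line itself.

```shell
input=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'TERMINATE again' 'delta')

# From the first match to the end; the matching line is included.
with_match=$(printf '%s\n' "$input" | grep -A100000 TERMINATE)

# Same, but skip the first output line (the matching line).
without_match=$(printf '%s\n' "$input" | grep -A100000 TERMINATE | tail -n +2)

printf '%s\n' "$without_match"
# Prints:
# gamma
# TERMINATE again
# delta
```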

Answered by Jos De Graeve

A tool to use here is awk:


cat file | awk 'BEGIN{ found=0} /TERMINATE/{found=1}  {if (found) print }'

How does this work:


  1. We set the variable 'found' to zero, which evaluates to false.
  2. If a match for 'TERMINATE' is found with the regular expression, we set it to one.
  3. If our 'found' variable evaluates to true, print :)
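The same flag idea can be written without cat and with the print folded into a bare pattern. A sketch on invented input; the behaviour matches the command above, including the matching line in the output.

```shell
input=$(printf '%s\n' 'alpha' 'TERMINATE' 'gamma' 'delta')

# Set the flag on the first match; the bare pattern "found" then
# prints every line for which the flag is non-zero.
flagged=$(printf '%s\n' "$input" | awk '/TERMINATE/ { found = 1 } found')

printf '%s\n' "$flagged"
# Prints:
# TERMINATE
# gamma
# delta
```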

The other solutions might consume a lot of memory if you use them on very large files.


Answered by UlfR

If I understand your question correctly, you want the lines after TERMINATE, not including the TERMINATE-line. awk can do this in a simple way:

awk '{if(found) print} /TERMINATE/{found=1}' your_file

Explanation:


  1. Although it is not best practice, you can rely on the fact that all variables default to 0 or the empty string if not defined. So the first expression (if(found) print) will not print anything to start off with.
  2. After the printing is done, we check whether this is the starter line (which should not be included).

This will print all lines after the TERMINATE-line.



Generalization:


  • You have a file with start- and end-lines and you want the lines between those lines, excluding the start- and end-lines.
  • Start- and end-lines can be defined by a regular expression matching the line.

Example:


$ cat ex_file.txt 
not this line
second line
START
A good line to include
And this line
Yep
END
Nope more
...
never ever
$ awk '/END/{found=0} {if(found) print} /START/{found=1}' ex_file.txt 
A good line to include
And this line
Yep
$

Explanation:


  1. If the end-line is found, no printing should be done. Note that this check is done before the actual printing, to exclude the end-line from the result.
  2. Print the current line if found is set.
  3. If the start-line is found, set found=1 so that the following lines are printed. Note that this check is done after the actual printing, to exclude the start-line from the result.

Notes:


  • The code relies on the fact that all awk variables default to 0 or the empty string if not defined. This is valid but may not be best practice, so you could add a BEGIN{found=0} to the start of the awk expression.
  • If multiple start-end blocks are found, they are all printed.
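To illustrate the last note, here is a sketch with two START/END blocks in made-up input; both blocks are printed and the marker lines are excluded.

```shell
input=$(printf '%s\n' 'skip' 'START' 'a1' 'a2' 'END' 'skip too' 'START' 'b1' 'END' 'tail')

# Explicit BEGIN initialisation, as suggested in the notes.
blocks=$(printf '%s\n' "$input" |
    awk 'BEGIN { found = 0 } /END/ { found = 0 } { if (found) print } /START/ { found = 1 }')

printf '%s\n' "$blocks"
# Prints:
# a1
# a2
# b1
```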

Answered by Mu Qiao

Use bash parameter expansion like the following:


content=$(cat file)
echo "${content#*TERMINATE}"
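Two details of this approach are easy to miss, so here is a sketch on invented input. The # expansion removes the shortest prefix matching the glob pattern *TERMINATE, so TERMINATE is treated as a literal string rather than a regular expression, and whatever follows it on the matching line stays in the result. The whole file is also held in a shell variable, which may matter for large files. The variable names rest, nl and following are made up for illustration.

```shell
content=$(printf '%s\n' 'alpha' 'xx TERMINATE yy' 'gamma')

# Strip everything up to and including the first TERMINATE.
rest=${content#*TERMINATE}
printf '%s\n' "$rest"
# Prints:
#  yy
# gamma

# To keep only the lines after the matching line, also strip
# through the first newline that remains.
nl='
'
following=${rest#*"$nl"}
printf '%s\n' "$following"
# Prints:
# gamma
```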

Answered by user8910163

grep -A 10000000 'TERMINATE' file

  • This is much, much faster than sed, especially when working on a really big file. It works for up to 10M lines (or whatever number you put in), so there is no harm in making it big enough to handle anything you hit.

Answered by fedorqui 'SO stop harming'

There are many ways to do it with sed or awk:

sed -n '/TERMINATE/,$p' file

This looks for TERMINATE in your file and prints from that line up to the end of the file.

awk '/TERMINATE/,0' file

This is exactly the same behaviour as sed.


In case you know the number of the line from which you want to start printing, you can specify it together with NR (number of record, which here is the line number):

awk 'NR>=535' file

Example


$ seq 10 > a        #generate a file with one number per line, from 1 to 10
$ sed -n '/7/,$p' a
7
8
9
10
$ awk '/7/,0' a
7
8
9
10
$ awk 'NR>=7' a
7
8
9
10

Answered by jfg956

If for any reason you want to avoid using sed, the following will print from the line matching TERMINATE till the end of the file:

tail -n "+$(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)" file

and the following will print from the line after the one matching TERMINATE till the end of the file:

tail -n "+$(($(grep -n 'TERMINATE' file | head -n 1 | cut -d ":" -f 1)+1))" file

It takes several processes to do what sed can do in one, and if the file changes between the execution of grep and tail, the result can be incoherent, so I recommend using sed. Moreover, if the file does not contain TERMINATE, the 1st command fails.
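The failure case mentioned above can be handled by checking the line number before calling tail. A minimal sketch; the function name print_from_match and the sample file are made up for illustration.

```shell
# Print "file" from the first line matching "pattern" to the end.
# Returns 1 without printing anything when there is no match.
print_from_match() {
    pattern=$1
    file=$2
    n=$(grep -n -- "$pattern" "$file" | head -n 1 | cut -d ":" -f 1)
    [ -n "$n" ] || return 1
    tail -n "+$n" "$file"
}

sample=$(mktemp)
printf '%s\n' 'alpha' 'TERMINATE' 'gamma' > "$sample"

found_output=$(print_from_match 'TERMINATE' "$sample")
printf '%s\n' "$found_output"
# Prints:
# TERMINATE
# gamma

# No match: nothing is printed and the status is non-zero.
missing_status=0
print_from_match 'NO_SUCH_TEXT' "$sample" || missing_status=$?

rm -f "$sample"
```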

Answered by mivk

Alternatives to the excellent sed answer by jfgagne, which don't include the matching line:

Answered by Mariah

This could be one way of doing it, if you know which line of the file your grep word is on and how many lines the file has:

grep -A466 'TERMINATE' file

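The count does not have to be worked out by hand; in the question's example it is 1000 - 534 = 466. A sketch that computes it, on invented input; the variable names are made up for illustration.

```shell
sample=$(mktemp)
printf '%s\n' 'alpha' 'beta' 'TERMINATE' 'gamma' 'delta' > "$sample"

# Total number of lines in the file (5 here).
total=$(wc -l < "$sample")
# Line number of the first match (3 here).
match=$(grep -n 'TERMINATE' "$sample" | head -n 1 | cut -d ":" -f 1)
# Number of lines that follow the match (2 here).
remaining=$((total - match))

tail_part=$(grep -A "$remaining" 'TERMINATE' "$sample")
printf '%s\n' "$tail_part"
# Prints:
# TERMINATE
# gamma
# delta

rm -f "$sample"
```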