Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/6932544/


How can I find lines in one file but not the other using bash scripting?

Tags: bash, shell

Asked by Senthess

Imagine file 1:

#include "first.h"
#include "second.h"
#include "third.h"

// more code here
...

Imagine file 2:

#include "fifth.h"
#include "second.h"
#include "eigth.h"

// more code here
...

I want to get only the headers that are included in file 2 but not in file 1. So, when run, a diff of file 1 and file 2 should produce:

#include "fifth.h"
#include "eigth.h"

I know how to do it in Perl/Python/Ruby, but I'd like to accomplish this without using a different programming language.

Accepted answer by Frank Schmitt

If it's ok to use a temp file, try this:

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep include

This command:

  • extracts all includes from file1.h and writes them to the file /tmp/x
  • uses this file to get all lines from file2.h that are not contained in this list
  • extracts all includes from the remainder of file2.h

It probably doesn't handle differences in whitespace and the like correctly, though.

EDIT: to prevent false positives, use a different pattern for the last grep (thanks to jw013 for mentioning this):

grep include file1.h > /tmp/x && grep -f /tmp/x -v file2.h | grep "^#include"
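
As a rough sketch of the same idea, the fixed path /tmp/x can be swapped for a private file created with mktemp, and the pattern file can be matched as fixed whole lines (-F -x) instead of as regular expressions. The directory and file names below are made up for the demo, and the sample contents are taken from the question:

```shell
# Demo directory and sample files (hypothetical names, contents from the question).
workdir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' \
    '' '// more code here' > "$workdir/file1.h"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' \
    '' '// more code here' > "$workdir/file2.h"

# Private pattern file instead of the shared /tmp/x.
patterns=$(mktemp "$workdir/patterns.XXXXXX")

# Collect the #include lines of file1.h.
grep '^#include' "$workdir/file1.h" > "$patterns"

# Print the #include lines of file2.h that equal none of the collected lines.
# -F: patterns are fixed strings, -x: a pattern must match the whole line.
result=$(grep -v -F -x -f "$patterns" "$workdir/file2.h" | grep '^#include')
printf '%s\n' "$result"

rm -rf "$workdir"
```

With -F, characters such as the dot in "first.h" are compared literally, and -x avoids dropping a line of file2.h merely because it contains a line of file1.h as a substring.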

Answer by glenn jackman

This is a one-liner, but does not preserve the order:

comm -13 <(grep '#include' file1 | sort) <(grep '#include' file2 | sort)

If you need to preserve the order:

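awk '
  # skip every line that is not an #include
  !/#include/ {next}
  # first file: remember each #include line
  FILENAME == ARGV[1] {include[$0]=1; next}
  # second file: print the #include lines that were not seen in the first
  !($0 in include)
' file1 file2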
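
To make the ordering difference concrete, here is a small sketch that runs both variants on the sample files from the question. It assumes the awk array is keyed by the whole line ($0). The demo directory and the sorted1/sorted2 helper files are hypothetical, and the sorted lists are written to files rather than passed via <(...) so that the sketch also runs in shells without process substitution:

```shell
# Sample files from the question, created only for this demo.
dir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' > "$dir/file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' > "$dir/file2"

# comm needs sorted input, so sort the #include lines of each file first.
grep '#include' "$dir/file1" | sort > "$dir/sorted1"
grep '#include' "$dir/file2" | sort > "$dir/sorted2"
# -13 suppresses columns 1 and 3, leaving the lines found only in the second input.
sorted_result=$(comm -13 "$dir/sorted1" "$dir/sorted2")

# awk remembers every #include line of the first file, then prints the
# #include lines of the second file that were not seen, in file order.
ordered_result=$(awk '
  !/#include/ { next }
  FILENAME == ARGV[1] { seen[$0] = 1; next }
  !($0 in seen)
' "$dir/file1" "$dir/file2")

printf 'comm:\n%s\nawk:\n%s\n' "$sorted_result" "$ordered_result"
rm -rf "$dir"
```

The comm variant lists "eigth.h" before "fifth.h" because its input was sorted, while the awk variant keeps the order in which the lines appear in file2.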

Answer by tripleee

This variant requires an fgrep with the -f option. GNU grep (i.e. any Linux system, and then some) should work fine.

# Find occurrences of '#include' in file1.h
fgrep '#include' file1.h |
# Remove any identical lines from file2.h
fgrep -vxf - file2.h |
# Result is all lines not present in file1.h.  Out of those, extract #includes
fgrep '#include'

This does not require any sorting, nor any explicit temporary files. In theory, fgrep -f could use a temporary file behind the scenes, but I believe GNU fgrep doesn't.
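
Recent GNU grep releases print a warning that fgrep is obsolescent and suggest grep -F instead, so the same pipeline can be sketched with grep -F. The demo directory and the sample files (contents from the question) are assumptions made for illustration:

```shell
# Sample files from the question, created only for this demo.
dir=$(mktemp -d)
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' \
    '' '// more code here' > "$dir/file1.h"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' \
    '' '// more code here' > "$dir/file2.h"

# Step 1: find the '#include' lines of file1.h.
# Step 2: read them as patterns from stdin (-f -) and drop the identical
#         whole lines (-x) from file2.h (-v inverts the match).
# Step 3: out of what is left, keep only the '#include' lines.
result=$(grep -F '#include' "$dir/file1.h" |
    grep -F -v -x -f - "$dir/file2.h" |
    grep -F '#include')
printf '%s\n' "$result"

rm -rf "$dir"
```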

Answer by pmocek

If the goal need not be accomplished with Bash alone (i.e., use of external programs is acceptable), then use combine from moreutils:

combine file1 not file2 > lines_in_file1_not_in_file2

Answer by plbogen

cat $file1 $file2 | grep '#include' | sort | uniq -u
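
A quick sketch of what this pipeline prints for the sample files from the question: uniq -u keeps every line that occurs exactly once in the combined input, so lines unique to file 1 are reported as well. The extra filtering step at the end is not part of the answer, only one possible way to narrow the result down to file 2. The demo directory and file names are hypothetical:

```shell
# Sample files from the question, created only for this demo.
dir=$(mktemp -d)
file1="$dir/file1"
file2="$dir/file2"
printf '%s\n' '#include "first.h"' '#include "second.h"' '#include "third.h"' > "$file1"
printf '%s\n' '#include "fifth.h"' '#include "second.h"' '#include "eigth.h"' > "$file2"

# The pipeline from the answer: lines that appear exactly once overall.
result=$(cat "$file1" "$file2" | grep '#include' | sort | uniq -u)
printf 'once overall:\n%s\n' "$result"

# Assumed extra step: keep only those lines that are whole lines of file2.
only_file2=$(printf '%s\n' "$result" | grep -F -x -f - "$file2")
printf 'only in file2:\n%s\n' "$only_file2"

rm -rf "$dir"
```

This also relies on neither file repeating an #include line: a header that is included twice in file 2 and never in file 1 would be dropped by uniq -u.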
