Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12869354/

How to compare two text files for the same exact text using BASH?

Tags: linux, bash, text, grep, compare

Asked by user1742682

Let's say I have two text files that I need to extract data out of. The text of the two files is as follows:

File 1:

1name - [email protected]
2Name - [email protected]
3Name - [email protected]
4Name - [email protected]

File 2:

email.com
email.com
email.com
anotherwebsite.com

File 2 is File 1's list of domain names, extracted from the email addresses. These are not the same domain names by any means, and are quite random.

How can I get the results of the domain names that match File 2 from File 1?

Thank you in advance!

Answered by zwol

Assuming that order does not matter,

grep -F -f FILE2 FILE1

should do the trick. (This works because of a little-known fact: the -F option to grep doesn't just mean "match this fixed string," it means "match any of these newline-separated fixed strings.")
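As a minimal sketch of the command above: the addresses in the question are obscured, so the sample names and addresses below are hypothetical, and the files are written to a temporary directory rather than being the asker's real files.

```shell
# Hypothetical sample data (the real addresses in the question are obscured).
tmp=$(mktemp -d)
printf '%s\n' 'Alice - alice@email.com' \
              'Bob - bob@example.org' \
              'Carol - carol@anotherwebsite.com' > "$tmp/file1"
printf '%s\n' 'email.com' 'email.com' 'anotherwebsite.com' > "$tmp/file2"

# -F: treat every pattern as a fixed string, so "." is a literal dot
# -f: read the patterns from file2, one per line
matches=$(grep -F -f "$tmp/file2" "$tmp/file1")
printf '%s\n' "$matches"

rm -rf "$tmp"
```

With this data the Alice and Carol lines are printed, in the order they appear in file1. Note that the match is a plain substring test on the whole line, so a domain that happens to appear in the name part of a line would also match.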

Answered by Serge

The recipe:

join <(sed 's/^.*@//' file1|sort -u) <(sort -u file2) 

It will output the intersection of the domain names in file1 and file2.
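To see what each stage of that recipe contributes, the same pipeline can be unrolled into temporary files. This is only a sketch: the sample addresses and the names domains1 and domains2 are hypothetical, and temporary files stand in for the process substitutions.

```shell
# Hypothetical sample data (the real addresses in the question are obscured).
tmp=$(mktemp -d)
printf '%s\n' 'Alice - alice@email.com' \
              'Bob - bob@example.org' \
              'Carol - carol@anotherwebsite.com' \
              'Dave - dave@email.com' > "$tmp/file1"
printf '%s\n' 'email.com' 'email.com' 'anotherwebsite.com' 'unused.net' > "$tmp/file2"

# Stage 1: delete everything up to and including the last "@",
# leaving only the domain, then sort and drop duplicates.
sed 's/^.*@//' "$tmp/file1" | sort -u > "$tmp/domains1"

# Stage 2: sort and de-duplicate the domain list; join needs sorted input.
sort -u "$tmp/file2" > "$tmp/domains2"

# Stage 3: join prints only the lines whose first field occurs in both files.
common=$(join "$tmp/domains1" "$tmp/domains2")
printf '%s\n' "$common"

rm -rf "$tmp"
```

Unlike the grep approach, this yields the bare domain names rather than the full lines of file1, and each shared domain appears once.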

Answered by ormaaj

See BashFAQ/036 for the list of usual solutions to this type of problem.

Answered by Srujan Kumar Gulla

Use the vimdiff command; it gives a nice presentation of the differences.

Answered by nemo

If I got you right, you want to filter for all addresses with the host mentioned in File 2.

You could then just loop over File 2 and grep for @<line>, accumulating the result in a new file or something similar.

Example:

sort -u file2 | while IFS= read -r host; do grep -F "@$host" file1; done > filtered
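A small run of this loop on made-up data may help show why the "@" prefix matters. The names and addresses are hypothetical, and the second line is constructed so that the domain text appears in the name part only.

```shell
# Hypothetical sample data (the real addresses in the question are obscured).
tmp=$(mktemp -d)
printf '%s\n' 'Alice - alice@email.com' \
              'email.com fan - bob@example.org' \
              'Carol - carol@anotherwebsite.com' > "$tmp/file1"
printf '%s\n' 'email.com' 'email.com' 'anotherwebsite.com' > "$tmp/file2"

# For each distinct host, print the lines of file1 containing "@host".
# IFS= and -r keep read from trimming whitespace or interpreting backslashes;
# -F makes grep treat the dots in the host name literally.
sort -u "$tmp/file2" | while IFS= read -r host; do
    grep -F "@$host" "$tmp/file1"
done > "$tmp/filtered"

filtered=$(cat "$tmp/filtered")
printf '%s\n' "$filtered"

rm -rf "$tmp"
```

Here the "email.com fan" line is skipped because its address is at example.org. The output is grouped by host in sorted order (Carol's line first, then Alice's), not in file1 order.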