Comparing two unsorted lists in Linux, listing the entries unique to the second file

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/11099894/

Date: 2020-08-06 06:56:53. Source: igfitidea

Tags: linux, bash, shell, comparison, grep

Asked by mvrasmussen

I have 2 files with a list of numbers (telephone numbers).

I'm looking for a way to list the numbers in the second file that are not present in the first file.

I've tried various methods:

comm (getting some weird sorting errors)
fgrep -v -x -f second-file.txt first-file.txt (unsure of the result, there should be more)

Accepted answer by Hari Menon

grep -Fxv -f first-file.txt second-file.txt

Basically, this looks for all lines in second-file.txt which don't match any line in first-file.txt. It might be slow if the files are large.
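
To make the effect of the flags concrete, here is a minimal sketch on made-up data. The phone numbers and the temporary directory are assumptions for illustration only; the file names are the ones used in the answer.

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
workdir=$(mktemp -d)
printf '%s\n' 5551001 5551002 5551003 > "$workdir/first-file.txt"
printf '%s\n' 5551003 5551004 5551001 5551005 > "$workdir/second-file.txt"

# -F  treat each pattern as a fixed string, not a regular expression
# -x  a pattern must match the whole line
# -v  print the lines that do NOT match
# -f  read the patterns (one per line) from first-file.txt
result=$(grep -Fxv -f "$workdir/first-file.txt" "$workdir/second-file.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```

Neither file has to be sorted for this to work, and the output keeps the order of second-file.txt.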

Also, once you sort the files (use sort -n if they are numeric), then comm should also have worked. What error does it give? Try this:

comm -23 second-file-sorted.txt first-file-sorted.txt
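
A sketch of how the two sorted files might be produced before running that command, on made-up data. One assumption here is mine rather than the answer's: the files are sorted with plain sort under LC_ALL=C instead of sort -n, because comm compares lines as strings and needs both inputs in that same order.

```shell
# Hypothetical sample data in a scratch directory (not from the original post).
workdir=$(mktemp -d)
printf '%s\n' 5551001 5552 5551003 > "$workdir/first-file.txt"
printf '%s\n' 5551003 5551004 5552 5551005 > "$workdir/second-file.txt"

# Sort both files byte-wise, the same ordering comm is told to use below.
LC_ALL=C sort "$workdir/first-file.txt"  > "$workdir/first-file-sorted.txt"
LC_ALL=C sort "$workdir/second-file.txt" > "$workdir/second-file-sorted.txt"

# -2 hides lines found only in the second argument,
# -3 hides lines found in both,
# so only lines unique to the first argument (the second file) remain.
result=$(LC_ALL=C comm -23 "$workdir/second-file-sorted.txt" "$workdir/first-file-sorted.txt")
printf '%s\n' "$result"

rm -r "$workdir"
```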

Answer by rush

You need to use comm:

comm -13 first.txt second.txt

will do the job.

P.S. The order of the first and second file on the command line matters.

Also, you may need to sort the files first:

comm -13 <(sort first.txt) <(sort second.txt)

If the files are numeric, add the -n option to sort.

Answer by Nahuel Fouilleul

This should work:

comm -13 <(sort file1) <(sort file2)

It seems sort -n (numeric) does not work with comm, which expects its input in the alphanumeric (lexical) order that plain sort produces.

f1.txt

1
2
21
50

f2.txt

1
3
21
50

21 should appear in the third column:

#WRONG
$ comm <(sort -n f1.txt) <(sort -n f2.txt)   
                1
2
21
        3
        21
                50

#OK
$ comm <(sort f1.txt) <(sort f2.txt)
                1
2
                21
        3
                50
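
Applied to the original question, the same two files can be reduced to just the lines unique to f2.txt. This is a sketch under two assumptions of mine: temporary files are used in place of process substitution so it also runs in shells without <( ), and LC_ALL=C is set so that sort and comm agree on the ordering.

```shell
# Recreate the f1.txt and f2.txt shown above in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 1 2 21 50 > "$workdir/f1.txt"
printf '%s\n' 1 3 21 50 > "$workdir/f2.txt"

# Plain (lexical) sort, not sort -n, so the order matches what comm expects.
LC_ALL=C sort "$workdir/f1.txt" > "$workdir/f1.sorted"
LC_ALL=C sort "$workdir/f2.txt" > "$workdir/f2.sorted"

# -1 hides lines only in f1, -3 hides common lines: only f2-unique lines remain.
only_in_f2=$(LC_ALL=C comm -13 "$workdir/f1.sorted" "$workdir/f2.sorted")

# If numeric order is wanted, apply sort -n to the result, after comm has run.
printf '%s\n' "$only_in_f2" | sort -n

rm -r "$workdir"
```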

Answer by tom

cat f1.txt f2.txt | sort |uniq > file3