bash awk: compare two files

Notice: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/22100384/

Time: 2020-09-10 00:42:35  Source: igfitidea

awk to compare two files

Tags: bash, shell, awk

Asked by upog

I am trying to compare two files and want to print the matching lines... The lines present in the files will be unique

File1.txt

GERMANY
FRANCE
UK
POLLAND

File2.txt

POLLAND 
GERMANY

I tried with below command

awk 'BEGIN { FS="\n" } ; NR==FNR{A[$1]++;NEXT}A[$1]' File1.txt File2.txt

but it is printing the matching record twice, I want them to be printed once...

UPDATE

expected output

POLLAND 
GERMANY

Current Output

POLLAND 
GERMANY
POLLAND 
GERMANY

Answered by fedorqui 'SO stop harming'

grep together with -f (for file) is best for this:

$ grep -f f1 f2
POLLAND 
GERMANY

And in fact, to match whole words only and to treat the patterns as fixed strings rather than regexes, use -w and -F, respectively:

$ grep -wFf f1 f2
POLLAND 
GERMANY
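To see what -w and -F change, here is a minimal sketch. The sample data (UK and UKRAINE) is hypothetical and not from the question, and it assumes a grep that supports -w, as GNU and BSD grep do:

```shell
#!/bin/sh
# Hypothetical sample data, chosen so that one pattern is a prefix of a line.
tmp=$(mktemp -d)
printf 'UK\nGERMANY\n' > "$tmp/f1"
printf 'UKRAINE\nGERMANY\n' > "$tmp/f2"

# Plain -f: each line of f1 is a regex that may match anywhere in a line of f2.
loose=$(grep -f "$tmp/f1" "$tmp/f2")

# -F: patterns are fixed strings; -w: a match must be a whole word.
strict=$(grep -wFf "$tmp/f1" "$tmp/f2")

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

With plain -f, the pattern UK also matches UKRAINE; with -wF only GERMANY is printed.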


If you really have to do it with awk, then you can use:

$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2
POLLAND 
GERMANY
  • FNR==NR is true only while reading the first file.
  • {a[$1]; next} stores the first field of each line of the first file as a key in the array a[] and skips to the next line.
  • $1 in a is evaluated while looping through the second file. It checks whether the first field of the current line is a key in the a[] array.
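The steps above can be tried end to end with a small script. This is a sketch that recreates the two files from the question in a temporary directory; the array name seen and the variable names are just illustrative choices:

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\nUK\nPOLLAND\n' > "$tmp/f1"
printf 'POLLAND \nGERMANY\n' > "$tmp/f2"   # note the trailing space after POLLAND

matches=$(awk '
  FNR == NR { seen[$1]; next }   # first file: remember field 1, skip the rest
  $1 in seen                     # second file: print the line if field 1 was seen
' "$tmp/f1" "$tmp/f2")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

Because the lookup uses $1 with the default field separator, the trailing space after POLLAND in the second file does not prevent the match. One caveat: if the first file is empty, FNR==NR is also true for the second file, so nothing is printed.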


Why wasn't your script working?

  • Because you used NEXT instead of next, it was treated as an (uninitialized) variable instead of the next statement, so lines from the first file fell through to the second pattern.
  • Also, the BEGIN { FS="\n" } was wrong: the default FS is a space, and that is fine here. Setting it to a newline was making it misbehave.
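The effect of NEXT versus next can be seen side by side. A minimal sketch with reduced, hypothetical sample files (two lines and one line):

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\n' > "$tmp/f1"
printf 'GERMANY\n' > "$tmp/f2"

# NEXT is only an unset variable, so evaluating it does nothing: every line of
# the first file falls through to the A[$1] pattern and gets printed too.
with_upper=$(awk 'NR==FNR { A[$1]++; NEXT } A[$1]' "$tmp/f1" "$tmp/f2")

# next really skips to the following record, so only the second file is tested.
with_lower=$(awk 'NR==FNR { A[$1]++; next } A[$1]' "$tmp/f1" "$tmp/f2")

printf 'NEXT:\n%s\nnext:\n%s\n' "$with_upper" "$with_lower"
rm -rf "$tmp"
```

The NEXT variant prints GERMANY, FRANCE, GERMANY; the next variant prints GERMANY once.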

Answered by Mark Setchell

Your command should maybe be:

awk 'NR==FNR{A[$1]++;next}A[$1]' file1 file2

You have a stray semi-colon after the closing brace of BEGIN{} and also have "NEXT" in capital letters and have mis-spelled your filename.

Answered by jaypal singh

Try this one-liner:

awk 'NR==FNR{name[$1]++;next} $1 in name' file1.txt file2.txt
  • You iterate through the first file (NR==FNR), storing the names in an array called name.
  • You use next to prevent the second action from happening until the first file is completely stored in the array.
  • Once the first file is complete, you process the next file by checking whether each name is present in the array. It will print out the name if it exists.
  • FS is the field separator. You don't need to set that to a newline. You need RS, which is the record separator, to be a newline. But we don't do that here because that is the default value.
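To see why NR==FNR singles out the first file, it can help to print both counters for every line. A small sketch with hypothetical two-line and one-line sample files:

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\n' > "$tmp/f1"
printf 'GERMANY\n' > "$tmp/f2"

# NR counts records across all input files; FNR restarts at 1 for each file.
# The last column is the value of the comparison NR == FNR (1 = true, 0 = false).
counters=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR, (NR == FNR) }' f1 f2)

printf '%s\n' "$counters"
# f1 1 1 1
# f1 2 2 1
# f2 3 1 0
rm -rf "$tmp"
```

The comparison is true for every line of f1 and false once awk moves on to f2.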

Answered by Jason

If you don't have to use awk, a better alternative might be the GNU coreutils tool comm. From the man page:

comm -12 file1 file2
    Print only lines present in both file1 and file2.
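Note that comm expects both inputs to be sorted, so unsorted files like the ones in the question need a sort step first. A minimal sketch, assuming the lines match exactly (the trailing space after POLLAND is dropped here, since comm compares whole lines); the .sorted file names are illustrative:

```shell
#!/bin/sh
tmp=$(mktemp -d)
printf 'GERMANY\nFRANCE\nUK\nPOLLAND\n' > "$tmp/file1"
printf 'POLLAND\nGERMANY\n' > "$tmp/file2"

# comm requires sorted input, so sort each file first.
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

# -1 and -2 suppress the lines unique to each file, leaving only the common ones.
common=$(comm -12 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$common"
rm -rf "$tmp"
```

Unlike the awk and grep approaches, the output comes back in sorted order rather than in the order of the second file.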