Original question: http://stackoverflow.com/questions/22100384/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
awk to compare two files
Asked by upog
I am trying to compare two files and want to print the matching lines. The lines in each file are unique.
File1.txt
GERMANY
FRANCE
UK
POLLAND
File2.txt
POLLAND
GERMANY
I tried the command below:
awk 'BEGIN { FS="\n" } ; NR==FNR{A[$1]++;NEXT}A[$1]' File1.txt File2.txt
but it prints each matching record twice. I want them printed only once.
UPDATE
Expected output:
POLLAND
GERMANY
Current output:
POLLAND
GERMANY
POLLAND
GERMANY
Answered by fedorqui 'SO stop harming'
grep together with -f (for file) is best for this:
$ grep -f f1 f2
POLLAND
GERMANY
In fact, to match whole words only and to treat the patterns as fixed strings rather than regexes, use -w and -F respectively:
$ grep -wFf f1 f2
POLLAND
GERMANY
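To see what each flag changes, here is a small sketch. The file contents, including the extra lines UK VISA and UKRAINE, are made up for illustration and are not from the question. It also tries -x, which the answer above does not mention; -x only accepts matches that cover the whole line.

```shell
# Hypothetical sample data; f2 has two extra lines to expose the differences.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/f1"
printf '%s\n' POLLAND GERMANY 'UK VISA' UKRAINE > "$tmp/f2"

# Patterns from f1 are regexes and may match anywhere in a line.
plain=$(grep -f "$tmp/f1" "$tmp/f2")

# -F: patterns are fixed strings; -w: the match must be a whole word.
words=$(grep -wFf "$tmp/f1" "$tmp/f2")

# -x: the match must cover the whole line.
lines=$(grep -xFf "$tmp/f1" "$tmp/f2")

printf 'grep -f:\n%s\n\ngrep -wFf:\n%s\n\ngrep -xFf:\n%s\n' "$plain" "$words" "$lines"
rm -rf "$tmp"
```

With this data, grep -f reports all four lines of f2, -wFf drops UKRAINE but keeps UK VISA, and -xFf keeps only POLLAND and GERMANY. If you need whole-line equality, -x is probably closer to what you want than -w.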
If you really have to do it with awk, then you can use:
$ awk 'FNR==NR {a[$1]; next} $1 in a' f1 f2
POLLAND
GERMANY
- FNR==NR is true only while reading the first file.
- {a[$1]; next} stores each line of the first file (its first field, $1) as a key in a[] and goes to the next line.
- $1 in a is evaluated when looping through the second file. It checks whether the current line is a key in the a[] array.
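The same one-liner can be written out with comments. This is only a sketch: the temporary directory and the array name seen are my own choices, and the data is the sample from the question.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/f1"
printf '%s\n' POLLAND GERMANY > "$tmp/f2"

matches=$(awk '
    # FNR resets for every file and NR does not, so they are equal
    # only while the first file is being read.
    FNR == NR { seen[$1]; next }   # remember the key, skip the rest

    # Reached only for the second file: print lines whose key was seen.
    $1 in seen
' "$tmp/f1" "$tmp/f2")

printf '%s\n' "$matches"
rm -rf "$tmp"
```

This prints POLLAND and then GERMANY, in the order of the second file. If your lines can contain spaces, keying on $0 instead of $1 compares whole lines rather than just the first field.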
Why wasn't your script working?
- Because you used NEXT instead of next. awk is case-sensitive, so NEXT was treated as an ordinary variable instead of a command.
- Also, because the BEGIN { FS="\n" } was wrong: the default FS is a space, and that is fine here. Setting it to a newline was making it misbehave.
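A quick way to see the effect of the NEXT typo is to run both spellings side by side. This is a sketch with the default FS on the sample data; it does not try to reproduce the exact output shown in the question.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/File1.txt"
printf '%s\n' POLLAND GERMANY > "$tmp/File2.txt"

# NEXT is just an empty variable here, so nothing is skipped and the
# A[$1] pattern also runs (and prints) for every line of File1.txt.
buggy=$(awk 'NR==FNR { A[$1]++; NEXT } A[$1]' "$tmp/File1.txt" "$tmp/File2.txt")

# With lowercase next, File1.txt is only stored, never printed.
fixed=$(awk 'NR==FNR { A[$1]++; next } A[$1]' "$tmp/File1.txt" "$tmp/File2.txt")

printf 'with NEXT:\n%s\n\nwith next:\n%s\n' "$buggy" "$fixed"
rm -rf "$tmp"
```

The NEXT version prints all four lines of File1.txt followed by the two matches from File2.txt, while the next version prints only the two matches.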
Answered by Mark Setchell
Your command should probably be:
awk 'NR==FNR{A[$1]++;next}A[$1]' file1 file2
You have a stray semicolon after the closing brace of BEGIN{}, you have "NEXT" in capital letters, and you have misspelled your filename.
Answered by jaypal singh
Try this one-liner:
awk 'NR==FNR{name[$1]++;next}$1 in name' file1.txt file2.txt
- You iterate through the first file (NR==FNR), storing the names in an array called name.
- You use next to prevent the second action from happening until the first file is completely stored in the array.
- Once the first file is complete, you go through the next file, checking whether each name is present in the array. The name is printed if it exists.
- FS is the field separator. You don't need to set that to a newline. What you need is RS, the record separator, to be a newline. But we don't set it here because that is the default value.
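The difference between RS and FS is easy to check with the defaults. In this sketch the input line UNITED KINGDOM is made up for illustration; it is not in the question's files.

```shell
# Default RS (newline) makes every line one record, counted by NR.
# Default FS (blank space) splits each record into fields, counted by NF.
out=$(printf '%s\n' 'UNITED KINGDOM' FRANCE | awk '{ print NR, NF, $1 }')

printf '%s\n' "$out"
```

This prints "1 2 UNITED" and "2 1 FRANCE": two records, and the first one has two fields. So for multi-word names, $1 holds only the first word, and $0 holds the whole line.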
Answered by Jason
If you don't have to use awk, a better alternative might be comm from GNU coreutils. From the man page:
comm -12 file1 file2
       Print only lines present in both file1 and file2.
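One thing to keep in mind is that comm expects both inputs to be sorted, and the sample files in the question are not. A minimal sketch, assuming you are fine with the result coming out in sorted order (the .sorted file names are my own):

```shell
# Use one collation order for both sort and comm.
LC_ALL=C
export LC_ALL

# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' GERMANY FRANCE UK POLLAND > "$tmp/file1"
printf '%s\n' POLLAND GERMANY > "$tmp/file2"

# comm needs sorted input, so sort each file first.
sort "$tmp/file1" > "$tmp/file1.sorted"
sort "$tmp/file2" > "$tmp/file2.sorted"

# -1 hides lines only in the first file, -2 hides lines only in the second.
common=$(comm -12 "$tmp/file1.sorted" "$tmp/file2.sorted")

printf '%s\n' "$common"
rm -rf "$tmp"
```

This prints GERMANY and then POLLAND. Unlike the grep and awk answers, the output follows the sort order and not the order of the second file.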