Linux: How to merge two files using AWK?

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question and is licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original URL: http://stackoverflow.com/questions/5467690/

Time: 2020-08-04 00:35:15 Source: igfitidea

linux bash unix awk

Question by Tony

File 1 has 5 fields, A B C D E, where field A is integer-valued.

File 2 has 3 fields A F G

File 1 has far more rows than File 2 (20^6 versus 5000).

All the entries of A in File 1 appear in field A of File 2.

I'd like to merge the two files on field A and carry over F and G.

Desired output is A B C D E F G

Example

File 1

 A     B     C    D    E
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1

File 2

A     F    G   
4050 12.1 23.6
4049 14.4 47.8   
4048 23.2 43.9
4047 45.5 21.6

Desired output

A    B      C      D   E F    G
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6

Accepted answer by kurumi

$ awk 'FNR==NR{a[$1]=$2 FS $3;next}{ print $0, a[$1]}' file2 file1
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
4046 S00344 31322 4000 1
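kurumi's output above still contains the unmatched row 4046 (with F and G missing), while the desired output in the question omits it. A minimal sketch, assuming whitespace-separated fields and that only rows whose key appears in File 2 should be kept (an inner join), adds a `($1 in a)` pattern to the second block. The temporary directory and the sample files written into it exist only for demonstration.

```shell
#!/bin/sh
# Sketch: inner join of file1 and file2 on field 1 using awk.
# Sample data is the example from the question (header lines omitted).
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' \
  '4050 S00001 31228 3286 0' \
  '4050 S00012 31227 4251 0' \
  '4049 S00001 28342 3021 1' \
  '4048 S00001 46578 4210 0' \
  '4048 S00113 31221 4250 0' \
  '4047 S00122 31225 4249 0' \
  '4046 S00344 31322 4000 1' > "$tmp/file1"

printf '%s\n' \
  '4050 12.1 23.6' \
  '4049 14.4 47.8' \
  '4048 23.2 43.9' \
  '4047 45.5 21.6' > "$tmp/file2"

# FNR == NR holds only while awk reads its first file (file2):
# remember "F G" under key A, then skip to the next line.
# For file1, ($1 in a) tests membership without creating an empty entry,
# so rows whose key is missing from file2 (4046 here) are not printed.
awk 'FNR == NR { a[$1] = $2 FS $3; next }
     ($1 in a) { print $0, a[$1] }' "$tmp/file2" "$tmp/file1"
```

One caveat of the FNR == NR idiom: if the first file is empty, the condition stays true while the second file is read, so nothing is printed. Testing `FILENAME == ARGV[1]` instead avoids that case.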

Answer by Jonathan Leffler

You need to read the entries from File 2 into a pair of associative arrays in the BEGIN block. Assuming GNU Awk:

BEGIN { while (getline < "File 2") { f[$1] = $2; g[$1] = $3 } }

In the main processing block, you read the line from File 1 and print it with the correct data from the arrays created in the BEGIN block:

{ print $0, f[$1], g[$1] }

Supply File 1 as the filename argument to the program:

awk 'BEGIN { while (getline < "File 2") { f[$1] = $2; g[$1] = $3 } }
     { print $0, f[$1], g[$1] }' "File 1"

The quotes around the file name argument are needed because of the space in the file name. You need the quotes around the getline filename even if it contained no spaces, as otherwise it would be treated as a variable name.

Answer by Will Hartung

Thankfully, you don't need to write this at all. Unix has a join command to do this for you:

join -1 1 -2 1 File1 File2

Here it is "in action":

will-hartungs-computer:tmp will$ cat f1
4050 S00001 31228 3286 0
4050 S00012 31227 4251 0
4049 S00001 28342 3021 1
4048 S00001 46578 4210 0
4048 S00113 31221 4250 0
4047 S00122 31225 4249 0
4046 S00344 31322 4000 1
will-hartungs-computer:tmp will$ cat f2
4050 12.1 23.6
4049 14.4 47.8
4048 23.2 43.9
4047 45.5 21.6
will-hartungs-computer:tmp will$ join -1 1 -2 1 f1 f2
4050 S00001 31228 3286 0 12.1 23.6
4050 S00012 31227 4251 0 12.1 23.6
4049 S00001 28342 3021 1 14.4 47.8
4048 S00001 46578 4210 0 23.2 43.9
4048 S00113 31221 4250 0 23.2 43.9
4047 S00122 31225 4249 0 45.5 21.6
will-hartungs-computer:tmp will$ 

Answer by NAGAPPA

awk 'BEGIN{OFS=","}  FNR==NR {F[$1]=$2;G[$1]=$3;next} {print $1,$2,$3,$4,$5,F[$1],G[$1]}' file2.txt file1.txt
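Will Hartung's join session works on the sample data, but POSIX join expects both inputs to be sorted on the join field in the collating sequence used by sort, and the sample files are in descending order; GNU join, for example, can report that an input is not in sorted order. A minimal sketch, assuming the same f1/f2 sample data from his session and the C locale for collation, sorts both inputs into temporary files before joining.

```shell
#!/bin/sh
# Sketch: sort both inputs on the join field before calling join.
# f1/f2 mirror the sample files from Will Hartung's session.
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' \
  '4050 S00001 31228 3286 0' \
  '4050 S00012 31227 4251 0' \
  '4049 S00001 28342 3021 1' \
  '4048 S00001 46578 4210 0' \
  '4048 S00113 31221 4250 0' \
  '4047 S00122 31225 4249 0' \
  '4046 S00344 31322 4000 1' > "$tmp/f1"

printf '%s\n' \
  '4050 12.1 23.6' \
  '4049 14.4 47.8' \
  '4048 23.2 43.9' \
  '4047 45.5 21.6' > "$tmp/f2"

# Use one collation (the C locale) for both sort and join,
# so they agree on what "sorted" means.
LC_ALL=C
export LC_ALL

# -k1,1 sorts on the first field (the join key) only.
sort -k1,1 "$tmp/f1" > "$tmp/f1.sorted"
sort -k1,1 "$tmp/f2" > "$tmp/f2.sorted"

# By default join prints only paired lines: key, rest of f1, rest of f2.
join -1 1 -2 1 "$tmp/f1.sorted" "$tmp/f2.sorted"
```

The output then comes out in ascending key order rather than the original order, and the unmatched key 4046 is dropped, because join prints only paired lines unless `-a 1` is given.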