bash: Combine two files by column on Linux

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/25652252/

Date: 2020-09-18 11:17:49  Source: igfitidea

Linux Combine two files by column

Tags: linux, bash, shell, awk, sed

Asked by clear.choi

I am trying to combine two files as shown below, keeping only the rows whose ID appears in both files (the intersection). First file, Test1.txt:

ID     Name  Telephone       
1      John     011
2      Sam      013
3      Jena     014
4      Peter    015

Second file Test2.txt

1       Test1    Test2
2       Test3    Test4
3       Test5    Test6
4       Test7    Test8
5       Test7    Test8
6       Test7    Test8
7       Test7    Test8
8       Test7    Test8
9       Test7    Test8

The final result should be:

ID     Name  Telephone    Remark1  Remark2
1      John    011        Test1    Test2
2      Sam     013        Test3    Test4
3      Jena    014        Test5    Test6
4      Peter   015        Test7    Test8

This is what I tried:

awk -F"\t" '
    {key = $1 }
    NR == 1 {header = key}
    !(key in result) {result[key] = $0; next}
    { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
    END {
        print result[header]
        delete result[header]
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (key in result) print result[key]
    }
' Test1.txt Test2.txt > result.txt

I just noticed that this produces the union: it includes all the data from both Test1 and Test2.

I would like to show only the intersection, as in my expected result: IDs 1, 2, 3, and 4 only.

Do you guys have any idea? Thanks!

Accepted answer by Ed Morton

$ awk -v OFS='\t' '
NR==1   { print $0, "Remark1", "Remark2"; next }
NR==FNR { a[$1]=$0; next }
$1 in a { print a[$1], $2, $3 }
' Test1.txt Test2.txt
ID     Name  Telephone    Remark1  Remark2
1      John    011        Test1    Test2
2      Sam     013        Test3    Test4
3      Jena    014        Test5    Test6
4      Peter   015        Test7    Test8
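
An intersection on the first column can be sketched in awk with the two-file NR==FNR idiom. The script below is a self-contained illustration, not a definitive implementation: the temporary directory, the tab-separated sample rows, the array name row, and the hard-coded Remark1/Remark2 header names are all assumptions made for the demo.

```shell
#!/bin/sh
# Recreate small sample inputs (tab-separated here, which is an assumption).
dir=$(mktemp -d)
printf 'ID\tName\tTelephone\n1\tJohn\t011\n2\tSam\t013\n3\tJena\t014\n4\tPeter\t015\n' > "$dir/Test1.txt"
printf '1\tTest1\tTest2\n2\tTest3\tTest4\n3\tTest5\tTest6\n4\tTest7\tTest8\n5\tTest7\tTest8\n' > "$dir/Test2.txt"

result=$(awk -F'\t' -v OFS='\t' '
    NR == 1   { print $0, "Remark1", "Remark2"; next }  # header line of the first file
    NR == FNR { row[$1] = $0; next }                    # first file: remember each row by ID
    $1 in row { print row[$1], $2, $3 }                 # second file: keep only IDs already seen
' "$dir/Test1.txt" "$dir/Test2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

NR==FNR is true only while awk reads the first file, so the last rule runs only on the second file, and ID 5 is dropped because it was never stored.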

Answer by damienfrancois

It is far easier to use the join command:

$ cat a.txt 
ID     Name  Telephone       
1      John     011
2      Sam      013
3      Jena     014
4      Peter    015
$ cat b.txt 
ID     Remark1  Remark2       
1       Test1    Test2
2       Test3    Test4
3       Test5    Test6
4       Test7    Test8
5       Test7    Test8
6       Test7    Test8
7       Test7    Test8
8       Test7    Test8
9       Test7    Test8
$ join a.txt b.txt 
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8

Use the column command to pretty-print it:

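$ join a.txt b.txt | column -t
ID  Name   Telephone  Remark1  Remark2
1   John   011        Test1    Test2
2   Sam    013        Test3    Test4
3   Jena   014        Test5    Test6
4   Peter  015        Test7    Test8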
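
One caveat: join expects both inputs to be sorted on the join field. The sample files already are, but for unsorted input something like the following sketch may help. The unsorted sample rows and the intermediate file names (a.head, a.sorted, and so on) are assumptions for illustration only.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Deliberately unsorted copies of the sample data (illustrative contents).
printf 'ID Name Telephone\n3 Jena 014\n1 John 011\n4 Peter 015\n2 Sam 013\n' > "$dir/a.txt"
printf 'ID Remark1 Remark2\n5 Test7 Test8\n2 Test3 Test4\n4 Test7 Test8\n1 Test1 Test2\n3 Test5 Test6\n' > "$dir/b.txt"

# Split each file into its header line and its data rows sorted on field 1.
head -n 1 "$dir/a.txt" > "$dir/a.head"
head -n 1 "$dir/b.txt" > "$dir/b.head"
tail -n +2 "$dir/a.txt" | sort -k1,1 > "$dir/a.sorted"
tail -n +2 "$dir/b.txt" | sort -k1,1 > "$dir/b.sorted"

# Join the headers first, then the sorted data rows.
result=$(join "$dir/a.head" "$dir/b.head"; join "$dir/a.sorted" "$dir/b.sorted")

printf '%s\n' "$result"
rm -rf "$dir"
```

Keeping the header out of the sort avoids it landing in the middle of the data; piping the result through column -t works the same way as before.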

Answer by Ed Morton

Another alternative is pr, which formats files for printing.

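$ pr -tm -w 50 Test1.txt Test2.txt
ID     Name  Telephone   ID Remark1  Remark2
1      John      011     1   Test1    Test2
2      Sam       013     2   Test3    Test4
3      Jena      014     3   Test5    Test6
4      Peter     015     4   Test7    Test8
                         5   Test7    Test8
                         6   Test7    Test8
                         7   Test7    Test8
                         8   Test7    Test8
                         9   Test7    Test8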

The most important is the -m flag, which merges the files side by side into columns. The -t flag removes headers and footers; since we're not going to print on paper, we don't need them. The last flag, -w, sets the page width.
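
Note that pr -m pairs lines by position, not by ID, so rows that exist only in the longer file still show up. A small sketch of that behaviour, using made-up two- and three-row files (the file names and contents are assumptions for the demo):

```shell
#!/bin/sh
dir=$(mktemp -d)
# Space-separated sample rows; the second file has one extra ID.
printf '1 John 011\n2 Sam 013\n' > "$dir/Test1.txt"
printf '1 Test1 Test2\n2 Test3 Test4\n3 Test5 Test6\n' > "$dir/Test2.txt"

# -m merges the files side by side, -t drops page headers and footers,
# -w sets the total page width.
result=$(pr -tm -w 50 "$dir/Test1.txt" "$dir/Test2.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

The row for ID 3 is still printed, with an empty left column, which is why pr gives a side-by-side view rather than a true intersection.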

Answer by Seho Choi

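awk -F"\t" '
    {key = $1 FS $2 FS $3 FS $4}
    NR == 1 {header = key}
    !(key in result) {result[key] = $0; next}
    { for (i=5; i <= NF; i++) result[key] = result[key] FS $i }
    END {
        print result[header]
        delete result[header]
        PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
        for (key in result) print result[key]
    }
' Test1.txt Test2.txt ... > result.txt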