Linux: combine two files by column
Source: http://stackoverflow.com/questions/25652252/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by clear.choi
I am trying to combine two files as shown below (intersection).
First file, Test1.txt:
ID Name Telephone
1 John 011
2 Sam 013
3 Jena 014
4 Peter 015
Second file, Test2.txt:
1 Test1 Test2
2 Test3 Test4
3 Test5 Test6
4 Test7 Test8
5 Test7 Test8
6 Test7 Test8
7 Test7 Test8
8 Test7 Test8
9 Test7 Test8
The desired final result:
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
This is what I tried:
awk -F"\t" '
    {key = $1}
    NR == 1 {header = key}
    !(key in result) {result[key] = $0; next}
    { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
    END {
        print result[header]
        delete result[header]
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (key in result) print result[key]
    }
' Test1.txt Test2.txt > result.txt
I just noticed that this produces the union: it includes all rows from both Test1 and Test2.
I would like to show only the intersection, as in my expected result: IDs 1, 2, 3 and 4 only.
Do you have any ideas? Thanks!
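Accepted answer by Ed Morton
$ awk -v OFS='\t' '
# First line of the first file: print the header plus the new column names.
NR==1   { print $0, "Remark1", "Remark2"; next }
# Remaining lines of the first file: remember each row by its ID.
NR==FNR { a[$1]=$0; next }
# Second file: print only rows whose ID was seen in the first file.
$1 in a { print a[$1], $2, $3 }
' Test1.txt Test2.txt
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8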
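To illustrate the two-file awk idiom (NR==FNR) for keeping only the IDs present in both files, here is a minimal self-contained sketch. It is a variation for illustration, not the answer's exact code: it assumes tab-separated input, a second file without a header row, and scratch files whose names and contents are made up for the demo.

```shell
#!/bin/sh
# Scratch files for illustration only; the directory and contents are hypothetical.
tmp=$(mktemp -d)
printf 'ID\tName\tTelephone\n1\tJohn\t011\n2\tSam\t013\n' > "$tmp/Test1.txt"
printf '1\tTest1\tTest2\n2\tTest3\tTest4\n5\tTest7\tTest8\n' > "$tmp/Test2.txt"

result=$(awk -F'\t' -v OFS='\t' '
    # First file: NR (global line count) equals FNR (per-file line count).
    NR == FNR { if (FNR == 1) header = $0; else row[$1] = $0; next }
    # Second file: print the header once, before its first data row.
    FNR == 1  { print header, "Remark1", "Remark2" }
    # Keep only IDs that were stored from the first file (the intersection).
    $1 in row { print row[$1], $2, $3 }
' "$tmp/Test1.txt" "$tmp/Test2.txt")

printf '%s\n' "$result"
rm -rf "$tmp"
```

ID 5 exists only in the second file, so it is dropped; rows are emitted in the order of the second file.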
Answer by damienfrancois
It is far easier to use the join command:
$ cat a.txt
ID Name Telephone
1 John 011
2 Sam 013
3 Jena 014
4 Peter 015
$ cat b.txt
ID Remark1 Remark2
1 Test1 Test2
2 Test3 Test4
3 Test5 Test6
4 Test7 Test8
5 Test7 Test8
6 Test7 Test8
7 Test7 Test8
8 Test7 Test8
9 Test7 Test8
$ join a.txt b.txt
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
Use the column command to pretty-print it:
$ join a.txt b.txt | column -t
ID Name Telephone Remark1 Remark2
1 John 011 Test1 Test2
2 Sam 013 Test3 Test4
3 Jena 014 Test5 Test6
4 Peter 015 Test7 Test8
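One caveat: join expects both inputs to be sorted on the join field. The sketch below shows one way to handle unsorted files that carry a header row, by joining the headers and the sorted bodies separately. The scratch files, their contents and the .head/.body names are assumptions made for this demo, not part of the answer.

```shell
#!/bin/sh
# Use one collation for both sort and join so they agree on ordering.
LC_ALL=C
export LC_ALL

# Scratch files for illustration only; deliberately unsorted.
tmp=$(mktemp -d)
printf 'ID Name Telephone\n3 Jena 014\n1 John 011\n4 Peter 015\n2 Sam 013\n' > "$tmp/a.txt"
printf 'ID Remark1 Remark2\n2 Test3 Test4\n9 Test7 Test8\n1 Test1 Test2\n4 Test7 Test8\n3 Test5 Test6\n' > "$tmp/b.txt"

# Split each file into its header line and a body sorted on the first field.
for f in a b; do
    head -n 1 "$tmp/$f.txt" > "$tmp/$f.head"
    tail -n +2 "$tmp/$f.txt" | sort -k1,1 > "$tmp/$f.body"
done

# Join the headers and the sorted bodies separately, then concatenate.
result=$(
    join "$tmp/a.head" "$tmp/b.head"
    join "$tmp/a.body" "$tmp/b.body"
)

printf '%s\n' "$result"
rm -rf "$tmp"
```

Piping the result through column -t aligns the columns in the same way as before. ID 9 appears only in b.txt, so join leaves it out by default.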
Answer by Ed Morton
Another alternative is pr, which is used for formatting files for printing:
$ pr -tm -w 50 Test1.txt Test2.txt
ID Name Telephone ID Remark1 Remark2
1 John 011 1 Test1 Test2
2 Sam 013 2 Test3 Test4
3 Jena 014 3 Test5 Test6
4 Peter 015 4 Test7 Test8
5 Test7 Test8
6 Test7 Test8
7 Test7 Test8
8 Test7 Test8
9 Test7 Test8
The most important is the m flag, which merges the files into columns. The t flag removes headers and footers; since we are not going to print on paper, we do not need them. The last flag, w, sets the width.
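Note that pr -m places the files side by side line by line; it does not match rows on the ID column. The following sketch makes that visible with two small scratch files whose IDs do not line up. The file names and contents are hypothetical, and the awk step is only there to collapse the padding so the pairing is easy to read.

```shell
#!/bin/sh
# Scratch files for illustration only; the IDs intentionally do not line up.
tmp=$(mktemp -d)
printf '1 John\n2 Sam\n' > "$tmp/left.txt"
printf '2 Test3\n3 Test5\n' > "$tmp/right.txt"

# -m merges the files side by side, -t drops headers and footers, -w sets the width.
# awk then squeezes each run of padding into a single space.
merged=$(pr -tm -w 40 "$tmp/left.txt" "$tmp/right.txt" | awk '{ $1 = $1; print }')

printf '%s\n' "$merged"
rm -rf "$tmp"
```

The first output line pairs ID 1 from the left file with ID 2 from the right file, so pr suits files that are already aligned row for row, while join or awk is needed for a keyed intersection.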