bash: match values in the first column of two files and join the matching lines in a new file

Question

Asked by user1911823

I need to match the string in column 1 ($1) of file1.txt against the string in column 1 ($1) of file2.txt. Then I want to join the lines where there was a match and write them to a new file.

cat file1.txt
1050008 5.156725968 8.404038296 124.9198605 3.23E-21    2.33E-17    38.57865782
3310747 5.631470026 8.581936875 124.6039122 3.34E-21    2.33E-17    38.55204806
5910451 4.900364671 8.455329195 124.5720603 3.35E-21    2.33E-17    38.54935989
730156  5.565210738 8.48792701  122.2168789 4.28E-21    2.33E-17    38.34773989

cat file2.txt
4230037 ILMN Controls   ILMN_Controls   ERCC-00071  ILMN_333646 ERCC-00071  ERCC-00071
1050008 ILMN Controls   ILMN_Controls   ERCC-00009  ILMN_333584 ERCC-00009  ERCC-00009
5260356 ILMN Controls   ILMN_Controls   ERCC-00053  ILMN_333628 ERCC-00053  ERCC-00053
3310747 ILMN Controls   ILMN_Controls   ERCC-00144  ILMN_333719 ERCC-00144  ERCC-00144
5910451 ILMN Controls   ILMN_Controls   ERCC-00003  ILMN_333578 ERCC-00003  ERCC-00003
1710435 ILMN Controls   ILMN_Controls   ERCC-00138  ILMN_333713 ERCC-00138  ERCC-00138
1400612 ILMN Controls   ILMN_Controls   ERCC-00084  ILMN_333659 ERCC-00084  ERCC-00084
730156  ILMN Controls   ILMN_Controls   ERCC-00017  ILMN_333592 ERCC-00017  ERCC-00017

I would like the output file to look like this:

out.txt
1050008 5.156725968 8.404038296 124.9198605 3.23E-21    2.33E-17    38.57865782 1050008 ILMN Controls   ILMN_Controls   ERCC-00009  ILMN_333584 ERCC-00009  ERCC-00009
3310747 5.631470026 8.581936875 124.6039122 3.34E-21    2.33E-17    38.55204806 3310747 ILMN Controls   ILMN_Controls   ERCC-00144  ILMN_333719 ERCC-00144  ERCC-00144
5910451 4.900364671 8.455329195 124.5720603 3.35E-21    2.33E-17    38.54935989 5910451 ILMN Controls   ILMN_Controls   ERCC-00003  ILMN_333578 ERCC-00003  ERCC-00003
730156  5.565210738 8.48792701  122.2168789 4.28E-21    2.33E-17    38.34773989 730156  ILMN Controls   ILMN_Controls   ERCC-00017  ILMN_333592 ERCC-00017  ERCC-00017

The files are tab delimited and have missing values in some columns.

There are 31 columns and >47000 lines in file2.txt, and I'm trying to do this in bash (OS X).

If you have a solution, I would greatly appreciate a brief explanation of the steps, as I'm very new to this.

Answer 1

Answered by Dimitre Radoulov

awk 'BEGIN {
  # treat input and output as tab-delimited
  FS = OFS = "\t"
  }
NR == FNR {
  # while reading the 1st file
  # store its records in the array f, keyed by column 1
  f[$1] = $0
  next
  }
$1 in f {
  # when match is found
  # print all values
  print f[$1], $0
  }' file1.txt file2.txt > out.txt
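
To see the steps in isolation, here is a minimal sketch that runs the same awk logic on two tiny made-up files. The file names (a.txt, b.txt), the temporary directory, and the sample values are assumptions for illustration only, not the asker's data. One field is left empty to show that empty tab-delimited columns are preserved.

```shell
# Hypothetical demo data in a throwaway directory (not the asker's files).
tmp=$(mktemp -d)
printf '1050008\t5.15\t8.40\n730156\t\t8.48\n999\t1.0\t2.0\n' > "$tmp/a.txt"
printf '4230037\tILMN\tERCC-00071\n1050008\tILMN\tERCC-00009\n730156\tILMN\tERCC-00017\n' > "$tmp/b.txt"

awk 'BEGIN { FS = OFS = "\t" }       # split and print on tabs only
NR == FNR { f[$1] = $0; next }       # 1st file: remember each line by its key
$1 in f   { print f[$1], $0 }        # 2nd file: key seen before, print both lines
' "$tmp/a.txt" "$tmp/b.txt" > "$tmp/out.txt"

cat "$tmp/out.txt"
```

NR == FNR is true only while the first file is being read, since NR counts lines across all files and FNR restarts for each file. Output lines appear in the order of the second file, and if a key occurred more than once in the first file only its last record would be kept, because each assignment to f[$1] overwrites the previous one.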

Answer 2

Answered by cmh

If you don't mind the output being ordered by the first column, then you can use this invocation of the join command:

join <(sort file1.txt) <(sort file2.txt) >out.txt
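
Since the question says the files are tab-delimited with missing values, a variant that tells both sort and join to split on tabs may be safer than the default whitespace splitting, which would not keep empty columns in place. The following is a sketch under that assumption; the file names, temporary directory, and sample values are made up for illustration. The sorted copies are written to temporary files so the sketch also runs under plain sh; in bash the <(sort ...) form works the same way.

```shell
# Hypothetical demo data in a throwaway directory (not the asker's files).
tmp=$(mktemp -d)
printf '1050008\t5.15\t8.40\n730156\t\t8.48\n999\t1.0\t2.0\n' > "$tmp/a.txt"
printf '4230037\tILMN\tERCC-00071\n1050008\tILMN\tERCC-00009\n730156\tILMN\tERCC-00017\n' > "$tmp/b.txt"

tab=$(printf '\t')   # a literal tab character
export LC_ALL=C      # make sort and join agree on the collation order

# join expects both inputs sorted on the join field (column 1 here).
sort -t "$tab" -k1,1 "$tmp/a.txt" > "$tmp/a.sorted"
sort -t "$tab" -k1,1 "$tmp/b.txt" > "$tmp/b.sorted"

# -t sets the field separator for both input and output.
join -t "$tab" "$tmp/a.sorted" "$tmp/b.sorted" > "$tmp/joined.txt"

cat "$tmp/joined.txt"
```

Note that join prints the key only once per output line, so the result has one column fewer than the out.txt shown in the question, and the lines are in sorted key order rather than the original file order.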
