bash: join two files based on two columns

Question

Asked by Nick

Believe it or not, I've searched all over the internet and haven't found a working solution for this problem in AWK.

I have two files, A and B:

File A:

chr1   pos1   
chr1   pos2
chr2   pos1
chr2   pos2

File B:

chr1 pos1
chr2 pos1
chr3 pos2

Desired Output:

chr1 pos1
chr2 pos1

I'd like to join these two files to basically get the intersection between the two files based on the first AND second columns, not just the first. Since this is the case, most simple scripts won't work and join doesn't seem to be an option.

Any ideas?

EDIT: sorry, I didn't mention that there are more columns than just the two I showed. I've only shown two in my example because I'm only interested in the first two columns between both files being identical, the rest of the data aren't important (but are nonetheless in the file)

Answer 1

Accepted answer by Aif

Hum, my idea is the following: use join to merge the two files on the first column, then correct the result with awk.

$ join  A B 
chr1 pos1 pos1
chr1 pos2 pos1
chr2 pos1 pos1
chr2 pos2 pos1

$ join  A B | awk '{ if ($2==$3) printf("%s %s\n", $1, $2) }'
chr1 pos1
chr2 pos1

Edit: given the edit, the join solution may still work (with options), so the concept remains correct (imo).

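The answer does not spell out which options it has in mind. One possible reading, offered only as a sketch and not as Aif's method, is to build a composite key from the first two columns and let join match on that key. The sample rows, the temporary file names, and the ":" separator (assumed never to occur in the data) are made up for illustration.

```shell
# Use the C locale so sort and join agree on ordering.
export LC_ALL=C
tmp=$(mktemp -d)

# Hypothetical sample data with a third column.
printf '%s\n' 'chr1 pos1 a1' 'chr1 pos2 a2' 'chr2 pos1 a3' 'chr2 pos2 a4' > "$tmp/A"
printf '%s\n' 'chr1 pos1 b1' 'chr2 pos1 b2' 'chr3 pos2 b3' > "$tmp/B"

# Prefix each line of A with the key "col1:col2"; keep only the key for B.
awk '{ print $1 ":" $2, $0 }' "$tmp/A" | sort -k1,1 > "$tmp/A.keyed"
awk '{ print $1 ":" $2 }' "$tmp/B" | sort -u > "$tmp/B.keys"

# Join on the composite key, then drop the key column again.
result=$(join "$tmp/A.keyed" "$tmp/B.keys" | cut -d' ' -f2-)
printf '%s\n' "$result"
```

This keeps the full rows of A whose first two columns also appear in B. Both inputs to join have to be sorted on the key, which is why each side goes through sort first.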

Answer 2

Answer by glenn jackman

The awk solution is:

awk 'FILENAME==ARGV[1] {pair[$1 " " $2]; next} ($1 " " $2 in pair)' fileB fileA

Place the smaller file first since you have to basically hold it in memory.

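To see how this behaves when the files carry extra columns, here is a small run on made-up data (the third column and the temporary directory are assumptions for illustration). Only the keys of the first file are stored; the lines that get printed come from the second file.

```shell
tmp=$(mktemp -d)

# Hypothetical sample data with a third column.
printf '%s\n' 'chr1 pos1 a1' 'chr1 pos2 a2' 'chr2 pos1 a3' 'chr2 pos2 a4' > "$tmp/fileA"
printf '%s\n' 'chr1 pos1 b1' 'chr2 pos1 b2' 'chr3 pos2 b3' > "$tmp/fileB"

# First file (fileB): remember each "col1 col2" key.
# Second file (fileA): print the whole line if its key was seen.
result=$(awk 'FILENAME == ARGV[1] { pair[$1 " " $2]; next }
              ($1 " " $2) in pair' "$tmp/fileB" "$tmp/fileA")
printf '%s\n' "$result"
```

Here fileB is the file held in memory, so its three keys are the only thing awk has to keep around while it streams through fileA.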

Answer 3

Answer by Dimitre Radoulov

I would write it like this:

awk 'NR == FNR {
  k[$1, $2]
  next
  }
($1, $2) in k
  ' filea fileb

The order of the input files might need to be adapted based on the exact requirement.

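A quick way to see why the order matters, using made-up rows with a third column (the data and the temporary directory are assumptions): the first file only supplies keys, and the printed lines always come from the second file.

```shell
tmp=$(mktemp -d)

# Hypothetical sample data with a third column.
printf '%s\n' 'chr1 pos1 a1' 'chr1 pos2 a2' 'chr2 pos1 a3' 'chr2 pos2 a4' > "$tmp/filea"
printf '%s\n' 'chr1 pos1 b1' 'chr2 pos1 b2' 'chr3 pos2 b3' > "$tmp/fileb"

# k[$1, $2] joins the two subscripts with SUBSEP, so the pair acts as one key.
prog='NR == FNR { k[$1, $2]; next } ($1, $2) in k'

# Keys from filea, matching lines printed from fileb.
result_ab=$(awk "$prog" "$tmp/filea" "$tmp/fileb")
# Keys from fileb, matching lines printed from filea.
result_ba=$(awk "$prog" "$tmp/fileb" "$tmp/filea")

printf 'filea fileb:\n%s\n' "$result_ab"
printf 'fileb filea:\n%s\n' "$result_ba"
```

Both runs select the same key pairs, but the remaining columns differ depending on which file is read second. One caveat: the NR == FNR test assumes the first file is not empty.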

Answer 4

Answer by anubhava

Why not a simple grep -f, like this:

grep -f fileB fileA

EDIT:

For files having more than 2 columns try this:

grep "$(cut -d" " -f1,2 fileB)" fileA | cut -d" " -f1,2
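
One thing to keep in mind with the grep approach is that each pattern is a regular expression matched anywhere in the line, so "chr1 pos1" also hits a row starting with "chr1 pos10". A possible tightening, shown only as a sketch, anchors each pattern at the start of the line and requires a space or end of line after the second column. It assumes single-space delimiters and no regex metacharacters in the data; the sample rows and the patterns file are made up.

```shell
tmp=$(mktemp -d)

# Hypothetical sample data; note the "pos10" row in fileA.
printf '%s\n' 'chr1 pos1 a1' 'chr1 pos10 a2' 'chr2 pos1 a3' > "$tmp/fileA"
printf '%s\n' 'chr1 pos1 b1' 'chr2 pos1 b2' 'chr3 pos2 b3' > "$tmp/fileB"

# Unanchored patterns, as in the answer: "chr1 pos10" slips through.
loose=$(grep "$(cut -d' ' -f1,2 "$tmp/fileB")" "$tmp/fileA" | cut -d' ' -f1,2)

# Turn "chr1 pos1" into the extended regex "^chr1 pos1( |$)".
cut -d' ' -f1,2 "$tmp/fileB" | sed 's/.*/^&( |$)/' > "$tmp/patterns"
strict=$(grep -E -f "$tmp/patterns" "$tmp/fileA" | cut -d' ' -f1,2)

printf 'loose:\n%s\n' "$loose"
printf 'strict:\n%s\n' "$strict"
```

If the real files are tab-separated or contain characters such as "." in the key columns, the awk answers above are likely the safer route.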
