Linux: joining multiple files

Question

Asked by prathmesh.kallurkar

I am using the standard join command to join two sorted files based on column 1. The command is simply join file1 file2 > output_file.

But how do I join 3 or more files using the same technique? I tried join file1 file2 file3 > output_file, but that command gave me an empty file. I think sed can help me, but I am not sure how.

Answer 1

Accepted answer by mata

man join:

NAME
       join - join lines of two files on a common field

SYNOPSIS
       join [OPTION]... FILE1 FILE2

It only works with two files.

If you need to join three, you can first join the first two, then join the result with the third.

Try:

join file1 file2 | join - file3 > output

That should join the three files without creating an intermediate temp file. The - tells the join command to read the first input stream from stdin.
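
As a minimal sketch of that pipeline on concrete data: the three sample files and their contents below are made up for illustration, and all are assumed to be sorted on the first column, as join requires.

```shell
#!/bin/sh
# Hypothetical sample data: three files sorted on the first column.
workdir=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$workdir/file1"
printf 'a x\nb y\nc z\n' > "$workdir/file2"
printf 'a P\nc R\n'      > "$workdir/file3"

# Join the first two files, then feed the result to a second join via stdin (-).
result=$(join "$workdir/file1" "$workdir/file2" | join - "$workdir/file3")
printf '%s\n' "$result"

rm -r "$workdir"
```

Only keys present in all three files survive, because join drops unpairable lines by default; here key b is dropped since it is missing from file3.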

Answer 2

Answered by prathmesh.kallurkar

join joins lines of two files on a common field. If you want to join more, do it in pairs: join the first two files, then join the result with a third file, and so on.

Answer 3

Answered by Gnosophilon

The man page of join states that it only works for two files. So you need to create an intermediate file, which you delete afterwards, i.e.:

> join file1 file2 > temp
> join temp file3 > output
> rm temp
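
A slightly more defensive variant of the same idea might use mktemp for the intermediate file, so it cannot collide with an existing file named temp. This is only a sketch; the sample files and their contents are assumptions made for the demonstration.

```shell
#!/bin/sh
# Hypothetical sample inputs, sorted on the first column.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT      # remove the scratch directory on exit
printf 'a 1\nb 2\n' > "$workdir/file1"
printf 'a 3\nb 4\n' > "$workdir/file2"
printf 'a 5\nb 6\n' > "$workdir/file3"

# Intermediate file with a unique name instead of a fixed "temp".
temp=$(mktemp "$workdir/join.XXXXXX")
join "$workdir/file1" "$workdir/file2" > "$temp"
join "$temp" "$workdir/file3" > "$workdir/output"
rm "$temp"

result=$(cat "$workdir/output")
printf '%s\n' "$result"
```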

Answer 4

Answered by ack

One can join multiple files (N>=2) by constructing a pipeline of joins recursively:

#!/bin/sh

# multijoin - join multiple files

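# Join stdin with each remaining file in turn.
join_rec() {
    if [ $# -eq 1 ]; then
        # Last file: join it with whatever arrives on stdin.
        join - "$1"
    else
        # Join stdin with the next file and pass the result down the pipeline.
        f=$1; shift
        join - "$f" | join_rec "$@"
    fi
}

if [ $# -le 2 ]; then
    # Zero, one or two files: plain join handles it (or reports the usage error).
    join "$@"
else
    # Join the first two files, then recurse over the rest.
    f1=$1; f2=$2; shift 2
    join "$f1" "$f2" | join_rec "$@"
fi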
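
To try the recursion on small inputs, a sketch like the following can be used. It wraps the same logic in a function so that it runs as a single file; the wrapper name multijoin_rec and the four sample files are hypothetical, and the positional parameters are assumed to be $1 and $2.

```shell
#!/bin/sh
join_rec() {
    if [ $# -eq 1 ]; then
        join - "$1"
    else
        f=$1; shift
        join - "$f" | join_rec "$@"
    fi
}

# Hypothetical wrapper around the top-level logic of the script.
multijoin_rec() {
    if [ $# -le 2 ]; then
        join "$@"
    else
        f1=$1; f2=$2; shift 2
        join "$f1" "$f2" | join_rec "$@"
    fi
}

# Made-up sample data: four files sorted on the first column.
workdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$workdir/f1"
printf 'a 3\nb 4\n' > "$workdir/f2"
printf 'a 5\nb 6\n' > "$workdir/f3"
printf 'a 7\nb 8\n' > "$workdir/f4"

result=$(multijoin_rec "$workdir/f1" "$workdir/f2" "$workdir/f3" "$workdir/f4")
printf '%s\n' "$result"
rm -r "$workdir"
```

With four files this expands to a pipeline of three join processes, with no intermediate files on disk.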

Answer 5

Answered by rsz

I know this is an old question, but for future reference: if you know that the files you want to join follow a naming pattern like the one in the question, e.g. file1 file2 file3 ... fileN, then you can simply combine them with this command:

cat file* > output

Here output will contain the files concatenated one after another, in alphabetical order. Note that cat appends whole files; unlike join, it does not merge lines on a common field.

Answer 6

Answered by gmargari

I created a function for this. The first argument is the output file; the remaining arguments are the files to be joined.

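function multijoin() {
    out=$1
    shift 1
    # Seed the output with the key column (field 1) of the first input file.
    awk '{print $1}' "$1" > "$out"
    # Join each input file in turn, replacing the output with the new result.
    for f in "$@"; do join "$out" "$f" > tmp; mv tmp "$out"; done
}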

Usage:

multijoin output_file file*
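
A possible end-to-end run looks like the sketch below. It is a variant rather than the original function: it uses POSIX function syntax and a mktemp scratch file instead of a fixed tmp, and the sample files in1 to in3 are made up. The output file name must not match the input glob, or it would be joined with itself.

```shell
#!/bin/sh
multijoin() {
    out=$1
    shift
    # Seed the output with the key column of the first input file.
    awk '{print $1}' "$1" > "$out"
    for f in "$@"; do
        tmp=$(mktemp)                      # scratch file for this round
        join "$out" "$f" > "$tmp" && mv "$tmp" "$out"
    done
}

# Made-up sample data, sorted on the first column.
workdir=$(mktemp -d)
printf 'a 1\nb 2\n' > "$workdir/in1"
printf 'a 3\nb 4\n' > "$workdir/in2"
printf 'a 5\nb 6\n' > "$workdir/in3"

multijoin "$workdir/result.txt" "$workdir"/in*
result=$(cat "$workdir/result.txt")
printf '%s\n' "$result"
rm -r "$workdir"
```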

Answer 7

Answered by kvantour

While this is a somewhat old question, here is how you can do it with a single awk:

awk -v j=<field_number> '{key=$j; $j=""}  # get key and delete field j
                         (NR==FNR){order[FNR]=key;} # store the key-order
                         {entry[key]=entry[key] OFS $0 } # update key-entry
                         END { for(i=1;i<=FNR;++i) {
                                  key=order[i]; print key entry[key] # print
                               }
                         }' file1 ... filen

This script assumes:

all files have the same number of lines
the order of the output is the same as the order of the first file
files do not need to be sorted on field <field_number>
<field_number> is a valid integer
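
As an illustration under those assumptions, the sketch below runs the same awk logic with j=1 on two made-up files that hold the same keys in a different order. Blanking field j leaves an empty leading field, so the raw output contains doubled separators; piping through tr -s ' ' is an addition here to squeeze them.

```shell
#!/bin/sh
# Hypothetical sample data: same keys, different line order, unsorted.
workdir=$(mktemp -d)
printf 'k1 a\nk2 b\nk3 c\n' > "$workdir/file1"
printf 'k3 z\nk1 x\nk2 y\n' > "$workdir/file2"

result=$(awk -v j=1 '
    { key = $j; $j = "" }                 # remember the key, blank out field j
    NR == FNR { order[FNR] = key }        # the first file fixes the output order
    { entry[key] = entry[key] OFS $0 }    # append the rest of the line to the key
    END { for (i = 1; i <= FNR; ++i) { key = order[i]; print key entry[key] } }
' "$workdir/file1" "$workdir/file2" | tr -s ' ')

printf '%s\n' "$result"
rm -r "$workdir"
```

The rows come out in the order of file1 (k1, k2, k3), even though file2 lists the keys differently.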

Answer 8

Answered by user3200815

Assuming you have four files A.txt, B.txt, C.txt and D.txt as:

~$ cat A.txt
x1 2
x2 3
x4 5
x5 8

~$ cat B.txt
x1 5
x2 7
x3 4
x4 6

~$ cat C.txt
x2 1
x3 1
x4 1
x5 1

~$ cat D.txt
x1 1

Join the files with:

firstOutput='0,1.2'; secondOutput='2.2'
myoutput="$firstOutput,$secondOutput"
outputCount=3
join -a 1 -a 2 -e 0 -o "$myoutput" A.txt B.txt > tmp.tmp
for f in C.txt D.txt; do
    firstOutput="$firstOutput,1.$outputCount"
    myoutput="$firstOutput,$secondOutput"
    join -a 1 -a 2 -e 0 -o "$myoutput" tmp.tmp "$f" > tempf
    mv tempf tmp.tmp
    outputCount=$(($outputCount+1))
done
mv tmp.tmp files_join.txt

Results:

~$ cat files_join.txt
x1 2 5 0 1
x2 3 7 1 0
x3 0 4 1 0
x4 5 6 1 0
x5 8 0 1 0
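
The options used in this answer can be seen in isolation on just two files. The sketch below recreates A.txt and B.txt in a scratch directory (the directory is an assumption for the demo) and performs one full outer join, filling the gaps with 0.

```shell
#!/bin/sh
workdir=$(mktemp -d)
printf 'x1 2\nx2 3\nx4 5\nx5 8\n' > "$workdir/A.txt"
printf 'x1 5\nx2 7\nx3 4\nx4 6\n' > "$workdir/B.txt"

# -a 1 -a 2 : also print unpairable lines from both files (full outer join)
# -e 0      : replace missing fields with 0
# -o ...    : print the join key (0), field 2 of file 1, field 2 of file 2
result=$(join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$workdir/A.txt" "$workdir/B.txt")
printf '%s\n' "$result"
rm -r "$workdir"
```

Each further file adds one more column of the intermediate result to the -o list, which is why the loop in the answer keeps extending firstOutput with 1.3, 1.4, and so on.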
