bash: How to join 2 CSV files with a shell script?

Question

Asked by tony Huang

I'm trying to write a shell script that combines two CSV files in the following way:

I have two CSV files, f1.csv and f2.csv. The format of f1.csv is:

startId, endId, roomNum

f2.csv has this format:

startId, endId, teacherId

I want to combine these two into one CSV file with this format:

startId, endId, roomNum, teacherId

What is the best way to accomplish this with a shell script that runs under Linux?

Answer 1

Answered by dogbane

Try:

join -t, -1 1 -2 1 -o 1.2 1.3 1.4 2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)

How it works:

1) I first create a composite key column by joining startId and endId into startId:endId for both files, and prepend it to each line:

awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv

2) I sort both outputs:

awk -F, '{print $1":"$2","$0}' f1.csv | sort
awk -F, '{print $1":"$2","$0}' f2.csv | sort

3) I then use the join command to join on my composite key (in the first column) and output just the columns I need.
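The three steps above can be sketched as a small function. This is a minimal sketch under stated assumptions, not the answerer's exact command: the function name join_csv and the sample rows are made up for illustration, temporary files replace the process substitution so it also runs under plain sh, and sort is restricted to the key column with LC_ALL=C so that its order matches what join expects.

```shell
# Hypothetical helper: join two CSV files on the composite key startId:endId.
# $1 = file with startId,endId,roomNum   $2 = file with startId,endId,teacherId
join_csv() {
    tmp=$(mktemp -d)
    # Steps 1 and 2: prepend "startId:endId," to every line, then sort on that key.
    awk -F, '{print $1":"$2","$0}' "$1" | LC_ALL=C sort -t, -k1,1 > "$tmp/a"
    awk -F, '{print $1":"$2","$0}' "$2" | LC_ALL=C sort -t, -k1,1 > "$tmp/b"
    # Step 3: join on field 1 (the key) and keep startId,endId,roomNum,teacherId.
    LC_ALL=C join -t, -1 1 -2 1 -o 1.2,1.3,1.4,2.4 "$tmp/a" "$tmp/b"
    rm -rf "$tmp"
}

# Demo with made-up sample rows (no header line, no spaces after commas).
demo=$(mktemp -d)
printf '1,10,R101\n2,20,R102\n' > "$demo/f1.csv"
printf '2,20,T7\n1,10,T3\n' > "$demo/f2.csv"
join_csv "$demo/f1.csv" "$demo/f2.csv"
# prints:
# 1,10,R101,T3
# 2,20,R102,T7
rm -rf "$demo"
```

Rows whose startId:endId pair appears in only one of the two files are dropped, because join emits only matching lines by default.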

Answer 2

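Answered by matchew

awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile;
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile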

Let me explain what's happening here: -F"," specifies a comma as the field separator.

I filled the missing column with the placeholder text 9999; you can replace it with whatever you like. The first command redirects stdout to a file called 'newFile', and the second command appends stdout to the same file.

I hope this helps. Your question was not too clear about what you wanted to do with the missing field from each file.
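To make the effect of this approach concrete, here is a small sketch that writes to stdout instead of newFile. The function name stack_csv and the sample rows are assumptions for illustration. Note that it produces one output row per input row, with 9999 in the column the source file lacks; it does not merge matching rows from the two files.

```shell
# Hypothetical helper: stack both files into the 4-column layout
# startId,endId,roomNum,teacherId, padding the missing column with 9999.
stack_csv() {
    # Rows from the first file have no teacherId.
    awk -F"," '{print $1","$2","$3",9999"}' "$1"
    # Rows from the second file have no roomNum.
    awk -F"," '{print $1","$2",9999,"$3}' "$2"
}

# Demo with made-up sample rows.
demo=$(mktemp -d)
printf '1,10,R101\n' > "$demo/f1.csv"
printf '1,10,T3\n' > "$demo/f2.csv"
stack_csv "$demo/f1.csv" "$demo/f2.csv"
# prints:
# 1,10,R101,9999
# 1,10,9999,T3
rm -rf "$demo"
```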

Answer 3

Answered by ypnos

Use join -t ';' to combine the corresponding lines. The parameter to the -t option depends on your CSV field separator (typically a semicolon). See the join man page for the remaining options. If you need to trim down duplicate columns later on, use cut for that.
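A minimal sketch of the join-then-cut idea, adapted to the comma-separated files from the question. It assumes that startId alone identifies a row, which the question does not state, because join matches on the first field by default. The function name join_on_first and the sample rows are made up for illustration.

```shell
# Hypothetical helper: join on the first field, then drop the duplicated endId.
join_on_first() {
    tmp=$(mktemp -d)
    # join needs both inputs sorted on the join field.
    LC_ALL=C sort -t, -k1,1 "$1" > "$tmp/a"
    LC_ALL=C sort -t, -k1,1 "$2" > "$tmp/b"
    # join emits startId,endId,roomNum,endId,teacherId; cut removes field 4.
    LC_ALL=C join -t, "$tmp/a" "$tmp/b" | cut -d, -f1-3,5
    rm -rf "$tmp"
}

# Demo with made-up sample rows.
demo=$(mktemp -d)
printf '1,10,R101\n2,20,R102\n' > "$demo/f1.csv"
printf '2,20,T7\n1,10,T3\n' > "$demo/f2.csv"
join_on_first "$demo/f1.csv" "$demo/f2.csv"
# prints:
# 1,10,R101,T3
# 2,20,R102,T7
rm -rf "$demo"
```

If the same startId can occur with different endId values, this would pair rows that do not belong together, so a composite key is the safer choice in that case.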
