Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/6301059/

Time: 2020-09-18 00:09:49  Source: igfitidea

How to join 2 csv files with a shell script?

linux bash scripting

Asked by tony Huang

I'm trying to make a shell script that will combine two csv files in the following way:

I have two csv files, f1.csv and f2.csv. The format of f1.csv is:

startId, endId, roomNum

f2.csv has a format like this:

startId, endId, teacherId 

I want to combine these two into one csv file with this format:

startId, endId, roomNum, teacherId

What is the best way to accomplish this with a shell script that runs under Linux?

Answered by dogbane

Try:

join -t, -1 1 -2 1 -o 1.2,1.3,1.4,2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)

How it works:

1) I first create a composite key column for both files by concatenating startId and endId as startId:endId and prepending it to each line.

awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv
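
A minimal sketch of step 1 on made-up sample rows, assuming headerless files in a scratch directory; `add_key` is a hypothetical helper name, not part of the original answer. Inside awk, `$1` is startId, `$2` is endId and `$0` is the whole original line.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"   # scratch directory for the sample files

# Hypothetical sample data: startId,endId,roomNum (no header row).
printf '1,2,101\n3,4,102\n' > f1.csv

# add_key (hypothetical helper): prefix each line with "startId:endId,".
add_key() {
  awk -F, '{print $1":"$2","$0}' "$1"
}

add_key f1.csv
# 1:2,1,2,101
# 3:4,3,4,102
```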

2) I sort both outputs, because join requires its inputs to be sorted on the join field:

awk -F, '{print $1":"$2","$0}' f1.csv | sort
awk -F, '{print $1":"$2","$0}' f2.csv | sort
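
join compares keys using the current locale's collation, so sort and join should run under the same setting. A minimal sketch, assuming the made-up data below: exporting `LC_ALL=C` before both commands gives plain byte order, which is one common way to keep them consistent. The resulting order is lexical, not numeric, which is all join needs.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
export LC_ALL=C   # byte-wise collation for sort (and for a later join)

# Hypothetical sample data, deliberately out of order.
printf '10,11,103\n2,3,101\n' > f1.csv

awk -F, '{print $1":"$2","$0}' f1.csv | sort > f1.keyed
cat f1.keyed
# 10:11,10,11,103
# 2:3,2,3,101

# sort -c exits non-zero if the file is not already sorted.
sort -c f1.keyed && echo "f1.keyed is sorted"
```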

3) I then use the join command to join on the composite key (the first column) and output only the columns I need.
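
Putting the three steps together, a runnable sketch on invented data (the file contents, the T1/T2/T9 teacher IDs and the output name `merged.csv` are placeholders). It relies on bash process substitution `<(...)`, so it needs bash rather than plain sh, and it writes the `-o` field list in the comma-separated form POSIX specifies.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # same collation for sort and join
cd "$(mktemp -d)"

# Hypothetical sample data (header rows omitted).
printf '1,2,101\n3,4,102\n5,6,103\n' > f1.csv   # startId,endId,roomNum
printf '3,4,T2\n1,2,T1\n7,8,T9\n' > f2.csv      # startId,endId,teacherId

# Prefix each line with a startId:endId key, sort, and join on field 1.
# -o picks startId,endId,roomNum from file 1 and teacherId from file 2.
join -t, -1 1 -2 1 -o 1.2,1.3,1.4,2.4 \
  <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) \
  <(awk -F, '{print $1":"$2","$0}' f2.csv | sort) > merged.csv

cat merged.csv
# 1,2,101,T1
# 3,4,102,T2
```

Rows whose key appears in only one file (5:6 and 7:8 here) are dropped. To keep rows that exist only in f1.csv, add `-a 1`; with an explicit `-o` list, `-e STRING` fills in the missing teacherId field.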

Answered by matchew

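awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile;
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile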

Let me explain what's happening here: -F"," specifies a comma as the field separator.

For the missing column I used the placeholder text 9999; you can replace it with whatever you like. The first command redirects stdout to a file called 'newFile', and the second command appends stdout to the same file.

I hope this helps; your question was not too clear about what you wanted to do with the missing field from each file.
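
To make the behaviour concrete, a sketch that runs the two commands above on one invented row per file (the sample values are hypothetical). The rows end up stacked one after the other, each padded with 9999, rather than merged into a single line per startId/endId pair; merging is what the join-based answers do.

```shell
#!/usr/bin/env bash
set -euo pipefail
cd "$(mktemp -d)"
printf '1,2,101\n' > f1.csv   # hypothetical startId,endId,roomNum
printf '1,2,T1\n' > f2.csv    # hypothetical startId,endId,teacherId

# Pad each file to four columns with 9999, then concatenate.
awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile
cat newFile
# 1,2,101,9999
# 1,2,9999,T1
```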

Answered by ypnos

Use join -t ';' to combine the corresponding lines. The argument to the -t option depends on your CSV field separator (a comma in your files, though some CSV exports use a semicolon). See the join man page for the rest. If you need to trim duplicate columns afterwards, use cut for that.
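
A sketch of the join-plus-cut idea applied to the question's files, under two assumptions: the separator is a comma, and startId alone identifies a row (if it does not, use the composite key from the first answer). The data is invented. Without `-o`, join prints the join field, then the remaining fields of file 1, then those of file 2, so endId appears twice and cut drops the second copy.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # same collation for sort and join
cd "$(mktemp -d)"

# Hypothetical data: startId,endId,roomNum and startId,endId,teacherId.
printf '1,2,101\n3,4,102\n' > f1.csv
printf '3,4,T2\n1,2,T1\n' > f2.csv

# join output: startId,endId,roomNum,endId,teacherId
# cut keeps fields 1,2,3,5 to remove the duplicated endId.
join -t, <(sort f1.csv) <(sort f2.csv) | cut -d, -f1,2,3,5 > merged.csv
cat merged.csv
# 1,2,101,T1
# 3,4,102,T2
```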
