bash: How to join 2 csv files with a shell script?
原文地址: http://stackoverflow.com/questions/6301059/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How to join 2 csv files with a shell script?
Asked by tony Huang
I'm trying to make a shell script that will combine two csv files in the following way:
I have two csv files, f1.csv and f2.csv. The format of f1.csv is:
startId, endId, roomNum
f2.csv has a format like this:
startId, endId, teacherId
I want to combine these two into one csv file with this format:
startId, endId, roomNum, teacherId.
What is the best way to accomplish this with a shell script that runs under Linux?
Answered by dogbane
Try:
join -t, -1 1 -2 1 -o 1.2 1.3 1.4 2.4 <(awk -F, '{print $1":"$2","$0}' f1.csv | sort) <(awk -F, '{print $1":"$2","$0}' f2.csv | sort)
How it works:
1) I first create a composite key column, by joining the startId and endId into startId:endId for both files.
awk -F, '{print $1":"$2","$0}' f1.csv
awk -F, '{print $1":"$2","$0}' f2.csv
2) I sort both outputs:
awk -F, '{print $1":"$2","$0}' f1.csv | sort
awk -F, '{print $1":"$2","$0}' f2.csv | sort
3) I then use the join command to join on my composite key (in the first column) and output just the columns I need.
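To make the three steps concrete, here is a minimal sketch on made-up data. The rows in f1.csv and f2.csv are hypothetical, and it assumes the files have no header row and no quoted fields containing commas. It writes the keyed, sorted output to intermediate files (f1.keyed and f2.keyed, names chosen only for this example) instead of using process substitution, and it sorts explicitly on the key column with sort -t, -k1,1, which is a small variation on the plain sort used in the answer.

```shell
# Hypothetical sample data: startId,endId,roomNum and startId,endId,teacherId
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1,10,A101' '2,20,B202' '3,30,C303' > f1.csv
printf '%s\n' '3,30,T3' '1,10,T1' '2,20,T2' > f2.csv

# Steps 1 and 2: prepend the composite key startId:endId, then sort on that key
awk -F, '{print $1":"$2","$0}' f1.csv | sort -t, -k1,1 > f1.keyed
awk -F, '{print $1":"$2","$0}' f2.csv | sort -t, -k1,1 > f2.keyed

# Step 3: join on column 1 of both files and keep only the wanted columns
# (1.2 = startId, 1.3 = endId, 1.4 = roomNum, 2.4 = teacherId)
result=$(join -t, -1 1 -2 1 -o 1.2,1.3,1.4,2.4 f1.keyed f2.keyed)
printf '%s\n' "$result"
```

With these sample rows the output should be 1,10,A101,T1 then 2,20,B202,T2 then 3,30,C303,T3, one per line. Rows whose startId:endId pair appears in only one file are dropped by join unless you add its -a option.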
Answered by matchew
awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile;
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile
Let me explain what's happening here: -F"," specifies a comma as the field separator.
For the missing column I substituted the text 9999; you can replace it with whatever you like. The first command redirects stdout to a file called 'newFile', and the second command appends stdout to the same file.
I hope this helps; your question was not too clear about what you wanted to do with the missing field from each file.
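As a rough illustration of what this approach produces, here is a small sketch with one made-up row per file (the data is hypothetical). It pads each file to four columns with the 9999 placeholder and stacks the rows into newFile, so rows that share the same startId and endId stay on separate lines rather than being merged.

```shell
# Hypothetical one-row inputs
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '1,10,A101' > f1.csv
printf '%s\n' '1,10,T1' > f2.csv

# f1.csv has no teacherId, so 9999 fills the fourth column
awk -F"," '{print $1","$2","$3",9999"}' f1.csv > newFile
# f2.csv has no roomNum, so 9999 fills the third column
awk -F"," '{print $1","$2",9999,"$3}' f2.csv >> newFile

stacked=$(cat newFile)
printf '%s\n' "$stacked"
```

For these inputs newFile should contain 1,10,A101,9999 followed by 1,10,9999,T1.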
Answered by ypnos
Use join -t ';' to combine the corresponding lines. The parameter to the -t option depends on your CSV field separator (typically a semicolon). See the manpage of join for the rest. If you need to trim down duplicate columns later on, use cut for that.
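A minimal sketch of the join plus cut idea, under some loud assumptions: the sample rows are made up, the separator is a comma (as in the question's files) rather than a semicolon, and startId alone is assumed to identify a row, since join matches on a single field by default. The file names f1.sorted and f2.sorted are chosen only for this example.

```shell
# Hypothetical sample data
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' '2,20,B202' '1,10,A101' > f1.csv
printf '%s\n' '1,10,T1' '2,20,T2' > f2.csv

# join expects both inputs sorted on the join field (field 1 here)
sort -t, -k1,1 f1.csv > f1.sorted
sort -t, -k1,1 f2.csv > f2.sorted

# join emits startId,endId,roomNum,endId,teacherId;
# cut then drops the duplicated endId (field 4)
merged=$(join -t, f1.sorted f2.sorted | cut -d, -f1-3,5)
printf '%s\n' "$merged"
```

For these rows the result should be 1,10,A101,T1 and 2,20,B202,T2. If the same startId can appear with different endId values, a composite key would be needed instead of this single-field join.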

