Original question: http://stackoverflow.com/questions/10697724/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not this page): Stack Overflow
Find value from one csv in another one (like vlookup) in bash (Linux)
Asked by Yasapl
I have already tried every option I could find online to solve my issue, but without a good result.
Basically I have two CSV files (pipe-separated):
file1.csv:
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
file2.csv:
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
I need a Linux bash script to find the value at position 3 in file2 based on the content of position 7 in file1.
Example: file1, line 1, pos 7 is MAYOBAN; find MAYOBAN in file2 and return pos 3 (2400).
The output should be something like this:
2400
2200
2200
etc...
Please help.
Jacek
Answered by sgibb
A simple approach, far from perfect:
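DELIMITER="|"
# For each key in field 7 of file1.csv, print field 3 of the file2.csv
# rows whose first field is that key (the match is anchored to line start).
cut -f 7 -d "${DELIMITER}" file1.csv | while IFS= read -r i
do
grep "^${i}${DELIMITER}" file2.csv | cut -f 3 -d "${DELIMITER}"
done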
Answered by Paused until further notice.
This will work, but since join requires its input files to be sorted on the join field, the output order will be affected:
join -t '|' -1 7 -2 1 -o 2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output would look like:
2200
2200
2400
which is not useful on its own, because you cannot tell which key each value belongs to. To get a useful output, include the key value:
join -t '|' -1 7 -2 1 -o 0,2.3 <(sort -t '|' -k7,7 file1.csv) <(sort -t '|' -k1,1 file2.csv)
The output then looks like this:
CORKCOR|2200
CORKKIN|2200
MAYOBAN|2400
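If the original row order of file1.csv matters, one possible workaround is to tag each key with its row number before sorting, join, and then sort back by that number. The following is only a sketch under assumptions: it recreates the sample data from the question in a temporary directory, uses temporary files instead of process substitution, and the names workdir, keys.sorted, table.sorted and ordered are made up for illustration.

```shell
# Sketch: join-based lookup that restores file1.csv's row order.
# workdir and the *.sorted file names are illustrative, not from the original answer.
workdir=$(mktemp -d)

cat > "$workdir/file1.csv" <<'EOF'
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
EOF

cat > "$workdir/file2.csv" <<'EOF'
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
EOF

# Emit "row number|key" for every file1 row, then sort by key for join.
awk -F '|' '{ print NR "|" $7 }' "$workdir/file1.csv" \
  | LC_ALL=C sort -t '|' -k2,2 > "$workdir/keys.sorted"

# Sort the lookup table on its key (field 1) with the same collation.
LC_ALL=C sort -t '|' -k1,1 "$workdir/file2.csv" > "$workdir/table.sorted"

# Join on the key, output "row number|value", restore row order, keep the value.
ordered=$(LC_ALL=C join -t '|' -1 2 -2 1 -o 1.1,2.3 \
            "$workdir/keys.sorted" "$workdir/table.sorted" \
          | sort -t '|' -k1,1n \
          | cut -d '|' -f 2)

printf '%s\n' "$ordered"
rm -rf "$workdir"
```

Setting LC_ALL=C for both sort and join keeps their collation consistent, which join relies on. Rows of file1.csv whose key is missing from file2.csv are silently dropped in this sketch.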
Edit:
Here's an AWK version:
awk -F '|' 'FNR == NR {keys[$7]; next} {if ($1 in keys) print $3}' file1.csv file2.csv
This loops through file1.csv and creates an array entry for each value of field 7. Simply referring to an array element creates it (with a null value). FNR is the record number in the current file and NR is the record number across all files. When they're equal, the first file is being processed. The next instruction skips the remaining rules and reads the next record, creating a loop. When FNR == NR is no longer true, the subsequent file(s) are processed.
So file2.csv is now processed, and if its field 1 exists in the array, its field 3 is printed.
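Note that the one-liner above prints values in file2.csv order and at most once per key. If you want one value per file1.csv row, in file1.csv order (closer to a vlookup), a possible variation is to swap the file order and store field 3 as the array value. This is only a sketch: it recreates the sample data from the question in a temporary directory, and the names workdir, lookup, value and result are made up for illustration.

```shell
# Sketch: vlookup-style lookup that keeps file1.csv's row order.
# workdir and lookup() are illustrative names, not from the original answer.
workdir=$(mktemp -d)

cat > "$workdir/file1.csv" <<'EOF'
123|21|0452|IE|IE|1|MAYOBAN|BRIN|OFFICE|STREET|MAIN STREET|MAYOBAN|
123|21|0453|IE|IE|1|CORKKIN|ROBERT|SURNAME|CORK|APTS|CORKKIN|
123|21|0452|IE|IE|1|CORKCOR|NAME|HARRINGTON|DUBLIN|STREET|CORKCOR|
EOF

cat > "$workdir/file2.csv" <<'EOF'
MAYOBAN|BANGOR|2400
MAYOBEL|BELLAVARY|2400
CORKKIN|KINSALE|2200
CORKCOR|CORK|2200
DUBLD11|DUBLIN 11|2100
EOF

lookup() {
  # $1 = lookup table (key in field 1, value in field 3)
  # $2 = data file (key in field 7)
  awk -F '|' '
    FNR == NR { value[$1] = $3; next }   # first file: remember key -> value
    $7 in value { print value[$7] }      # second file: print the value for each matching row
  ' "$1" "$2"
}

result=$(lookup "$workdir/file2.csv" "$workdir/file1.csv")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Rows of file1.csv whose key does not appear in file2.csv print nothing here. As with any FNR == NR idiom, an empty first file would cause the second file to be treated as the first.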
Answered by dexnow
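# Take each key from field 7 of file1.csv and look it up in file2.csv,
# matching it against the first field only.
cut -d\| -f7 file1.csv | while IFS= read -r line
do
grep "^$line|" file2.csv | cut -d\| -f3
done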

