Bash: join two CSV files on a key value

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/25875368/

Date: 2020-09-18 11:22:14  Source: igfitidea

join two csv files with key value

Tags: bash, csv, awk

Asked by Enric Agud Pique

I have two CSV files that I want to join on a key value: the city column.

The first file, d01.csv, has this form:

Barcelona, 19.5, 29.5
Tarragona, 20.4, 31.5 
Girona, 17.2, 32.5
Lleida, 16.5, 33.5 
Vic, 17.5, 31.4

The other one, d02.csv, has the following structure:

City, Data, TMax, TMin
Barcelona, 20140916, 19.9, 28.5
Tarragona, 20140916, 21.4, 30.5  
Lleida, 20140916, 17.5, 32.5 
Tortosa, 20140916, 20.5, 30.4

I need a new CSV file with one row for each city that appears in both CSV files:

City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5, 20140916, 21.4, 30.5
Girona, 17.2, 32.5, 20140916, 17.5, 32.5
Lleida, 16.5, 33.5, 20140916, 20.5, 30.4

I tried to do that with

join -j 2 -t ',' d01.csv d02.csv | awk -F "," '{print $1, $2, $3, $4, $5}' > d03.csv

but it is not complete...how can I order the key value?

Accepted answer by glenn jackman

Here's how to use join in bash:

{
  echo "City, Tmin, Tmax, Date, Tmin1, Tmax1"
  join -t, <(sort d01.csv) <(sed 1d d02.csv | sort)
} > d03.csv
cat d03.csv
City, Tmin, Tmax, Date, Tmin1, Tmax1
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  

Note that join only outputs records where the key exists in both files. To get all of them, ask for the unpaired records from both files (-a1 -a2), list the output fields you want (-o), and give a default value for the missing fields (-e):

join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' <(sort d01.csv) <(sed 1d d02.csv | sort)
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Girona, 17.2, 32.5,?,?,?
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5 
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5  
Tortosa,?,?, 20140916, 20.5, 30.4
Vic, 17.5, 31.4,?,?,?
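
The stray `33.5 ,` in the output above comes from trailing blanks in the input files. Here is a minimal sketch, assuming no field contains an embedded comma or quotes, that strips the blanks around the separators before joining. The `normalize` helper, the `workdir` directory and the `*_sorted.csv` file names are illustrative and not part of the original answer. Temporary files stand in for process substitution so the script also runs under plain sh:

```shell
#!/bin/sh
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5 ' \
  'Girona, 17.2, 32.5' 'Lleida, 16.5, 33.5 ' 'Vic, 17.5, 31.4' > "$workdir/d01.csv"
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5  ' 'Lleida, 20140916, 17.5, 32.5 ' \
  'Tortosa, 20140916, 20.5, 30.4' > "$workdir/d02.csv"

# Illustrative helper: drop blanks around commas and at line ends (stdin -> stdout).
normalize() {
  sed -e 's/[[:space:]]*,[[:space:]]*/,/g' -e 's/[[:space:]]*$//'
}

# Sort both inputs on the key under the same locale that join will use.
normalize < "$workdir/d01.csv" | LC_ALL=C sort -t, -k1,1 > "$workdir/d01_sorted.csv"
sed 1d "$workdir/d02.csv" | normalize | LC_ALL=C sort -t, -k1,1 > "$workdir/d02_sorted.csv"

# Full outer join on field 1, with '?' for fields missing on either side.
LC_ALL=C join -t, -a1 -a2 -o 0,1.2,1.3,2.2,2.3,2.4 -e '?' \
  "$workdir/d01_sorted.csv" "$workdir/d02_sorted.csv" > "$workdir/d03.csv"

cat "$workdir/d03.csv"
```

With the sample data, unmatched cities come out as rows such as `Girona,17.2,32.5,?,?,?`, and matched rows no longer carry the blank before the comma.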

Answered by Ondra Žižka

I suggest CSV Cruncher, which treats CSV files as SQL tables and lets you run SQL queries on them, writing the result to another CSV file.

Example:

crunch input.csv output.csv \
   "SELECT AVG(duration) AS durAvg FROM (SELECT * FROM indata ORDER BY duration LIMIT 2 OFFSET 6)"

The tool needs Java 5 or later.

Some of the advantages:

  • You really get CSV support, not just "let's assume the data is correct".
  • You can join on multiple keys.
  • Easier to use and understand than join-based solutions.
  • You can combine more than 2 CSV files.
  • You can join by SQL expressions - the values don't have to be the same.

Disclaimer: I wrote that tool. The project's state is unknown: Google Code was shut down and I didn't migrate it in time. I might have a look at it if someone is interested.

Answered by Jotne

This awk may do:

awk 'FNR==NR {a[$1]=$2FS$3FS$4;next} $1 in a {print $0,a[$1]}' OFS=", " d02.csv d01.csv
Barcelona, 19.5, 29.5, 20140916, 19.9, 28.5
Tarragona, 20.4, 31.5 , 20140916, 21.4, 30.5
Lleida, 16.5, 33.5 , 20140916, 17.5, 32.5
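
A commented sketch of the same two-pass awk idea, assuming plain comma-separated fields without quoting. The `trim` helper, the `rest` array and the `workdir` path are illustrative names, not taken from the answer. This variant splits on commas, trims blanks from each field, skips the d02.csv header, and writes the header line the question asked for:

```shell
#!/bin/sh
# Hypothetical scratch directory holding copies of the sample data.
workdir=$(mktemp -d)
printf '%s\n' 'Barcelona, 19.5, 29.5' 'Tarragona, 20.4, 31.5 ' \
  'Girona, 17.2, 32.5' 'Lleida, 16.5, 33.5 ' 'Vic, 17.5, 31.4' > "$workdir/d01.csv"
printf '%s\n' 'City, Data, TMax, TMin' 'Barcelona, 20140916, 19.9, 28.5' \
  'Tarragona, 20140916, 21.4, 30.5  ' 'Lleida, 20140916, 17.5, 32.5 ' \
  'Tortosa, 20140916, 20.5, 30.4' > "$workdir/d02.csv"

awk -F',' '
  # Illustrative helper: remove blanks at both ends of a field.
  function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }

  BEGIN { print "City, Tmin, Tmax, Date, Tmin1, Tmax1" }

  # First file (d02.csv): skip the header, then remember the fields after the key.
  FNR == NR {
    if (FNR > 1) rest[trim($1)] = trim($2) ", " trim($3) ", " trim($4)
    next
  }

  # Second file (d01.csv): print only cities also seen in d02.csv.
  {
    key = trim($1)
    if (key in rest) print key ", " trim($2) ", " trim($3) ", " rest[key]
  }
' "$workdir/d02.csv" "$workdir/d01.csv" > "$workdir/d03.csv"

cat "$workdir/d03.csv"
```

Unlike join, this keeps the row order of d01.csv and needs no sorting, at the cost of holding d02.csv in memory.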