linux awk: compare two CSV files and create a new file with flags

Question

Question by Sanath

I have two CSV files that I need to compare, writing the differences to a newly formatted file. Samples are given below.

OLD file

DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq

NEW file

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

Output file

I want to compare the old and new CSV files, find the changes that have been made in the new file, and add a FLAG to denote each change:

U - if the new file record is UPDATED
D - if a record existing in the old file is deleted in the new file
N - if a record existing in the new file is not available in the old file

The sample output file looks like this:

DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

I used the diff command, but it also repeats the UPDATED records, which is not what I want:

1,3c1,2
< DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
< DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
< DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
---
> DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
> DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
5a5
> DTL,77777777,7777777777777777,77777777777,N,N,ee,ee

I also used an awk one-liner to filter my records:

awk 'NR==FNR{A[$0];next}!($0 in A)' FS=: old.csv new.csv

The problem is that it doesn't give me the records that only exist in the OLD file, which here is:

DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
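A minimal sketch of the whole-line approach, assuming the sample data above is saved as old.csv and new.csv (the output file names only_in_new.txt and only_in_old.txt are hypothetical). Running the same one-liner with the file arguments swapped lists the lines of old.csv that have no exact match in new.csv. Because the comparison is on whole lines, the changed records 11111111 and 22222222 appear on both sides, so this alone cannot tell U apart from D and N.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > old.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > new.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# Lines of new.csv with no exact match in old.csv (first file is loaded into A).
awk 'NR==FNR{A[$0];next}!($0 in A)' old.csv new.csv > only_in_new.txt

# Same one-liner with the arguments swapped: lines of old.csv with no exact match in new.csv.
awk 'NR==FNR{A[$0];next}!($0 in A)' new.csv old.csv > only_in_old.txt

cat only_in_old.txt
```

only_in_old.txt ends up holding the old 11111111, 22222222 and 33333333 records, i.e. the deleted record plus the old versions of the two updated ones.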

I also started an awk-driven script to achieve this, but couldn't find a good example to help.

myscript.awk

BEGIN {
    FS = ","    # input field separator
    OFS = ","   # output field separator
}

NR > 1 {
    # flag:
    # N - new record, D - deleted, U - updated

    id   = $1   # field numbers assumed; they were lost from the original post
    name = $2
    flag = "N"

    # This prints the columns in the new order. The commas tell awk to use the character set in OFS
    print id, name, flag
}

awk -f myscript.awk old.csv new.csv > formatted.csv
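One way to produce the U/D/N flags directly in awk, without diff, is to index both files by a key. This is a sketch under two assumptions not stated in the question: field 2 (e.g. 11111111) uniquely identifies a record, and the files have no header row. It walks old.csv in its original order to emit U and D, then emits N for keys that only occur in new.csv, which reproduces the order of the sample output.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > old.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > new.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# Assumption: field 2 is a unique record key.
awk -F, '
# First file (old.csv): remember each record by key, plus the original order.
NR == FNR { oldrec[$2] = $0; old_order[++n_old] = $2; next }

# Second file (new.csv): same bookkeeping.
{ newrec[$2] = $0; new_order[++n_new] = $2 }

END {
    # Old keys in order: still present but different -> U, missing -> D.
    for (i = 1; i <= n_old; i++) {
        k = old_order[i]
        if (k in newrec) {
            if (newrec[k] != oldrec[k]) print newrec[k] " U"
        } else {
            print oldrec[k] " D"
        }
    }
    # Keys that only appear in new.csv -> N.
    for (i = 1; i <= n_new; i++) {
        k = new_order[i]
        if (!(k in oldrec)) print newrec[k] " N"
    }
}' old.csv new.csv > formatted.csv

cat formatted.csv
```

If the real files start with a header row, adding `FNR == 1 { next }` as the first rule would skip it in both files. Note also that `NR == FNR` stays true through new.csv when old.csv is empty, so an empty old file would need separate handling.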

Answer 1

Accepted answer by potong

This might work for you:

diff  -W999 --side-by-side OLD NEW |
sed '/^[^\t]*\t\s*|\t\(.*\)/{s//\1 U/;b};/^\([^\t]*\)\t*\s*<$/{s//\1 D/;b};/^.*>\t\(.*\)/{s//\1 N/;b};d'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

An awk solution along the same lines:

diff -W999 --side-by-side OLD NEW |
awk '/[|][\t]/{split($0,a,"[|][\t]");print a[2]" U"};/[\t] *<$/{split($0,a,"[\t]* *<$");print a[1]" D"};/>[\t]/{split($0,a,">[\t]");print a[2]" N"}'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx U
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc U
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd D
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee N

Answer 2

Answer by lnafziger

A good starting point would probably be:

diff -e OLD NEW

This outputs:

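5a
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
.
1,3c
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
.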

This means: append a record after line 5 (5a), and replace lines 1 through 3 with the two new records (1,3c).

If you can't use this format as-is (sticking to a standard format would be preferable), you would need to write a script that converts it to the format you describe.
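As a sketch of such a conversion script, assuming GNU diff: its --old-line-format, --new-line-format and --unchanged-line-format options can prefix removed lines with - and added lines with + while dropping unchanged lines, which is easier to post-process than the ed script above. The awk stage then pairs removed and added lines by an assumed key in field 2: a key on both sides is U, only removed is D, only added is N. The output name formatted_from_diff.csv is hypothetical.

```shell
#!/bin/sh
# Sample data copied from the question.
cat > old.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,N,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,Y,cc,cc
DTL,33333333,3333333333333333,33333333333,Y,Y,dd,dd
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
EOF
cat > new.csv <<'EOF'
DTL,11111111,1111111111111111,11111111111,Y,Y,xx,xx
DTL,22222222,2222222222222222,22222222222,Y,N,cc,cc
DTL,44444444,4444444444444444,44444444444,Y,Y,ss,ss
DTL,55555555,5555555555555555,55555555555,Y,Y,qq,qq
DTL,77777777,7777777777777777,77777777777,N,N,ee,ee
EOF

# GNU diff: removed lines get "-", added lines get "+", unchanged lines are dropped.
diff --old-line-format='-%L' --new-line-format='+%L' --unchanged-line-format='' \
     old.csv new.csv |
awk '
# Split off the +/- marker and take field 2 of the record as the key (assumption).
{ sign = substr($0, 1, 1); rec = substr($0, 2); split(rec, f, ","); key = f[2] }
sign == "-" { gone[key] = rec; gone_order[++ng] = key }
sign == "+" { added[key] = rec; added_order[++na] = key }
END {
    # Removed keys in order: re-added -> U, otherwise -> D.
    for (i = 1; i <= ng; i++) {
        k = gone_order[i]
        print ((k in added) ? added[k] " U" : gone[k] " D")
    }
    # Added keys that were never removed -> N.
    for (i = 1; i <= na; i++) {
        k = added_order[i]
        if (!(k in gone)) print added[k] " N"
    }
}' > formatted_from_diff.csv

cat formatted_from_diff.csv
```

diff exits with status 1 when the files differ, but in this pipeline the exit status is that of awk. Other diff implementations may not support the line-format options.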
