bash: Shell script to validate CSV fields

Question

Asked by odew

I have a csv file with 20 fields. I want a script that checks whether the file is valid according to the following points:

It needs to have 20 fields separated by pipes.
Each of the 20 fields should match a regex.
Report the line and field number of any field that fails its regex.

Example:

f1|f2|f3|...|f20
1|aaaa|Y|...|2014/06/25
2|bb|Y|...|2014/06/25
3|ccc|N|...|2014/06/25

regex:
f1 [0-9]
f2 [a-z]{2,4}
f3 [YN]
.
.
.
f20 [1-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]

What are the best shell tools to do it? Do you have any similar script?

Answer 1

Answered by anubhava

The best tool for this job on Unix systems is awk. You can use an awk command like this:

awk 'BEGIN{FS=OFS="|"} NF!=20{print "line " NR ": expected 20 fields, found " NF; exit}
!($1 ~ /^[0-9]$/) {print "line " NR ": 1st field invalid"; exit}' file.csv
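
The one-liner above only checks the first field. A minimal sketch of how the same idea might extend to every field is shown below. It is not part of the original answer and rests on several assumptions: the function name validate_csv and the rules file format (one anchored regex per line, where line N applies to field N) are hypothetical, the expected field count is taken to be the number of rules, and the first line of the data file is assumed to be a header and is skipped. Interval expressions such as {2,4} are not supported by every awk implementation (some mawk builds, for example), so a portable rules file may need to spell repetitions out, e.g. ^[a-z][a-z][a-z]?[a-z]?$.

```shell
#!/usr/bin/env bash
# validate_csv RULES_FILE DATA_FILE
# RULES_FILE (hypothetical format): one anchored regex per line; line N applies to field N.
# Prints every problem found and returns 1 if there was at least one, 0 otherwise.
validate_csv() {
    awk -F'|' '
        # First file: load one regex per line into rules[1..n]
        NR == FNR { rules[++n] = $0; next }

        # Second file: skip the header row (assumption: line 1 is a header)
        FNR == 1 { next }

        # Wrong field count: report it and skip the per-field checks for this line
        NF != n {
            printf "line %d: expected %d fields, found %d\n", FNR, n, NF
            bad = 1
            next
        }

        # Check each field against the regex on the same position
        {
            for (i = 1; i <= n; i++)
                if ($i !~ rules[i]) {
                    printf "line %d, field %d: \"%s\" does not match %s\n", FNR, i, $i, rules[i]
                    bad = 1
                }
        }

        END { exit bad + 0 }
    ' "$1" "$2"
}
```

With a rules file holding one regex per field, it could be called as validate_csv rules.txt file.csv. Unlike the one-liner, this sketch keeps going after the first problem so that every offending line and field number is listed; line numbers count the header row.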

Answer 2

Answered by Tom Fenech

You might want to consider using a perl script for this:

#!/usr/bin/env perl

use strict;
use warnings;

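my @regexes = (
    qr/^\d$/,                # qr/ / compiles a regex; ^ and $ make it match the whole field
    qr/^[a-z]{2,4}$/,
    qr/^[YN]$/,
    #etc. put the rest of the regexes here
);

while (<>) {                 # loop through every line of file
    chomp;                   # strip the newline so it is not part of the last field
    my @fields = split /\|/, $_, -1; # split on pipe (needs escaping); -1 keeps trailing empty fields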
    if (@fields != 20) {
        print "incorrect number of fields on line $.\n";
        exit;
    }
    for my $f (0..$#fields) { # loop through all fields
        unless ($fields[$f] =~ $regexes[$f]) { # regex match
            print "invalid field on line $., field ", ($f+1), "\n";
            exit;
        }
    }
}

If you save the script as valid.pl and make it executable with chmod +x valid.pl, you can run it as ./valid.pl filename. As written, the script exits as soon as it encounters the first problem. If you remove the exit statements, it will list all of the problems in the file.

In case you're unfamiliar with perl, $. is a special variable that holds the current line number inside the while loop. $#fields is the last index of the array @fields, so 0..$#fields is equivalent to the list 0,1,...,19. Array indices start at 0, so I've added 1 to the field number.
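
To see these two variables in action, here is a small illustrative sketch that is not part of the original answer. The shell function name show_indices is hypothetical, and the input is a made-up two-line sample rather than the 20-field file. It prints the line number ($.) and the last array index ($#fields) for each line read from standard input.

```shell
# Print, for each pipe-separated input line, the line number ($.)
# and the last index of the split result ($#fields).
show_indices() {
    perl -ne '
        chomp;                              # drop the trailing newline
        my @fields = split /\|/, $_, -1;    # split on a literal pipe
        print "line ", $., ": last index ", $#fields, "\n";
    '
}

printf 'a|b|c\nd|e\n' | show_indices
# line 1: last index 2
# line 2: last index 1
```

A line with three fields has a last index of 2, which is why the script adds 1 before reporting a field number.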
