bash: Shell script to validate csv fields

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/24419220/

Time: 2020-09-18 10:44:42  Source: igfitidea

Shell script to validate csv fields

regex bash shell unix awk

Asked by odew

I have a csv file with 20 fields. I want a script that checks whether the file is valid according to the following points:

  • It needs to have 20 fields separated by pipes.
  • Each of the 20 fields should match a regex.
  • Report the line and field number of any field that does not match its regex.

ex:

f1|f2|f3|...|f20
1|aaaa|Y|...|2014/06/25
2|bb|Y|...|2014/06/25
3|ccc|N|...|2014/06/25

regex:
f1 [0-9]
f2 [a-z]{2,4}
f3 [YN]
.
.
.
f20 [1-9][0-9][0-9][0-9]-[0-1][0-9]-[0-3][0-9]

What are the best shell tools for this? Do you have a similar script?

Answered by anubhava

The best tool for this job on Unix systems is awk. You can use an awk command like this:

awk 'BEGIN{FS=OFS="|"} NF!=20{print "not enough fields"; exit}
!($1~/^[0-9]$/) {print "1st field invalid"; exit}' file.csv
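
The command above checks only the first field. A minimal sketch of how the same idea might be extended to one pattern per field, reporting the line and field number of the first mismatch, follows. The function name validate_csv, the re array and the nfields variable are made up for illustration, and the example is cut down to 3 fields instead of 20. The f2 pattern is spelled out without {2,4} on the assumption that some awk builds (older mawk, for instance) do not support interval expressions.

```shell
#!/bin/sh
# Hypothetical helper: validate_csv FILE
# Stops at the first problem and returns a non-zero status.
validate_csv() {
    awk '
    BEGIN {
        FS = "|"
        # One anchored pattern per field; extend up to re[20] for the real file.
        re[1] = "^[0-9]$"
        re[2] = "^[a-z][a-z][a-z]?[a-z]?$"   # same as [a-z]{2,4}, without interval syntax
        re[3] = "^[YN]$"
        nfields = 3                          # assumption: 3 fields for illustration
    }
    NF != nfields {
        printf "line %d: expected %d fields, found %d\n", NR, nfields, NF
        exit 1
    }
    {
        # Compare every field with the pattern stored under the same index.
        for (i = 1; i <= nfields; i++) {
            if ($i !~ re[i]) {
                printf "line %d, field %d: \"%s\" does not match %s\n", NR, i, $i, re[i]
                exit 1
            }
        }
    }
    ' "$1"
}
```

It could be called as validate_csv file.csv; the exit status is 0 only when every line passes.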

Answered by Tom Fenech

You might want to consider using a perl script for this:

#!/usr/bin/env perl

use strict;
use warnings;

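my @regexes = (
    qr/^\d$/,                # regex quotes qr/ /, anchored so the whole field must match
    qr/^[a-z]{2,4}$/,
    qr/^[YN]$/,
    #etc. put the rest of the regexes here
);

while (<>) {                 # loop through every line of file
    chomp;                   # strip the trailing newline from the last field
    my @fields = split /\|/, $_, -1; # split on pipe (needs escaping); -1 keeps trailing empty fields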
    if (@fields != 20) {
        print "incorrect number of fields on line $.\n";
        exit;
    }
    for my $f (0..$#fields) { # loop through all fields
        unless ($fields[$f] =~ $regexes[$f]) { # regex match
            print "invalid field on line $., field ", ($f+1), "\n";
            exit;
        }
    }
}

If you save the script as valid.pl and make it executable with chmod +x valid.pl, you can call it like ./valid.pl filename. Currently the script will exit as soon as the first problem is encountered. If you remove the exit statements, it will list all of the problems with the file.
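
The "list all of the problems" behaviour can also be sketched with awk from a shell function. This is only an illustration under stated assumptions: the name list_csv_problems, the re array, the bad counter and the 3-field layout are hypothetical and not taken from either answer. Instead of exiting at the first mismatch, it counts problems and sets a non-zero exit status at the end, so a calling script can still tell whether the file was valid.

```shell
#!/bin/sh
# Hypothetical helper: list_csv_problems FILE
# Prints every problem found and returns 1 if there was at least one.
list_csv_problems() {
    awk '
    BEGIN {
        FS = "|"
        re[1] = "^[0-9]$"
        re[2] = "^[a-z][a-z][a-z]?[a-z]?$"   # 2 to 4 lowercase letters
        re[3] = "^[YN]$"
        nfields = 3                          # assumption: 3 fields for illustration
    }
    NF != nfields {
        printf "line %d: expected %d fields, found %d\n", NR, nfields, NF
        bad++
        next    # field positions are unreliable on this line, so skip the regex checks
    }
    {
        for (i = 1; i <= nfields; i++) {
            if ($i !~ re[i]) {
                printf "line %d, field %d invalid\n", NR, i
                bad++
            }
        }
    }
    END { if (bad > 0) exit 1 }
    ' "$1"
}
```

Skipping the per-field checks on a line with the wrong field count is a design choice: once the count is off, the field numbers no longer line up with the patterns.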

In case you're unfamiliar with perl, $. is a special variable which contains the line number in the while loop. $#fields is the value of the last index of the array @fields, so 0..$#fields is equivalent to the list 0,1,...,19. Array indices start at 0, so I've added 1 to the field number.
