bash: Processing a tab-delimited file with a shell script

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/2781000/

Time: 2020-09-17 22:04:09  Source: igfitidea

Processing a tab delimited file with shell script processing

Tags: bash shell scripting sed awk

Asked by Lilly Tooner

Normally I would use Python/Perl for this procedure, but I find myself (for political reasons) having to pull this off using a bash shell.

I have a large tab delimited file that contains six columns and the second column is integers. I need to shell script a solution that would verify that the file indeed is six columns and that the second column is indeed integers. I am assuming that I would need to use sed/awk here somewhere. Problem is that I'm not that familiar with sed/awk. Any advice would be appreciated.

Many thanks! Lilly

Accepted answer by Pointy

Well, you can directly tell awk what the field delimiter is (the -F option). Inside your awk script you can tell how many fields are present in each record with the NF variable.

Oh, and you can check the second field with a regex. The whole thing might look something like this:

awk < thefile -F'\t' '
{ if (NF != 6 || $2 ~ /[^0123456789]/) print "Format error, line " NR; }
'

That's probably close but I need to check the regex because Linux regex syntax variation is so insane. (edited because grrrr)
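As a hedged sketch of the idea above, the function below wraps the awk call so that it reports which of the two checks failed and returns a non-zero status if any line is bad. The function name `check_tsv`, the message wording, and the choices to reject an empty second field and to allow an optional leading minus sign are assumptions made for illustration, not part of the original answer.

```shell
# check_tsv FILE: hypothetical helper (name and messages are assumptions).
# Prints one message per bad line; returns 1 if any line fails, 0 otherwise.
check_tsv() {
  awk -F'\t' '
    # Wrong number of tab-separated fields: report and skip the integer test.
    NF != 6            { printf "line %d: expected 6 fields, found %d\n", NR, NF; bad = 1; next }
    # Second field must be digits only, with an optional leading minus sign.
    $2 !~ /^-?[0-9]+$/ { printf "line %d: field 2 is not an integer: \"%s\"\n", NR, $2; bad = 1 }
    # Turn the flag into the exit status (0 when no line was flagged).
    END                { exit bad + 0 }
  ' "$1"
}
```

It can then be used directly in a condition, for example `if check_tsv thefile; then ...; fi`. Anchoring the regex with `^` and `$` and requiring at least one digit means an empty second column is reported, which the negated character class alone would let through.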

Answer by Ignacio Vazquez-Abrams

gawk:

BEGIN {
  FS="\t"
}

(NF != 6) || ($2 != int($2)) {
  exit 1
}

Invoke as follows:

if awk -f colcheck.awk somefile
then
  echo "somefile is valid"
else
  echo "somefile is not valid"
fi
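Because the script exits silently at the first bad record, it can help to know where it stopped. The sketch below is a variant that prints the offending line number to standard error before exiting. The function name `first_bad_line` and the message text are assumptions made for this example, and writing to "/dev/stderr" assumes an awk or system that provides that path (gawk, mawk, and Linux do). The `int()` comparison from the answer is kept unchanged.

```shell
# first_bad_line FILE: hypothetical wrapper around the same awk test.
# Stops at the first invalid record, names its line on stderr, returns 1.
first_bad_line() {
  awk '
    BEGIN { FS = "\t" }
    (NF != 6) || ($2 != int($2)) {
      print "invalid record at line " NR > "/dev/stderr"
      exit 1
    }
  ' "$1"
}

# Usage (commented out because "somefile" is only a placeholder name):
# if first_bad_line somefile; then echo "valid"; else echo "not valid"; fi
```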

Answer by ghostdog74

Here's how to do it with awk:

awk 'NF!=6||$2+0!=$2{print "error"}' file
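The `$2+0 != $2` comparison relies on awk treating numeric-looking fields as numbers, so it tests whether the field is numeric rather than whether it is an integer. The short demonstration below uses made-up sample values and sets the tab separator explicitly (the one-liner above relies on default whitespace splitting). It shows a value such as `1.5` passing the numeric test while a digits-only regex rejects it.

```shell
# Hypothetical sample: three six-column rows whose second fields are 42, 1.5 and abc.
sample=$(printf 'a\t42\tc\td\te\tf\na\t1.5\tc\td\te\tf\na\tabc\tc\td\te\tf\n')

# Numeric test from the one-liner: collects the line numbers it flags.
numeric_check=$(printf '%s\n' "$sample" | awk -F'\t' 'NF != 6 || $2+0 != $2 { print NR }')

# Digits-only regex: collects the line numbers it flags.
integer_check=$(printf '%s\n' "$sample" | awk -F'\t' 'NF != 6 || $2 !~ /^[0-9]+$/ { print NR }')

# Unquoted echo joins multiple line numbers with spaces for display.
printf 'numeric test flags lines: %s\n' "$(echo $numeric_check)"
printf 'regex test flags lines:   %s\n' "$(echo $integer_check)"
```

If the second column must hold integers only, the regex form is the stricter of the two.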

Answer by Fritz G. Mehner

Pure Bash:

infile='column6.dat'
lno=0

while IFS=$'\t' read -r -a line ; do
  ((lno++))
  if [ ${#line[@]} -ne 6 ] ; then
    echo -e "line $lno has ${#line[@]} elements"
  fi
  if ! [[  ${line[1]} =~ ^[0-9]+$ ]] ; then
    echo -e "line $lno column  2 : not an integer"
  fi
done < "$infile"

Possible output:

line 19 has 5 elements
line 36 column  2 : not an integer
line 38 column  2 : not an integer
line 51 has 3 elements
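One caveat with `read -a`: tab counts as IFS whitespace, so consecutive tabs are collapsed and empty columns are not counted. As a sketch of one way around that while staying in pure Bash, the function below counts the tab characters in each line and extracts the second field with parameter expansion. The function name `check_fields` and its return-status convention are assumptions for illustration; the messages mirror the ones above.

```shell
# check_fields FILE: hypothetical pure-Bash variant that preserves empty columns.
# Returns 1 if any line is reported, 0 otherwise.
check_fields() {
  local file=$1 line tabs count rest second
  local tab=$'\t' lno=0 status=0
  # IFS= and -r keep each line intact; the || clause handles a last line
  # that lacks a trailing newline.
  while IFS= read -r line || [[ -n $line ]]; do
    lno=$((lno + 1))
    tabs=${line//[!$tab]/}        # delete everything except tab characters
    count=$(( ${#tabs} + 1 ))     # N tabs separate N+1 fields
    if (( count != 6 )); then
      echo "line $lno has $count elements"
      status=1
      continue
    fi
    rest=${line#*"$tab"}          # drop the first field and its tab
    second=${rest%%"$tab"*}       # keep text up to the next tab
    if ! [[ $second =~ ^[0-9]+$ ]]; then
      echo "line $lno column 2: not an integer"
      status=1
    fi
  done < "$file"
  return "$status"
}
```

Unlike the loop above, this version skips the integer test on lines with the wrong field count and treats an empty line as one (empty) element.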