Count number of columns in a pipe delimited file
Original question: http://stackoverflow.com/questions/17558462/
This content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by Maulzey
I have a pipe (|) delimited file.
File:
106232145|"medicare"|"medicare,medicaid"|789
I would like to count the number of fields in each line. I tried the code below.
Code:
awk -F '|' '{print NF-1}'
This returns 5 instead of 4, because awk treats "medicare|medicaid" as two different fields instead of one field.
Accepted answer by unxnut
awk -F\| '{print NF}'
gives the correct result.
Answer by jaypal singh
For a | delimited file with | characters embedded inside quoted fields, this should work with GNU awk v4.0 or later:
gawk '{ print NF }' FPAT="([^|]+)|(\"[^\"]+\")"
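The same "describe what a field looks like" idea can be sketched outside awk. The following is a minimal Python illustration, not part of the original answer; the function name count_fields_regex is made up for this example. It assumes fields are never empty and that a quoted field contains no quote characters of its own.

```python
import re

# Put the quoted alternative first: Python's re takes the first alternative
# that matches at a position, while POSIX-style matching in awk prefers the
# longest match, so the order of the two branches matters here.
FIELD = re.compile(r'"[^"]+"|[^|]+')

def count_fields_regex(line):
    """Count non-empty fields, treating a double-quoted run as one field."""
    return len(FIELD.findall(line.rstrip("\n")))

if __name__ == "__main__":
    print(count_fields_regex('106232145|"medicare"|"medicare|medicaid"|789'))  # 4
```

Like any pattern of this shape, the sketch skips empty fields, so a line such as a||b is counted as 2 fields.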
Answer by PP.
perl -ne 'print scalar( split( /\|/, $_ ) ) . "\n"' [filename]
Answer by DVK
Pure Unix solution (without awk/Perl):
$ cat /tmp/x1
1|2|3|34
4534|23442|1121|334434
$ head -1 /tmp/x1 | tr "|" "\n" | wc -l
4
Perl solution - 1-liner:
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x1
4
BUT!!!! IMPORTANT!!!
Every one of these solutions - as well as those in other answers - does NOT work 100% of the time!
Namely, they all break on a REAL "pipe-separated" file, where a pipe is a valid character inside a (quoted) field, the way real CSV files work.
E.g.
$ cat /tmp/x2
"0|1"|2|3|34
4534|23442|1121|334434
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x2
5 <----- BROKEN!!! There are only 4 fields, first field is "0|1"
To fix that, a proper CSV (or delimited file) parser should be used, such as one in Perl:
$ perl5.8 -MText::CSV_XS -ne '$csv=Text::CSV_XS->new({sep_char => "|"}); $csv->parse($_);
    @f = $csv->fields(); print scalar(@f), "\n"; exit;' /tmp/x2
This prints the correct value:
4
As a note, simply patching an awk or sed solution with a convoluted RegEx won't work easily: on top of quoted PSV fields that contain pipes, the spec also allows quotes as part of a field. That does NOT lend itself to a nice RegEx solution.
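For readers without Perl at hand, the "use a proper parser" advice can be sketched with Python's standard csv module. This is only an illustration under stated assumptions: the helper name count_fields_csv is hypothetical, and the input is assumed to follow the usual CSV quoting rules (double quotes around fields, a doubled quote for a literal quote) with | as the delimiter.

```python
import csv
import io

def count_fields_csv(line, delimiter="|"):
    """Count the fields of one record using a real delimited-file parser."""
    reader = csv.reader(io.StringIO(line), delimiter=delimiter, quotechar='"')
    row = next(reader, [])  # an empty input yields no row at all
    return len(row)

if __name__ == "__main__":
    print(count_fields_csv('"0|1"|2|3|34'))          # 4
    print(count_fields_csv('"say ""hi""|there"|2'))  # 2
```

The second call shows the case that defeats simple patterns: a quoted field that contains both a pipe and quote characters is still counted as one field.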
Answer by DVK
$ cat fieldparse.awk
#NR > 1 { print "--"; }
# Uncomment printf/print in the for loops to see
# each field on a separate line as well as the commented line above (to show that it works).
{
    nfields = 0;
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /.*"$/); i++) {
                #printf("%s%s", $i, FS);
            }
        #print $i;
        nfields++;
    }
    print nfields;
    if (FILENAME == "-")
        FILENAME = "(standard input)";
    filenames[FILENAME] = sprintf("%d %d", FNR, nfields);
}
END {
    print NR, "total records processed";
    for (f in filenames) {
        split(filenames[f], fn, " ");
        printf("\t* %s: %d records with %d fields\n", f, fn[1], fn[2]);
    }
}
$ awk -F'|' -f fieldparse.awk demo.txt
It works for any single character separator that is NOT a double quotation mark, meaning standard tab delimited, CSV, etc. formats (as standard as they get anyway...)
The output format is merely illustrative and a bit decorative at the end, but the content is still useful IMHO, such as handling multiple files. In any case, I hope it helps! :-)
Edit
This was tested using mawk and GNU awk (gawk), the latter in its traditional, POSIX and default modes. With the comments and output statements trimmed it is actually a small program, though not as small as one might like.
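The counting loop of fieldparse.awk can also be read as: split on the separator, and whenever a piece opens a quote without closing it, skip ahead to the piece that closes it before counting. Below is a rough Python rendering of that idea, offered as a sketch rather than a faithful port; the name count_fields_merge is invented here, and the per-file summary from the END block is left out.

```python
def count_fields_merge(line, sep="|"):
    """Count fields by merging pieces that belong to one quoted field."""
    text = line.rstrip("\n")
    if not text:
        return 0  # awk reports NF == 0 for an empty record
    pieces = text.split(sep)
    nfields = 0
    i = 0
    while i < len(pieces):
        piece = pieces[i]
        # Same test as /^".*[^"]$/: starts with a quote, does not end with one.
        if len(piece) >= 2 and piece[0] == '"' and piece[-1] != '"':
            # Skip forward to the piece that ends with the closing quote.
            while i < len(pieces) and not pieces[i].endswith('"'):
                i += 1
        nfields += 1
        i += 1
    return nfields

if __name__ == "__main__":
    print(count_fields_merge('106232145|"medicare"|"medicare|medicaid"|789'))  # 4
    print(count_fields_merge('"a\tb"\tc', sep="\t"))                           # 2
```

The second call uses a tab separator, in line with the remark above that the approach is not tied to the pipe character.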