Linux: counting the number of columns in a pipe-delimited file

Question

Asked by Maulzey

I have a pipe (|) delimited file.

File:

106232145|"medicare"|"medicare,medicaid"|789

I would like to count the number of fields in each line. I tried the code below:

Code:

awk -F '|' '{print NF-1}'

This returns 5 instead of 4, because awk treats "medicare|medicaid" as two separate fields instead of one.

Answer 1

Accepted answer by unxnut

awk -F\| '{print NF}'

gives the correct result.
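
As a quick check, here is a minimal sketch of the same idea wrapped in a shell function. The name count_fields_naive is made up for illustration and is not from the answer. NF is the number of fields, while NF-1 is the number of separators, which is one likely source of an off-by-one count. This version knows nothing about quoting.

```shell
# Hypothetical helper: count fields by splitting on every "|".
# It does not understand quoting, so a quoted "a|b" counts as two fields.
count_fields_naive() {
    awk -F'|' '{ print NF }' "$@"
}

# The sample line from the question. NF is the field count;
# NF-1 would be the number of separators instead.
printf '%s\n' '106232145|"medicare"|"medicare,medicaid"|789' | count_fields_naive
```

With no file arguments the function reads standard input. Otherwise pass one or more file names.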

Answer 2

Answered by jaypal singh

For a | delimited file with | embedded inside quoted fields, this should work with GNU awk v4.0 or later:

gawk '{ print NF }' FPAT="([^|]+)|(\"[^\"]+\")"

Answer 3

Answered by PP.

perl -ne 'print scalar( split( /\|/, $_ ) ) . "\n"' [filename]

Answer 4

Answered by DVK

Pure Unix solution (without awk/Perl):

$ cat  /tmp/x1
1|2|3|34
4534|23442|1121|334434

$ head -1 /tmp/x1 | tr "|" "\n" | wc -l
4
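
The pipeline works by putting each field of the first line on its own line and then counting lines. A minimal sketch of that idea as a reusable function follows. The name count_fields_first_line is hypothetical, and the sketch assumes the separator is "|" and that no field is quoted.

```shell
# Hypothetical helper: count the fields of the first input line only.
# Each "|" becomes a newline, so the number of lines equals the number
# of fields. Quoted fields containing "|" are NOT handled.
count_fields_first_line() {
    # tr -d ' ' strips the padding some wc implementations add.
    head -n 1 "$@" | tr '|' '\n' | wc -l | tr -d ' '
}

printf '%s\n' '1|2|3|34' '4534|23442|1121|334434' | count_fields_first_line
```

It reads standard input when called without arguments, or a single file name when given one.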

Perl solution - 1-liner:

$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x1
4

BUT!!!! IMPORTANT!!!

Every one of these solutions - as well as those in other answers - does NOT work 100% of the time!

Namely, they all break on a REAL "pipe-separated" file, where a pipe is a valid character inside a field (and that field is quoted), the way real CSV files work.

E.g.

$ cat /tmp/x2
"0|1"|2|3|34
4534|23442|1121|334434
$ perl5.8 -naF'\|' -e 'print scalar(@F)."\n";exit;' /tmp/x2
5   <----- BROKEN!!! There are only 4 fields, first field is "0|1"

To fix that, a proper CSV (or delimited file) parser should be used, such as one in Perl:

$ perl5.8 -MText::CSV_XS \
-ne '$csv=Text::CSV_XS->new({sep_char => "|"});  $csv->parse($_);
@f = $csv->fields(); print scalar(@f), "\n"; exit;' /tmp/x2

Prints correct value

As a note, simply fixing an awk or sed solution with a convoluted RegEx won't work easily, since on top of pipe-containing-and-quoted PSV fields, the spec also allows quotes as part of the field. That does NOT lend itself to a nice RegEx solution.
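
If a real CSV parser is not available, one alternative to a regex is a small character-by-character state machine. The following is only a sketch under stated assumptions, not a full parser. The name count_fields_quoted is made up. It assumes the separator is "|", that fields may be wrapped in double quotes, that a doubled quote ("") inside a quoted field stands for a literal quote, and that no field spans more than one line.

```shell
# Hypothetical helper (not from the answers above): count fields per line
# by tracking whether the current character is inside a quoted field.
count_fields_quoted() {
    awk '{
        n = 1          # a line with no separators still has one field
        inq = 0        # 1 while inside a quoted field
        len = length($0)
        for (i = 1; i <= len; i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                inq = !inq            # "" toggles twice, so state is unchanged
            else if (c == "|" && !inq)
                n++                   # only unquoted separators split fields
        }
        print n
    }' "$@"
}

printf '%s\n' '"0|1"|2|3|34' '4534|23442|1121|334434' | count_fields_quoted
```

Both sample lines should be reported as having 4 fields. Note that an empty line is counted as one (empty) field, and a stray unbalanced quote will throw off the count for that line.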

Answer 5

Answered by DVK

$ cat fieldparse.awk
#NR > 1 { print "--"; }

# Uncomment printf/print in the for loops to see
#   each field on a separate line as well as the commented line above (to show that it works).
{
    nfields = 0;
    for (i = 1; i <= NF; i++) {
        if ($i ~ /^".*[^"]$/)
            for (; i <= NF && ($i !~ /.*"$/); i++) {
                #printf("%s%s", $i, FS);
            }
        #print $i;
        nfields++;
    }
    print nfields;
    if (FILENAME == "-")
        FILENAME = "(standard input)";
    filenames[FILENAME] = sprintf("%d %d", FNR, nfields);
}

END {
    print NR, "total records processed";
    for (f in filenames) {
        split(filenames[f], fn, " ");
        printf("\t* %s: %d records with %d fields\n", f, fn[1], fn[2]);
    }
}

$ awk -F'|' -f fieldparse.awk demo.txt

It works for any single character separator that is NOT a double quotation mark, meaning standard tab delimited, CSV, etc. formats (as standard as they get anyway...)

The output format is merely illustrative and a bit decorative at the end, but the content is still useful IMHO, such as handling multiple files. In any case, I hope it helps! :-)

Edit

This was tested using mawk and GNU awk (gawk), the latter of which was tested in traditional, POSIX and the default modes. Trim the comments and output statements to find it actually a small program, though it isn't as small as one might like.
