bash unix - max(length) of each column in file

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/8629973/

Date: 2020-09-18 01:13:03  Source: igfitidea

Tags: linux, bash, shell, unix, awk

Asked by toop

Given a file with data like this (i.e. the stores.dat file):

sid|storeNo|latitude|longitude
2tt|1|-28.0372000t0|153.42921670
9|2t|-33tt.85t09t0000|15t1.03274200

Desired output:

sid : 3
storeNo : 2
latitude : 16
longitude : 13

What is the syntax to return the maximum length of the values under each column?

I have tried this but it does not work:

nawk 'BEGIN { FS = "|" }
{
for(n = 1; n <= NF; n++) {
if (length($n) > max)
max = length($n)
maxlen[$n] = max
}
}
END {
for (i in maxlen) print "col " i ": " maxlen[i]
} ' stores.dat

UPDATE (thanks to Mat's answer - I settled on this):

awk -F"|" '  NR==1{
    for(n = 1; n <= NF; n++) {
       colname[n]=$n
    }
}
NR>1{
    for(n = 1; n <= NF; n++) {
        if (length($n)>maxlen[n])
            maxlen[n]=length($n)
    }
}
END {
        for (i in colname) {
                print colname[i], ":", maxlen[i]+0;
        }
} ' filename

Answered by Mat

There are a few problems with your script: max is shared between columns, and you're not dealing with the header line at all. Try the following:

$ cat t.awk 
#!/bin/awk -f
NR==1{
    for(n = 1; n <= NF; n++) {
       colname[n]=$n
    }
}
NR>1{
    for(n = 1; n <= NF; n++) {
        if (length($n)>maxlen[n])
            maxlen[n]=length($n)
    }
}
END {
        for (i in maxlen) {
                print colname[i], ":", maxlen[i];
        }
}
$ awk -F'|' -f t.awk stores.dat

$n refers to the contents of the nth column; n is the column number (in the first and second loops). The last loop just shows a way of iterating over an array in awk.
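
One caveat about that last loop: awk does not specify the order in which for (i in maxlen) visits the indices, so the columns may be printed in a different order than the header. Below is a minimal sketch that loops over the column numbers instead. The function name max_col_len and the optional delimiter argument are made up for illustration, and it assumes that only columns named in the header should be reported.

```shell
# Hypothetical helper (name is illustrative): print "name : maxlen" per column,
# in header order. Usage: max_col_len FILE [DELIMITER]  (delimiter defaults to "|")
max_col_len() {
    awk -F"${2:-|}" '
    NR == 1 {                       # header row: remember names and column count
        ncols = NF
        for (n = 1; n <= NF; n++) colname[n] = $n
        next
    }
    {                               # data rows: keep the longest value per column
        for (n = 1; n <= NF; n++)
            if (length($n) > maxlen[n]) maxlen[n] = length($n)
    }
    END {                           # numeric loop, so output follows header order
        for (n = 1; n <= ncols; n++)
            print colname[n], ":", maxlen[n] + 0
    }' "$1"
}
```

Called as max_col_len stores.dat, this should print the four lines of the desired output, in header order. The + 0 makes a column with no data rows print 0 rather than an empty string.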

Answered by Moreaki

My take on this uses a pure Bash approach:

#!/usr/bin/env bash

dat=./stores.dat
del='|'
TOKENS=$(head -1 "${dat}" | tr $del ' ')
declare -a col=( $TOKENS )
declare -a max

skip=1
while IFS=$del read $TOKENS; do
    if [ $skip -eq 1 ]; then
        skip=0
        continue
    fi
    idx=0
    for tok in ${TOKENS}; do
        tokref=${!tok}
        printf "%-10s = %-16s[%2d] " "$tok" "${tokref}" "${#tokref}"
        echo "--> max=${max[$idx]} tokref=${#tokref}"
        #This works  : c=$a>$b?$a:$b
        #This doesn't: max[$idx]=${max[$idx]}>${#tokref}?${max[$idx]}:${#tokref}
        max[$idx]=$((${max[$idx]:=0}>${#tokref}?${max[$idx]}:${#tokref}))
        let idx++
    done
    printf "\n"
done < ${dat}

for ((idx=0; idx<${#col[@]}; idx++)); do
    printf "%-10s : %d\n" "${col[$idx]}" "${max[$idx]}"
done

The output is as follows:

sid        = 2tt             [ 3] --> max=0 tokref=3
storeNo    = 1               [ 1] --> max=0 tokref=1
latitude   = -28.0372000t0   [13] --> max=0 tokref=13
longitude  = 153.42921670    [12] --> max=0 tokref=12

sid        = 9               [ 1] --> max=3 tokref=1
storeNo    = 2t              [ 2] --> max=1 tokref=2
latitude   = -33tt.85t09t0000[16] --> max=13 tokref=16
longitude  = 15t1.03274200   [13] --> max=12 tokref=13

sid        : 3
storeNo    : 2
latitude   : 16
longitude  : 13

I've added this solution because I liked the challenge and had some minutes to spare.
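
For comparison, here is a trimmed-down sketch of the same pure Bash idea. It is not the script above but a variant: it reads each row into an array with read -a instead of into variables named after the header, so header names do not need to be valid shell identifiers, and it leaves out the per-row debug output. The function name bash_max_col_len and the optional delimiter argument are assumptions made for illustration.

```shell
# Hypothetical helper (name is illustrative): pure-Bash max length per column,
# printed in header order. Usage: bash_max_col_len FILE [DELIMITER]
bash_max_col_len() {
    local file=$1 del=${2:-|} idx
    local -a col row max

    {
        # The first line is the header: split it into column names.
        IFS=$del read -r -a col
        # Every remaining line is a data row, split into the array "row".
        while IFS=$del read -r -a row; do
            for (( idx = 0; idx < ${#row[@]}; idx++ )); do
                # Keep the larger of the stored maximum and this field length.
                (( ${#row[idx]} > ${max[idx]:-0} )) && max[idx]=${#row[idx]}
            done
        done
    } < "$file"

    for (( idx = 0; idx < ${#col[@]}; idx++ )); do
        printf '%s : %d\n' "${col[idx]}" "${max[idx]:-0}"
    done
}
```

Called as bash_max_col_len stores.dat, it should print the same four summary lines as the script above. Note that the while loop skips a final line that has no trailing newline.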
