bash awk: find the minimum and maximum of a column

Question

Asked by Wang Zong'an

I'm using awk to process a simple .dat file that contains several lines of data, each with 4 columns separated by a single space. I want to find the minimum and maximum of the first column.

The data file looks like this:

9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496

The commands I used are as follows.

min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`

However, the output is min=10 and max=9.

(Similar commands return the correct minimum and maximum for the second column.)

Could someone tell me where I went wrong? Thank you!

Answer 1

Answered by Klaus Zeuge

awk guesses the type of each value.

The string "10" is less than the string "4" because the character "1" sorts before "4". Force a type conversion by adding zero:

min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
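
The difference between the two kinds of comparison can be seen in isolation. The following is a minimal sketch, not part of the original answer; the variable names `as_string` and `as_number` are illustrative only.

```shell
# String constants are compared character by character, so "10" sorts before "4".
as_string=$(awk 'BEGIN { print (("10" < "4") ? "true" : "false") }')

# Adding zero converts each operand to a number, so 10 < 4 is evaluated numerically.
as_number=$(awk 'BEGIN { print ((("10"+0) < ("4"+0)) ? "true" : "false") }')

echo "string comparison:  10 < 4 is $as_string"
echo "numeric comparison: 10 < 4 is $as_number"
```

The first line should report true and the second false, which is why the question's script settled on 10 as the "minimum".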

Answer 2

Answered by glenn jackman

A non-awk answer:

cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
  > >(echo "max=$(tail -1)")

That tee command is ~~perhaps a bit~~ much too clever. tee duplicates its stdin stream to the files named as arguments, and it also streams the same data to stdout. I'm using process substitutions to filter the streams.

The same effect can be achieved (with less flourish) by extracting the first and last lines of the sorted stream:

cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'

or

cut -d" " -f1 file | sort -n | { 
    read line
    echo "min=$line"
    while read line; do max=$line; done
    echo "max=$max"
}

Answer 3

Answered by Ed Morton

Your problem was simply that your script contained:

if ($1<a) a=$1 fi

That final fi is not part of awk syntax, so it is treated as a variable, and a=$1 fi is a string concatenation. You are therefore TELLING awk that a contains a string, not a number, which is why $1<a is evaluated as a string comparison instead of a numeric one.
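
To make the effect of the stray fi concrete, here is a small sketch that runs both forms on three sample values taken from the first column. The names `rows`, `with_fi`, and `without_fi` are illustrative and not from the original post.

```shell
# Three sample values from column 1, in the order that triggers the problem.
rows=$(printf '%s\n' 9 10 4)

# Original form: "a=$1 fi" concatenates $1 with the empty variable fi,
# so a becomes a string and later comparisons are string comparisons.
with_fi=$(printf '%s\n' "$rows" | awk 'BEGIN{a=1000} {if ($1<a) a=$1 fi} END{print a}')

# Without fi: a keeps the numeric-looking field value, so comparisons stay numeric.
without_fi=$(printf '%s\n' "$rows" | awk 'BEGIN{a=1000} {if ($1<a) a=$1} END{print a}')

echo "with fi:    min=$with_fi"
echo "without fi: min=$without_fi"
```

With fi present the result is 10, the same wrong answer reported in the question; without it the result is 4.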

More importantly, as a general rule, never start with a guessed value for max/min; just use the first value read as the seed. Here's the correct way to write the script:

$ cat tst.awk
BEGIN { min = max = "NaN" }
{
    min = (NR==1 || $1<min ? $1 : min)
    max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }

$ awk -f tst.awk file
4 12

$ awk -f tst.awk /dev/null
NaN NaN

$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12

If you don't like NaN, pick whatever you'd prefer to print when the input file is empty.
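
The advice about seeding from the first value matters most when the data falls outside the guessed range. The sketch below uses made-up, all-negative input (an assumption for illustration, not data from the question) to compare a guessed seed of 0 with a first-value seed; `guessed_max` and `seeded_max` are hypothetical names.

```shell
# Hypothetical all-negative input; every value is below the guessed seed of 0.
data=$(printf '%s\n' -5 -12 -3)

# Guessed seed: 0 is never exceeded, so the reported "max" is not in the data at all.
guessed_max=$(printf '%s\n' "$data" | awk 'BEGIN{a=0} {if ($1>0+a) a=$1} END{print a}')

# First-value seed: the first record initialises max, later records only improve on it.
seeded_max=$(printf '%s\n' "$data" | awk 'NR==1 || $1>max {max=$1} END{print max}')

echo "guessed seed:     max=$guessed_max"
echo "first-value seed: max=$seeded_max"
```

The guessed seed reports 0, while the first-value seed reports -3, the true maximum of the sample.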

Answer 4

Answered by Hossein Vatani

Late, but here is a shorter command that is more precise because it makes no initial assumption:

  awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat
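
One caveat when adapting this command to a non-integer column: %d prints only the integer part of a value. The following sketch is a variation under stated assumptions, not the answerer's code. It selects the column through a hypothetical `col` variable and prints with %s so that values like those in the third column keep their decimals. The three rows are copied from the question's data.

```shell
# Three rows from the question's data file, including the column-3 extremes.
data='9 30 8.58939 167.759
7 31 -2.70843 167.116
8 30 47.1137 126.085'

# col is an illustrative variable selecting which column to scan.
result=$(printf '%s\n' "$data" | awk -v col=3 '
    NR==1      { min = $col; max = $col }   # seed both from the first record
    $col < min { min = $col }
    $col > max { max = $col }
    END        { printf "min=%s max=%s\n", min, max }')

echo "$result"
```

Passing the column number with -v keeps the awk program itself unchanged when a different column is needed.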
