Original question: http://stackoverflow.com/questions/29783990/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
awk: find minimum and maximum in column
Asked by Wang Zong'an
I'm using awk to deal with a simple .dat file, which contains several lines of data, and each line has 4 columns separated by a single space. I want to find the minimum and maximum of the first column.
The data file looks like this:
9 30 8.58939 167.759
9 38 1.3709 164.318
10 30 6.69505 169.529
10 31 7.05698 169.425
11 30 6.03872 169.095
11 31 5.5398 167.902
12 30 3.66257 168.689
12 31 9.6747 167.049
4 30 10.7602 169.611
4 31 8.25869 169.637
5 30 7.08504 170.212
5 31 11.5508 168.409
6 31 5.57599 168.903
6 32 6.37579 168.283
7 30 11.8416 168.538
7 31 -2.70843 167.116
8 30 47.1137 126.085
8 31 4.73017 169.496
The commands I used are as follows.
min=`awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>a) a=$1 fi} END{print a}' mydata.dat`
However, the output is min=10 and max=9.
(Similar commands return the correct minimum and maximum for the second column.)
Could someone tell me where I went wrong? Thank you!
Answered by Klaus Zeuge
Awk guesses the type.
String "10" is less than string "4" because character "1" comes before "4". Force a type conversion by adding zero:
min=`awk 'BEGIN{a=1000}{if ($1<0+a) a=$1} END{print a}' mydata.dat`
max=`awk 'BEGIN{a=   0}{if ($1>0+a) a=$1} END{print a}' mydata.dat`
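The difference between the two comparison modes can be seen in isolation. The following is a minimal sketch, not part of the original answer; the variable names as_string, as_number and forced are made up for illustration.

```shell
# String constants compare character by character, so "10" sorts before "4".
as_string=$(awk 'BEGIN { print ("10" < "4") }')

# Numeric constants compare by value.
as_number=$(awk 'BEGIN { print (10 < 4) }')

# Adding zero converts a string-valued variable to a number before comparing.
forced=$(awk 'BEGIN { a = "10"; print (a + 0 < 4) }')

echo "string: $as_string  number: $as_number  forced: $forced"
```

awk prints 1 for a true comparison and 0 for a false one, so only the string comparison reports that 10 is less than 4.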
Answered by glenn jackman
a non-awk answer:
cut -d" " -f1 file |
sort -n |
tee >(echo "min=$(head -1)") \
> >(echo "max=$(tail -1)")
That tee command is perhaps a bit too clever. tee duplicates its stdin stream to the files named as arguments, and it also streams the same data to stdout. I'm using process substitutions to filter the streams.
The same effect can be achieved (with less flourish) by extracting the first and last lines of a stream of data:
cut -d" " -f1 file | sort -n | sed -n '1s/^/min=/p; $s/^/max=/p'
or
cut -d" " -f1 file | sort -n | {
read line
echo "min=$line"
while read line; do max=$line; done
echo "max=$max"
}
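For shells without process substitution, the same sort-then-take-both-ends idea can be written with plain command substitution. This is only a sketch under assumptions: the temporary file and its five rows are made-up stand-ins for the real data file, and the sorted column is assumed small enough to hold in a variable.

```shell
# Hypothetical sample data standing in for the first two columns of the file.
data=$(mktemp)
printf '%s\n' '9 30' '10 31' '12 30' '4 31' '8 30' > "$data"

# Sort the first column numerically once, then take the two ends.
sorted=$(cut -d" " -f1 "$data" | sort -n)
min=$(printf '%s\n' "$sorted" | head -n 1)
max=$(printf '%s\n' "$sorted" | tail -n 1)

echo "min=$min max=$max"
rm -f "$data"
```

Because sort -n orders numerically, 10 and 12 land after 4, 8 and 9 rather than before them.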
Answered by Ed Morton
Your problem was simply that in your script you had:
if ($1<a) a=$1 fi
and that final fi is not part of awk syntax, so it is treated as a variable. That makes a=$1 fi a string concatenation, so you are TELLING awk that a contains a string, not a number, hence the string comparison instead of a numeric one in $1<a.
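The effect of the stray fi can be reproduced on a few values. The sketch below uses three first-column values from the question's data; the variable names input, buggy and fixed are made up for illustration.

```shell
# Three first-column values taken from the question's data.
input=$(printf '9\n10\n4\n')

# With the stray "fi", a = $1 fi concatenates $1 with an empty variable,
# so a becomes a string and later comparisons are done as strings.
buggy=$(printf '%s\n' "$input" | awk 'BEGIN{a=1000}{if ($1<a) a=$1 fi} END{print a}')

# Without it, a keeps the numeric-looking field value and compares numerically.
fixed=$(printf '%s\n' "$input" | awk 'BEGIN{a=1000}{if ($1<a) a=$1} END{print a}')

echo "buggy min=$buggy  fixed min=$fixed"
```

The buggy version ends with 10, the same wrong minimum reported in the question, while the version without fi ends with 4.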
More importantly, as a general rule, never start with some guessed value for max/min; just use the first value read as the seed. Here's the correct way to write the script:
$ cat tst.awk
BEGIN { min = max = "NaN" }
{
    min = (NR==1 || $1<min ? $1 : min)
    max = (NR==1 || $1>max ? $1 : max)
}
END { print min, max }
$ awk -f tst.awk file
4 12
$ awk -f tst.awk /dev/null
NaN NaN
$ a=( $( awk -f tst.awk file ) )
$ echo "${a[0]}"
4
$ echo "${a[1]}"
12
If you don't like NaN, pick whatever you'd prefer to print when the input file is empty.
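The seed-from-the-first-record idea carries over to any column. The following is a rough sketch rather than the answer's own script: the awk variable col, the temporary file and its three rows are assumptions made for illustration, and unlike tst.awk it prints empty values instead of NaN when the input is empty.

```shell
# Hypothetical three-row sample with a non-integer third column.
data=$(mktemp)
printf '%s\n' '9 30 8.58939' '7 31 -2.70843' '8 30 47.1137' > "$data"

# col (an assumed variable name) selects the column to scan.
result=$(awk -v col=3 '
    NR == 1    { min = max = $col }   # seed both from the first record
    $col < min { min = $col }
    $col > max { max = $col }
    END        { print min, max }
' "$data")

echo "$result"
rm -f "$data"
```

Seeding from the first record means negative values such as -2.70843 are handled without choosing a starting guess.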
Answered by Hossein Vatani
Late, but here is a shorter command that makes no assumption about the initial value:
awk '(NR==1){Min=$1;Max=$1};(NR>=2){if(Min>$1) Min=$1;if(Max<$1) Max=$1} END {printf "The Min is %d ,Max is %d",Min,Max}' FileName.dat
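One caveat when reusing this on a non-integer column: %d prints only the integer part. The sketch below is an illustration with two made-up input values and made-up variable names (trunc, exact), comparing %d with %s.

```shell
# %d truncates the minimum to its integer part.
trunc=$(printf '8.58939\n-2.70843\n' | awk 'NR==1{m=$1} $1<m{m=$1} END{printf "%d\n", m}')

# %s prints the field text as it was read.
exact=$(printf '8.58939\n-2.70843\n' | awk 'NR==1{m=$1} $1<m{m=$1} END{printf "%s\n", m}')

echo "with %d: $trunc  with %s: $exact"
```

For the integer first column in the question, both formats give the same output.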