bash: Use awk to find the average of a column
Notice: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/19149731/
Asked by Ben Zifkin
I'm attempting to find the average of the second column of data using awk for a class. This is my current code, with the framework my instructor provided:
#!/bin/awk
### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.
# This block of code is executed for each line in the file
{
x=sum
read name
awk 'BEGIN{sum+=$2}'
# The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
# NR is a variable equal to the number of rows in the file
print "Average: " sum/ NR
# Change this to print the Average instead of just the number of rows
}
and I'm getting an error that says:
awk: avg.awk:11: awk 'BEGIN{sum+=$2}' $name
awk: avg.awk:11: ^ invalid char ''' in expression
I think I'm close but I really have no idea where to go from here. The code shouldn't be incredibly complex as everything we've seen in class has been fairly basic. Please let me know.
Accepted answer by imp25
Your specific error is with line 11:
awk 'BEGIN{sum+=$2}'
This line invokes awk and specifies its BEGIN block, but you are already inside an awk script, so you do not need to call awk again. You also want sum+=$2 to run on each line of input, so it should not be inside a BEGIN block. Hence the line should simply read:
sum+=$2
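To see why the BEGIN block is the wrong place, here is a minimal sketch (the two-line sample input is made up for illustration). BEGIN runs once before any input is read, so $2 is still empty there and nothing is accumulated, while the plain per-line block sees every record:

```shell
# Hypothetical sample input: two rows, value in the second column.
# BEGIN runs before the first record is read, so $2 is empty and sum stays 0.
in_begin=$(printf 'a 10\nb 20\n' | awk 'BEGIN { sum += $2 } END { print sum + 0 }')

# A pattern-less block runs once per input line, so both values are added.
per_line=$(printf 'a 10\nb 20\n' | awk '{ sum += $2 } END { print sum + 0 }')

echo "$in_begin $per_line"   # prints: 0 30
```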
You also do not need the lines:
x=sum
read name
The first just copies sum into a variable named x, and I'm not sure what the second is meant to do (read is a shell builtin, not an awk statement), but neither is needed.
This would make your awk script:
#!/bin/awk
### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.
# This block of code is executed for each line in the file
{
sum+=$2
# The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
# NR is a variable equal to the number of rows in the file
print "Average: " sum/ NR
# Change this to print the Average instead of just the number of rows
}
Jonathan Leffler's answer gives the awk one-liner that represents the same fixed code, with the addition of a check that there is at least one line of input (this prevents a divide-by-zero error).
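As a quick check, the fixed program can be run with awk -f. This is only a sketch: the sample data and the temporary file names are made up for illustration, and the program is the script above with its comments trimmed.

```shell
# Hypothetical sample data: a name in column 1, a score in column 2.
data=$(mktemp)
printf 'alice 10\nbob 20\ncarol 30\n' > "$data"

# The fixed program, saved to a temporary file (comments trimmed).
prog=$(mktemp)
cat > "$prog" <<'EOF'
{ sum += $2 }
END { print "Average: " sum / NR }
EOF

# Run the program file against the data file.
result=$(awk -f "$prog" "$data")
echo "$result"   # prints: Average: 20

rm -f "$data" "$prog"
```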
Answer by Jonathan Leffler
awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'
Add the numbers in $2 (second column) to sum (variables are auto-initialized to zero by awk) and increment the number of rows (which could also be handled via the built-in variable NR). At the end, if at least one value was read, print the average.
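Keeping a separate counter n starts to matter once some lines should not count toward the average. The sketch below assumes a hypothetical input with a header row and a blank line, and uses a simple numeric test on $2 (an illustrative choice, not part of the original answer) so that only real values are counted:

```shell
# Hypothetical input: a header row and a blank line mixed in with the data.
input='name score
alice 10

bob 20
carol 30'

# Only lines whose second field looks like a number are summed and counted,
# so n ends up as 3 while NR would be 5.
avg=$(printf '%s\n' "$input" |
  awk '$2 ~ /^-?[0-9]+(\.[0-9]+)?$/ { sum += $2; n++ }
       END { if (n > 0) print sum / n }')

echo "$avg"   # prints: 20
```

Dividing by NR here would give 12 instead, because the header and the blank line would be counted as rows.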
awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'
If you want to use the shebang notation, you could write:
#!/bin/awk -f
{ sum += $2 }
END { if (NR > 0) print sum / NR }
You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).
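A minimal sketch of the printf variant, using made-up sample values. The "%.2f\n" format is an illustrative choice; the "%13.6e\n" format mentioned above is used the same way:

```shell
# Hypothetical sample input; the values 1, 2 and 4 average to 7/3.
fixed=$(printf 'a 1\nb 2\nc 4\n' |
  awk '{ sum += $2 } END { if (NR > 0) printf("%.2f\n", sum / NR) }')

# Same average in scientific notation, using the format from the answer.
sci=$(printf 'a 1\nb 2\nc 4\n' |
  awk '{ sum += $2 } END { if (NR > 0) printf("%13.6e\n", sum / NR) }')

echo "$fixed"   # prints: 2.33
echo "$sci"
```

Unlike print, printf does not append a newline on its own, which is why the format string ends in \n.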
You can also generalize the code to average the Nth column (with N=2 in this sample) using:
awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'
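One way to reuse the generalized command is to wrap it in a small shell function. The name col_avg and the three-column sample file are assumptions made for this sketch, not part of the original answer:

```shell
# col_avg is a hypothetical helper: average column $1 of the file named in $2.
col_avg() {
  awk -v N="$1" '{ sum += $N } END { if (NR > 0) print sum / NR }' "$2"
}

# Hypothetical three-column sample file.
data=$(mktemp)
printf '1 10 100\n2 20 200\n3 30 300\n' > "$data"

second=$(col_avg 2 "$data")
third=$(col_avg 3 "$data")
echo "$second $third"   # prints: 20 200

rm -f "$data"
```

Passing N with -v sets it before the program starts, so $N resolves to the chosen field on every line.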
Answer by Pradipta
Try this:
ls -l | awk -F : '{sum+=} END {print "AVG=",sum/NR}'
NR is an awk built-in variable that holds the number of records (lines) read so far.
Answer by iamauser
awk 's+=$2{print s/NR}' table | tail -1
I am using tail -1 to print the last line, which should hold the average...
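For comparison, here is a sketch that runs the running-average form next to an END-block form on made-up sample data. Both give the same result here, but note that the pattern s += $2 is false whenever the running sum is zero, so no line is printed for that record and tail -1 could then pick up a stale value. The END form does not have that caveat:

```shell
# Hypothetical sample file with the values in the second column.
data=$(mktemp)
printf 'a 10\nb 20\nc 30\n' > "$data"

# Running average printed per line; tail keeps only the final one.
with_tail=$(awk 's += $2 { print s / NR }' "$data" | tail -1)

# Single average printed once, after the last line has been read.
with_end=$(awk '{ s += $2 } END { if (NR > 0) print s / NR }' "$data")

echo "$with_tail $with_end"   # prints: 20 20

rm -f "$data"
```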