bash: Use awk to find the average of a column

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/19149731/
Date: 2020-09-10 00:17:12  Source: igfitidea

Use awk to find average of a column

Tags: bash, awk

Asked by Ben Zifkin

I'm attempting to find the average of the second column of data using awk for a class. This is my current code, with the framework my instructor provided:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
x=sum
read name
        awk 'BEGIN{sum+=$2}'
        # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
        # NR is a variable equal to the number of rows in the file
        print "Average: " sum/ NR
        # Change this to print the Average instead of just the number of rows
}

and I'm getting an error that says:

awk: avg.awk:11:        awk 'BEGIN{sum+=$2}' $name
awk: avg.awk:11:            ^ invalid char ''' in expression

I think I'm close but I really have no idea where to go from here. The code shouldn't be incredibly complex as everything we've seen in class has been fairly basic. Please let me know.

Accepted answer by imp25

Your specific error is with line 11:

awk 'BEGIN{sum+=$2}'

This is a line where awk is invoked and its BEGIN block is specified, but you are already within an awk script, so you do not need to invoke awk again. Also, you want to run sum+=$2 on each line of input, so you do not want it within a BEGIN block. Hence the line should simply read:

sum+=$2
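
To see why the choice of block matters, here is a small sketch. The sample data and the shell variable names (data, in_begin, per_line) are made up for illustration. A BEGIN block runs before any input is read, so $2 is empty there, while a plain { ... } block runs once per input line.

```shell
# Hypothetical sample data: a label and a number on each line.
data='a 10
b 20
c 30'

# BEGIN runs before any input is read, so $2 is empty and sum stays 0.
in_begin=$(printf '%s\n' "$data" | awk 'BEGIN { sum += $2 } END { print sum + 0 }')

# A plain block runs once per line, so sum accumulates the second column.
per_line=$(printf '%s\n' "$data" | awk '{ sum += $2 } END { print sum }')

echo "BEGIN block: $in_begin"
echo "main block:  $per_line"
```

With this input the BEGIN version should report 0 and the per-line version 60.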

You also do not need the lines:

x=sum
read name

The first just copies the current value of sum into a variable named x, and I'm not sure what the second does, but neither is needed.

This would make your awk script:

#!/bin/awk

### This script currently prints the total number of rows processed.
### You must edit this script to print the average of the 2nd column
### instead of the number of rows.

# This block of code is executed for each line in the file
{
    sum+=$2
    # The script should NOT print out a value for each line
}
# The END block is processed after the last line is read
END {
    # NR is a variable equal to the number of rows in the file
    print "Average: " sum/ NR
    # Change this to print the Average instead of just the number of rows
}

Jonathan Leffler's answer gives an awk one-liner that represents the same fixed code, with the addition of a check that there is at least one line of input (this prevents a divide-by-zero error).
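
As a rough usage sketch, the fixed script can be saved to a file and run with awk -f. The file name avg.awk comes from the error message in the question; the temporary directory, the data file name, and the sample rows are assumptions made for this example.

```shell
# Work in a throwaway directory so nothing is left behind.
dir=$(mktemp -d)

# Hypothetical sample data: name in column 1, score in column 2.
printf 'alice 10\nbob 20\ncarol 30\n' > "$dir/data.txt"

# The fixed script from above (comments omitted), saved as avg.awk.
cat > "$dir/avg.awk" <<'EOF'
{
    sum += $2
}
END {
    print "Average: " sum / NR
}
EOF

# -f tells awk to read the program from a file instead of the command line.
result=$(awk -f "$dir/avg.awk" "$dir/data.txt")
echo "$result"

rm -r "$dir"
```

For the three sample rows this should print "Average: 20".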

Answer by Jonathan Leffler

awk '{ sum += $2; n++ } END { if (n > 0) print sum / n; }'

Add the numbers in $2 (the second column) to sum (awk automatically initializes variables to zero) and increment the number of rows in n (which could also be handled via the built-in variable NR). At the end, if at least one value was read, print the average.

awk '{ sum += $2 } END { if (NR > 0) print sum / NR }'
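
A quick way to compare the two variants is to feed them the same input. This is only a sketch: the sample rows and the shell variable names (with_n, with_nr, on_empty) are invented for the example.

```shell
# Hypothetical sample data piped in on standard input.
with_n=$(printf 'a 10\nb 20\nc 30\n' | awk '{ sum += $2; n++ } END { if (n > 0) print sum / n }')
with_nr=$(printf 'a 10\nb 20\nc 30\n' | awk '{ sum += $2 } END { if (NR > 0) print sum / NR }')

# With no input lines, the guard skips the division and nothing is printed.
on_empty=$(awk '{ sum += $2 } END { if (NR > 0) print sum / NR }' < /dev/null)

echo "n: $with_n, NR: $with_nr, empty: '$on_empty'"
```

Both counters should give 20 here, and the empty-input run should produce no output at all.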

If you want to use the shebang notation, you could write:

#!/bin/awk -f

{ sum += $2 }
END { if (NR > 0) print sum / NR }

You can also control the format of the average with printf() and a suitable format ("%13.6e\n", for example).
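
As an illustration, the sketch below formats the same average two ways. The "%13.6e\n" format is the one mentioned above; "%.2f\n" is an extra format chosen for this example, and the sample rows and shell variable names are assumptions.

```shell
# Hypothetical sample data whose average (61/3) is not a whole number.
fixed=$(printf 'a 10\nb 20\nc 31\n' | awk '{ sum += $2 } END { if (NR > 0) printf "%.2f\n", sum / NR }')
sci=$(printf 'a 10\nb 20\nc 31\n' | awk '{ sum += $2 } END { if (NR > 0) printf "%13.6e\n", sum / NR }')

echo "fixed-point: $fixed"
echo "scientific:  $sci"
```

The first form rounds to two decimal places (20.33 for this input); the second prints scientific notation padded to a width of 13 characters.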

You can also generalize the code to average the Nth column (with N=2 in this sample) using:

awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }'
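
For instance, the same program can average different columns just by changing N. The three-column sample data and the shell variable names (data, col2, col3) are hypothetical.

```shell
# Hypothetical three-column data: name, first score, second score.
data='alice 10 1
bob 20 2
carol 30 6'

# -v N=... sets the awk variable N before the program starts;
# $N then refers to field number N.
col2=$(printf '%s\n' "$data" | awk -v N=2 '{ sum += $N } END { if (NR > 0) print sum / NR }')
col3=$(printf '%s\n' "$data" | awk -v N=3 '{ sum += $N } END { if (NR > 0) print sum / NR }')

echo "column 2: $col2"
echo "column 3: $col3"
```

With this data, column 2 should average to 20 and column 3 to 3.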

Answer by Pradipta

Try this:

ls -l  | awk -F : '{sum+=} END {print "AVG=",sum/NR}'

NR is an awk built-in variable that counts the number of records (lines) read so far.
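
A minimal sketch of how NR behaves, using made-up three-line input: it increases by one for each record, and in the END block it holds the total.

```shell
# Print NR next to each line of hypothetical input.
per_record=$(printf 'x\ny\nz\n' | awk '{ print NR ": " $0 }')

# In END, NR is the number of records that were read.
total=$(printf 'x\ny\nz\n' | awk 'END { print NR }')

echo "$per_record"
echo "total records: $total"
```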

Answer by iamauser

awk 's+=$2{print s/NR}' table | tail -1

I am using tail -1 to print the last line, which should have the average number...
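
To make the mechanics visible, the sketch below assumes the summed field is $2 (the second column, as in the question) and uses invented sample rows. The assignment is used as an awk pattern, so a running average is printed for each line, and tail keeps only the last one. The sketch writes tail -n 1, which is the POSIX spelling of tail -1.

```shell
# Hypothetical sample data: name in column 1, value in column 2.
data='a 10
b 20
c 30'

# s+=$2 is used as a pattern: it updates s, and when the new s is non-zero
# the action prints the running average over the lines read so far.
running=$(printf '%s\n' "$data" | awk 's+=$2{print s/NR}')

# tail -n 1 keeps only the last line, the average over all lines.
last=$(printf '%s\n' "$data" | awk 's+=$2{print s/NR}' | tail -n 1)

echo "$running"
echo "final: $last"
```

One caveat with this form: whenever the running sum is exactly zero the pattern is false and nothing is printed for that line, so the last line of output may not reflect every row. The END-block versions in the other answers avoid this.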