Calculating mean, variance and range using a Bash script

Question

Asked by Brian James

Given a file file.txt:

AAA 1 2 3 4 5 6 3 4 5 2 3 
BBB 3 2 3 34 56 1 
CCC 4 7 4 6 222 45

Does anyone have any ideas on how to calculate the mean, variance and range for each item (AAA, BBB and CCC respectively) using a Bash script? Thanks.

Answer 1

Answered by Adam Liss

Here's a solution with awk, which calculates:

minimum = smallest value on each line
maximum = largest value on each line
average = μ = sum of all values on each line, divided by the count of the numbers.
variance = 1/n × [(Σx)² - Σ(x²)] where
n = number of values on the line = NF - 1 (in awk, NF = number of fields on the line)
(Σx)² = square of the sum of the values on the line
Σ(x²) = sum of the squares of the values on the line

awk '{
  min = max = sum = $2;       # Initialize to the first value (2nd field)
  sum2 = $2 * $2              # Running sum of squares
  for (n=3; n <= NF; n++) {   # Process each value on the line
    if ($n < min) min = $n    # Current minimum
    if ($n > max) max = $n    # Current maximum
    sum += $n;                # Running sum of values
    sum2 += $n * $n           # Running sum of squares
  }
  print $1 ": min=" min ", avg=" sum/(NF-1) ", max=" max ", var=" ((sum*sum) - sum2)/(NF-1);
}' filename

Output:

AAA: min=1, avg=3.45455, max=6, var=117.273
BBB: min=1, avg=16.5, max=56, var=914.333
CCC: min=4, avg=48, max=222, var=5253

Note that you can save the awk script (everything between the single quotes, but not the quotes themselves) in a file, say one called script, and run it with awk -f script filename

Answer 2

Answered by kev

You can use python:

$ AAA() {  echo "$@" | python -c 'from sys import stdin; nums = [float(i) for i in stdin.read().split()]; print(sum(nums)/len(nums))'; }

$ AAA 1 2 3 4 5 6 3 4 5 2 3
3.45454545455

Answer 3

Answered by user unknown

Part 1 (mean):

mean () {
  len=$#
  echo  $* | tr " " "\n" | sort -n | head -n $(((len+1)/2)) | tail -n 1
}

nMean () {
  echo -n "$1 " 
  shift 
  mean $* 
}

Usage of mean:

nMean AAA 3 4  5 6 3 4 3 6 2 4
AAA 4
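
Note that the mean function above returns the middle element of the sorted values, which is the median rather than the arithmetic mean. If the arithmetic mean is what is wanted, a minimal sketch in pure shell integer arithmetic could look like the following. The name arith_mean is hypothetical, and the sketch assumes non-negative integer inputs; it truncates to two decimal places instead of rounding.

```shell
# Hypothetical helper (not part of the original answer):
# arithmetic mean with two decimal places, using only integer arithmetic.
# Assumes non-negative integer arguments.
arith_mean () {
  [ "$#" -gt 0 ] || return 1          # avoid division by zero
  local sum=0 n scaled
  for n in "$@"; do
    sum=$((sum + n))
  done
  # Scale by 100 before dividing so two decimals survive integer division.
  scaled=$((sum * 100 / $#))
  printf '%d.%02d\n' $((scaled / 100)) $((scaled % 100))
}

arith_mean 3 4 5 6 3 4 3 6 2 4      # prints 4.00
arith_mean 1 2 3 4 5 6 3 4 5 2 3    # prints 3.45
```

Scaling before the division is the usual trick for keeping a fixed number of decimals when only integer arithmetic is available.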

Part 2 (variance):

variance () {
  count=$1
  avg=$2
  shift
  shift
  sum=0
  for n in $* 
  do 
    diff=$((avg-n))
    quad=$((diff*diff))
    sum=$((sum+quad))
  done 
  echo $((sum/count)) 
}

sum () {
  form="$(echo $*)"
  formula=${form// /+}
  echo $((formula))
}

nVariance () {
  echo -n "$1 " 
  shift 
  count=$#
  s=$(sum $*) 
  avg=$((s/$count))
  var=$(variance $count $avg $*)
  echo $var
}

Usage:

nVariance AAA 3 4  5 6 3 4 3 6 2 4
AAA 1

Part 3 (range):

range () { 
  min=$1
  max=$1
  for p in $* ; do 
    (( $p < $min )) && min=$p
    (( $p > $max )) && max=$p
  done 
  echo $min ":" $max 
}

nRange () {
  echo -n "$1 " 
  shift 
  range $* 
}

usage:

nRange AAA 1 2 3 4 5 6 3 4 5 2 3 
AAA 1 : 6
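
If the range is wanted as a single number (maximum minus minimum) rather than as the pair of bounds, the same loop can be adapted. This is only a sketch: the function name span is hypothetical, and it assumes integer arguments, like the rest of this answer.

```shell
# Hypothetical helper: range as a single number, max - min.
# Assumes at least one integer argument.
span () {
  local min=$1 max=$1 p
  for p in "$@"; do
    [ "$p" -lt "$min" ] && min=$p
    [ "$p" -gt "$max" ] && max=$p
  done
  echo $((max - min))
}

span 1 2 3 4 5 6 3 4 5 2 3    # prints 5
```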

nX is short for "named X": named mean, named variance, and so on. Note that I use integer arithmetic, which is all the shell supports natively. For floating-point arithmetic you would use bc, for instance. Integer arithmetic loses precision, which might be acceptable for large natural numbers.

Process all 3 commands for an input line:

processLine () {
  nVariance $*
  nMean $*
  nRange $*
}

Read the data from a file, line by line:

# data:
# AAA 1 2 3 4 5 6 3 4 5 2 3 
# BBB 3 2 3 34 56 1 
# CCC 4 7 4 6 222 45 

while read line
do
  processLine $line
done < data
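
A slightly more defensive variant of this read loop might split the label from the values as each line is read. The sketch below is an illustration only: summarize_file and its output format are hypothetical, and it assumes every value is an integer.

```shell
# Hypothetical sketch: read "LABEL v1 v2 ..." lines from the file named in $1
# and print the count and sum of the values on each line.
summarize_file () {
  local name rest count sum v
  while read -r name rest; do
    [ -n "$rest" ] || continue        # skip blank or label-only lines
    count=0
    sum=0
    for v in $rest; do                # unquoted on purpose: split on whitespace
      count=$((count + 1))
      sum=$((sum + v))
    done
    echo "$name count=$count sum=$sum"
  done < "$1"
}

# Usage, assuming the sample data is stored in a file named data:
#   summarize_file data
```

Using read -r stops backslashes in the input from being interpreted, and letting read take the label into its own variable avoids the separate shift in each function.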

update:

Contrary to my expectation, it doesn't seem easy to handle an unknown number of arguments with functions in bc, for example min (3, 4, 5, 2, 6).

But the need to call bc can be reduced to 2 places if the inputs are integers. I used a precision of 2 ("scale=2"); you may change this to suit your needs.

variance () {
  count=$1
  avg=$2
  shift
  shift
  sum=0
  for n in $* 
  do 
    diff="($avg-$n)"
    quad="($diff*$diff)"
    sum="($sum+$quad)"
  done 
#  echo "$sum/$count" 
  echo "scale=2;$sum/$count" | bc 
}

nVariance () {
  echo -n "$1 " 
  shift 
  count=$#
  s=$(sum $*) 
  avg=$(echo "scale=2;$s/$count" | bc)
  var=$(variance $count $avg $*)
  echo $var
}

The rest of the code can stay the same. Please verify that the formula for the variance is correct - I used what I had in mind:

For the values (1, 5, 9), I sum them (15) and divide by the count (3) to get 5. Then I take each value's difference from the average (-4, 0, 4), square the differences (16, 0, 16), sum them (32) and divide by the count (3), which gives 10.66.

Is this correct, or do I need a square root somewhere ;) ?
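
Dividing the sum of squared differences by the count, as above, gives the population variance; the square root of that value is the standard deviation. A minimal sketch that follows the same steps, using awk for the floating-point arithmetic instead of bc (the function name pop_variance is hypothetical):

```shell
# Hypothetical helper: population variance and standard deviation
# of the numbers given as arguments, following the steps described above.
pop_variance () {
  printf '%s\n' "$@" | awk '
    { x[NR] = $1; sum += $1 }               # store values, accumulate the sum
    END {
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        d = x[i] - mean                     # difference from the average
        ss += d * d                         # sum of squared differences
      }
      var = ss / NR                         # population variance: divide by n
      print "mean=" mean ", var=" var ", sd=" sqrt(var)
    }'
}

pop_variance 1 5 9    # prints mean=5, var=10.6667, sd=3.26599
```

Dividing by the count minus one instead would give the sample variance, which is 16 for these three values.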

Note that I had to correct the mean calculation. For 1, 5, 9, the mean is 5, not 1, am I right? It now uses sort -n (numeric) and (len+1)/2.

Answer 4

Answered by pbot

There is a typo in the accepted answer that causes the variance to be miscalculated. In the print statement:

", var=" ((sum*sum) - sum2)/(NF-1)

should be:

", var=" (sum2 - ((sum*sum)/NF))/(NF-1)
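
For reference, here is a sketch of the whole script with a sample-variance formula in place. One assumption of mine: the sketch counts n = NF - 1 values per line, because the first field of each line in file.txt is the label rather than a value, so it divides by n and n - 1 instead of NF and NF - 1. The wrapper name stats is hypothetical.

```shell
# Hypothetical wrapper: per-line min, average, max and sample variance.
# Reads "LABEL v1 v2 ..." lines from the named files, or from stdin.
stats () {
  awk '{
    n = NF - 1                          # number of values (field 1 is the label)
    min = max = sum = $2 + 0            # initialize from the first value
    sum2 = $2 * $2                      # running sum of squares
    for (i = 3; i <= NF; i++) {
      x = $i + 0                        # force a numeric comparison
      if (x < min) min = x
      if (x > max) max = x
      sum  += x
      sum2 += x * x
    }
    var = 0                             # a single value has no sample variance
    if (n > 1) var = (sum2 - sum * sum / n) / (n - 1)
    print $1 ": min=" min ", avg=" (sum / n) ", max=" max ", var=" var
  }' "$@"
}

stats <<'EOF'
AAA 1 2 3 4 5 6 3 4 5 2 3
BBB 3 2 3 34 56 1
CCC 4 7 4 6 222 45
EOF
```

With n taken as the number of values, the three sample lines give var=2.27273, var=536.3 and var=7520.4 respectively; min, avg and max are unchanged from the accepted answer's output.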

Also, it is better to use something like Welford's algorithm to calculate variance; the algorithm in the accepted answer is unstable when the variance is small relative to the mean:

foo="1 2 3 4 5 6 3 4 5 2 3";
awk '{
  M = 0;
  S = 0;
  for (k=1; k <= NF; k++) {
    x = $k;
    oldM = M;
    M = M + ((x - M)/k);
    S = S + (x - M)*(x - oldM);
  }
  var = S/(NF - 1);
  print " var=" var;
}' <<< "$foo"
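
The example above takes a bare list of numbers. To apply the same recurrence to lines in the file.txt format, the loop has to skip the label in the first field. A sketch under that assumption (the wrapper name welford is hypothetical):

```shell
# Hypothetical wrapper: running mean and sample variance per line,
# for input lines of the form "LABEL v1 v2 ...".
welford () {
  awk '{
    M = 0; S = 0
    for (k = 2; k <= NF; k++) {         # field 1 is the label, so start at 2
      x = $k + 0
      n = k - 1                         # number of values seen so far
      oldM = M
      M = M + (x - oldM) / n            # update the running mean
      S = S + (x - M) * (x - oldM)      # update the sum of squared deviations
    }
    n = NF - 1
    var = 0                             # a single value has no sample variance
    if (n > 1) var = S / (n - 1)
    print $1 ": mean=" M ", var=" var
  }' "$@"
}

printf 'AAA 1 2 3 4 5 6 3 4 5 2 3\n' | welford
```

Because the mean is updated incrementally, the sketch never forms the large intermediate sums that make the sum-of-squares formula lose precision.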
