bash awk: median of a column

Question

Asked by Nick

How can I use AWK to compute the median of a column of numerical data?

I can think of a simple algorithm but I can't seem to program it:

What I have so far is:

sort | awk 'END{print NR}'

And this gives me the number of elements in the column. I'd like to use this to print a certain row (NR/2). If NR/2 is not an integer, then I round up to the nearest integer and that is the median; otherwise I take the average of (NR/2)+1 and (NR/2)-1.

Answer 1

Answered by maxschlepzig

With awk, you have to store the values in an array and compute the median at the end. Assuming we look at the first column:

sort -n file | awk ' { a[i++]=$1; } END { print a[int(i/2)]; }'

Sure, for a real median computation, handle the odd and even cases as described in the question:

sort -n file | awk ' { a[i++]=$1; }
    END { x=int((i+1)/2); if (x < (i+1)/2) print (a[x-1]+a[x])/2; else print a[x-1]; }'

Answer 2

Answered by Johnsyweb

This awk program assumes one column of numerically sorted data:

#!/usr/bin/awk -f
{
    count[NR] = $1;
}
END {
    if (NR % 2) {
        print count[(NR + 1) / 2];
    } else {
        print (count[(NR / 2)] + count[(NR / 2) + 1]) / 2.0;
    }
}

Sample usage:

sort -n data_file | awk -f median.awk
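
When the data file has several columns, the same logic can be wrapped so that the column is picked first and sorted afterwards. The following is only a sketch: the function name `median_of_column` and its arguments are made up for illustration, and it assumes every line holds a numeric value in the chosen column.

```shell
# median_of_column FILE [COL]: hypothetical helper; prints the median of column COL (default 1).
median_of_column() {
    file=$1
    col=${2:-1}
    # Extract the column, sort it numerically, then pick the middle value(s).
    awk -v c="$col" '{ print $c }' "$file" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1          # no data: print nothing, signal failure
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

For example, `median_of_column data_file 2` would report the median of the second whitespace-separated column.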

Answer 3

Answered by Vinicius Placco

OK, I just saw this topic and thought I could add my two cents, since I looked for something similar in the past. Even though the title says awk, all the answers make use of sort as well. Calculating the median of a column of data can be easily accomplished with datamash:

> seq 10 | datamash median 1
5.5

Note that sort is not needed, even if you have an unsorted column:

> seq 10 | gshuf | datamash median 1
5.5

The documentation lists all the functions it can perform, along with good examples for files with many columns. Anyway, it has nothing to do with awk, but I think datamash is of great help in cases like this, and it could also be used in conjunction with awk. Hope it helps somebody!
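
If datamash is not available and you would rather not call sort, awk can keep the values ordered itself. This is a rough sketch rather than anything taken from the answers here: the name `median_nosort` is hypothetical, it reads column 1, and the insertion sort it uses is quadratic, so it only suits modest input sizes.

```shell
# median_nosort [FILE...]: hypothetical helper; median of column 1 without calling sort.
median_nosort() {
    awk '
        # Insert each value into its sorted position (insertion sort, O(n^2)).
        {
            x = $1 + 0
            for (j = NR - 1; j >= 1 && v[j] > x; j--) v[j + 1] = v[j]
            v[j + 1] = x
        }
        END {
            if (NR == 0) exit 1
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }' "$@"
}
```

With no file arguments it reads standard input, so it can sit at the end of a pipeline the same way datamash does above.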

Answer 4

Answered by Brad Parks

This AWK-based answer to a similar question on unix.stackexchange.com gives the same results as Excel for calculating the median.

Answer 5

Answered by arenaq

If you have an array to compute the median from (this contains a one-liner version of Johnsyweb's solution):

array=(5 6 4 2 7 9 3 1 8) # numbers 1-9
IFS=$'\n'
# "${array[*]}" joins the elements with newlines; sort numerically before awk reads them
median=$(sort -n <<< "${array[*]}" | awk '{arr[NR]=$1} END {if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}')
unset IFS
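
A variant that leaves IFS alone is to print one element per line with printf. The sketch below assumes the values are passed as arguments; `median_of_args` is a name invented for this example.

```shell
# median_of_args N1 N2 ...: hypothetical helper; prints the median of its arguments.
median_of_args() {
    [ "$#" -gt 0 ] || return 1
    # printf repeats the format for every argument, giving one value per line.
    printf '%s\n' "$@" | sort -n | awk '
        { v[NR] = $1 }
        END {
            if (NR % 2) print v[(NR + 1) / 2]
            else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
        }'
}
```

In bash, the array from above could then be passed as `median=$(median_of_args "${array[@]}")`.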
