Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/1092405/

Time: 2020-09-09 18:18:49  Source: igfitidea

counting duplicates in a sorted sequence using command line tools

Tags: bash, command-line, sorting, count, duplicates

Asked by letronje

I have a command (cmd1) that greps through a log file to filter out a set of numbers. The numbers are in random order, so I use sort -gr to get a reverse sorted list of numbers. There may be duplicates within this sorted list. I need to find the count for each unique number in that list.

For example, if the output of cmd1 is:

100 
100 
100 
99 
99 
26 
25 
24 
24

I need another command that I can pipe the above output into, so that I get:

100     3
99      2
26      1
25      1
24      2

Answered by Stephen Paul Lesniewski

How about:

$ echo "100 100 100 99 99 26 25 24 24" \
    | tr " " "\n" \
    | sort \
    | uniq -c \
    | sort -k2nr \
    | awk '{printf("%s\t%s\n",$2,$1)}'

The result is:

100 3
99  2
26  1
25  1
24  2

Answered by Ibrahim

uniq -c works for GNU uniq 8.23 at least, and does exactly what you want (assuming sorted input).
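
As a minimal sketch of this suggestion, assuming the input is already sorted so that equal values sit next to each other: uniq -c prints the count before the value, so a small awk step (an addition here, not part of the answer) swaps the two columns to match the layout asked for in the question. The function name count_sorted is hypothetical.

```shell
# count_sorted: hypothetical helper; reads already-sorted values on stdin.
count_sorted() {
    # uniq -c prefixes each run of equal lines with its count (count, then value);
    # awk swaps the two fields and emits "value<TAB>count".
    uniq -c | awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example input taken from the question (already in descending order).
printf '%s\n' 100 100 100 99 99 26 25 24 24 | count_sorted
```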

Answered by ghostdog74

If order is not important:

# echo "100 100 100 99 99 26 25 24 24" | awk '{for(i=1;i<=NF;i++)a[$i]++}END{for(o in a) printf "%s %s ",o,a[o]}'
26 1 100 3 99 2 24 2 25 1
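
The single-line output above comes out in awk's arbitrary array order. A hedged variant, assuming one pair per line is acceptable: print each value with its count on its own line, then restore descending numeric order with sort -nr. The function name count_unsorted is made up for illustration and is not part of the answer.

```shell
# count_unsorted: hypothetical helper; reads whitespace-separated values on stdin.
count_unsorted() {
    # Tally every field in an awk array, then print "value count" per line.
    # The for-in loop order is unspecified, so sort -nr puts the values
    # back into descending numeric order afterwards.
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (v in count) print v, count[v] }' |
        sort -nr
}

echo "100 100 100 99 99 26 25 24 24" | count_unsorted
```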

Answered by ericcurtin

Numerically sort the numbers in reverse, then count the duplicates, then swap the left and the right words. Align into columns.

printf '%d\n' 100 99 26 25 100 24 100 24 99 \
   | sort -nr | uniq -c | awk '{printf "%-8s%s\n", $2, $1}'
100     3
99      2
26      1
25      1
24      2

Answered by Toby Speight

In Bash, we can use an associative array to count instances of each input value. Assuming we have the command $cmd1, e.g.

#!/bin/bash

cmd1='printf %d\n 100 99 26 25 100 24 100 24 99'

Then we can count values in the array variable a using the ++ arithmetic operator on the relevant array entries:

declare -A a    # associative array (requires Bash 4 or later)
while read -r i
do
    ((++a["$i"]))
done < <($cmd1)

We can print the resulting values:

for i in "${!a[@]}"
do
    echo "$i ${a[$i]}"
done


If the order of output is important, we might need an external sort of the keys:

for i in $(printf '%s\n' "${!a[@]}" | sort -nr)
do
    echo "$i ${a[$i]}"
done