Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/18752443/

Time: 2020-09-10 00:10:50  Source: igfitidea

How to grep number of unique occurrences

bash awk grep

Asked by Simpleton

I understand that grep -c string can be used to count the occurrences of a given string. What I would like to do is count the number of unique occurrences when only part of the string is known or remains constant.

For example, if I had a file (in this case a log) with several lines containing a constant string and a repeating variable, like so:

string=value1
string=value1
string=value1
string=value2
string=value3
string=value2

Then I would like to be able to count each unique value, with output similar to the following (ideally with a single grep/awk command):

value1 = 3 occurrences
value2 = 2 occurrences
value3 = 1 occurrences

Does anyone have a solution using grep or awk that might work? Thanks in advance!

Answered by Simpleton

This worked perfectly... Thanks to everyone for your comments!

grep -oP "wwn=[^,]*" path/to/file | sort | uniq -c
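
To see what this pipeline does on the sample log from the question, here is a minimal sketch. It assumes the key is string= rather than wwn=, that a value ends at a comma or at the end of the line, and that the sample is written to a temporary file created with mktemp; the variable names are illustrative only. It uses plain grep -o without -P, since this bracket expression needs no Perl syntax.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (hypothetical path).
log=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$log"

# -o prints only the matched part of each line; sort puts identical
# matches next to each other so that uniq -c can count each run.
counts=$(grep -o 'string=[^,]*' "$log" | sort | uniq -c)
printf '%s\n' "$counts"

rm -f "$log"
```

Each output line is a count followed by the matched text (3 for string=value1, 2 for string=value2, 1 for string=value3). The amount of leading padding that uniq -c prints differs between implementations.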

Answered by fedorqui 'SO stop harming'

In general, if you want to grep and also keep track of results, it is best to use awk, since it performs such things in a clear manner with a very simple syntax.

So for your given file I would use:

$ awk -F= '/string=/ {count[$2]++} END {for (i in count) print i, count[i]}' file
value1 3
value2 2
value3 1

What is this doing?

  • -F=
    sets the field separator to =, so that each line splits into the part to the left of = ($1) and the part to the right of it ($2).
  • /string=/ {count[$2]++}
    on every line that matches the pattern "string=", increments an entry in the array count[], which keeps track of how many times each value of the second field has appeared so far.
  • END {for (i in count) print i, count[i]}
    at the end, loops through the array and prints each value with its count.
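
Putting those three pieces together, the following sketch runs the same awk program on the sample data from the question and formats each line the way the question asks. The temporary file, the printf format, and the trailing sort are assumptions added for illustration: awk visits the keys of for (v in count) in an unspecified order, so sorting keeps the output stable.

```shell
#!/usr/bin/env bash
# Sample data from the question, written to a temporary file (hypothetical path).
log=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$log"

# Count each value after "=", then print "value = N occurrences".
# sort is only there because awk's for-in order is unspecified.
result=$(awk -F= '
    /string=/ { count[$2]++ }
    END { for (v in count) printf "%s = %d occurrences\n", v, count[v] }
' "$log" | sort)
printf '%s\n' "$result"

rm -f "$log"
```

On this sample the sketch prints:

value1 = 3 occurrences
value2 = 2 occurrences
value3 = 1 occurrences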

Answered by konsolebox

Here's an awk script:

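#!/usr/bin/awk -f

# Usage: awk -f script.awk FILE PATTERN...
# For each PATTERN, counts the distinct lines of FILE that match it.
BEGIN {
    file = ARGV[1]
    while ((getline line < file) > 0) {
        for (i = 2; i < ARGC; ++i) {
            p = ARGV[i]
            if (line ~ p) {
                # Add 1 only the first time this (pattern, line) pair is seen.
                a[p] += !a[p, line]++
            }
        }
    }
    for (i = 2; i < ARGC; ++i) {
        p = ARGV[i]
        printf("%s = %d occurrences\n", p, a[p])
    }
    exit
}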

Example:

awk -f script.awk somefile ab sh

Output:

ab = 7 occurrences
sh = 2 occurrences
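
As far as I can tell from reading it, this script counts something slightly different from the other answers: for each pattern it counts the distinct lines that match, not the total number of matches. A minimal sketch on the sample data from the question makes that visible. The temporary files and the two patterns (value1 and string) are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Sample data from the question and the awk script, both in temporary files
# (hypothetical paths created with mktemp).
log=$(mktemp)
script=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
              string=value2 string=value3 string=value2 > "$log"

# The quoted heredoc delimiter keeps the shell from expanding anything inside.
cat > "$script" <<'EOF'
BEGIN {
    file = ARGV[1]
    while ((getline line < file) > 0) {
        for (i = 2; i < ARGC; ++i) {
            p = ARGV[i]
            if (line ~ p) {
                a[p] += !a[p, line]++
            }
        }
    }
    for (i = 2; i < ARGC; ++i) {
        p = ARGV[i]
        printf("%s = %d occurrences\n", p, a[p])
    }
    exit
}
EOF

result=$(awk -f "$script" "$log" value1 string)
printf '%s\n' "$result"

rm -f "$log" "$script"
```

On this sample the sketch prints:

value1 = 1 occurrences
string = 3 occurrences

The three identical string=value1 lines count once for value1, and the pattern string matches three distinct lines.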