bash: How to grep the number of unique occurrences
Notice: this page is a translation of a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/18752443/
How to grep number of unique occurrences
Asked by Simpleton
I understand that grep -c string can be used to count the occurrences of a given string. What I would like to do is count the number of unique occurrences when only part of the string is known or remains constant.
For example, if I had a file (in this case a log) with several lines containing a constant string and a repeating variable, like so:
string=value1
string=value1
string=value1
string=value2
string=value3
string=value2
Then I would like to be able to identify the count of each unique value, with output similar to the following (ideally with a single grep/awk command):
value1 = 3 occurrences
value2 = 2 occurrences
value3 = 1 occurrences
Does anyone have a solution using grep or awk that might work? Thanks in advance!
Answered by Simpleton
This worked perfectly... Thanks to everyone for your comments!
grep -oP "wwn=[^,]*" path/to/file | sort | uniq -c
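As a rough illustration, the same pipeline can be tried on the sample lines from the question. This is only a sketch under a few assumptions: the temporary file and the variable names sample and counts are hypothetical, the key is string= rather than wwn=, and -P is dropped because the pattern [^,]* uses no Perl-specific syntax. The -o option is not part of POSIX grep, although GNU and BSD grep both support it.

```shell
# Hypothetical sample file reproducing the log lines from the question.
sample=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
    string=value2 string=value3 string=value2 > "$sample"

# -o prints only the matched text; sort brings identical matches together
# so that uniq -c can count each run of equal lines.
counts=$(grep -o 'string=[^,]*' "$sample" | sort | uniq -c)
printf '%s\n' "$counts"

rm -f "$sample"
```

Each output line is a count followed by the matched text. uniq -c typically pads the count with leading whitespace, so scripts that parse the result should not rely on a fixed column width.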
Answered by fedorqui 'SO stop harming'
In general, if you want to grep and also keep track of results, it is best to use awk, since it does this clearly and with a very simple syntax.
So for your given file I would use:
$ awk -F= '/string=/ {count[$2]++} END {for (i in count) print i, count[i]}' file
value1 3
value2 2
value3 1
What is this doing?
-F= sets the field separator to =, so that we can refer to the parts on its left and right.
/string=/ {count[$2]++} runs whenever a line matches the pattern "string=". It uses the array count[] to keep track of how many times each value of the second field has appeared so far.
END {for (i in count) print i, count[i]} runs at the end, looping through the results and printing them.
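The steps above can be adapted to print the exact format requested in the question. The following is a minimal sketch, not the answerer's command: the temporary file and the variable names sample and report are hypothetical, and the trailing sort is added because awk does not guarantee any particular order for a for (i in count) loop.

```shell
# Hypothetical sample file reproducing the log lines from the question.
sample=$(mktemp)
printf '%s\n' string=value1 string=value1 string=value1 \
    string=value2 string=value3 string=value2 > "$sample"

# Split each line on "=", tally the second field, then print one
# "value = N occurrences" line per distinct value.
report=$(awk -F= '
    /string=/ { count[$2]++ }
    END {
        for (v in count)
            printf "%s = %d occurrences\n", v, count[v]
    }' "$sample" | sort)
printf '%s\n' "$report"

rm -f "$sample"
```

On the sample data this prints value1 = 3 occurrences, value2 = 2 occurrences and value3 = 1 occurrences, one per line.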
Answered by konsolebox
Here's an awk script:
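#!/usr/bin/awk -f
# Usage: awk -f script.awk FILE PATTERN...
BEGIN {
    file = ARGV[1]
    # Read the file line by line and test each line against every pattern.
    while ((getline line < file) > 0) {
        for (i = 2; i < ARGC; ++i) {
            p = ARGV[i]
            if (line ~ p) {
                # Count the line only the first time it matches pattern p.
                a[p] += !a[p, line]++
            }
        }
    }
    # Report the number of distinct matching lines for each pattern.
    for (i = 2; i < ARGC; ++i) {
        p = ARGV[i]
        printf("%s = %d occurrences\n", p, a[p])
    }
    exit
}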
Example:
awk -f script.awk somefile ab sh
Output:
ab = 7 occurrences
sh = 2 occurrences
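The core of the script is the expression a[p] += !a[p, line]++, which adds 1 only the first time a given line matches a given pattern. A reduced sketch of that idiom for a single pattern is shown below. The sample lines, the temporary file and the names sample, unique, seen and n are hypothetical and chosen only for illustration.

```shell
# Hypothetical input: "ab 1" appears twice, so only two distinct lines match "ab".
sample=$(mktemp)
printf '%s\n' 'ab 1' 'ab 1' 'ab 2' 'sh 1' > "$sample"

# seen[$0]++ evaluates to 0 the first time a line is seen, so !seen[$0]++
# is 1 for the first occurrence of a line and 0 for every repeat.
unique=$(awk -v p='ab' '
    $0 ~ p { n += !seen[$0]++ }
    END { printf "%s = %d occurrences\n", p, n }' "$sample")
printf '%s\n' "$unique"

rm -f "$sample"
```

This prints ab = 2 occurrences: the repeated line ab 1 is counted once, and sh 1 does not match the pattern.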