
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/128365/

Date: 2020-09-17 20:29:11 — Source: igfitidea

Count number of occurrences of token in a file

Tags: bash, shell, grep

Asked by matt b

I have a server access log with timestamps of each HTTP request, and I'd like to obtain a count of the number of requests at each second. Using sed and cut -c, so far I've managed to cut the file down to just the timestamps, such as:


22-Sep-2008 20:00:21 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:22 +0000
22-Sep-2008 20:00:24 +0000
22-Sep-2008 20:00:24 +0000


What I'd love to get is the number of times each unique timestamp appears in the file. For example, with the above example, I'd like to get output that looks like:


22-Sep-2008 20:00:21 +0000: 1
22-Sep-2008 20:00:22 +0000: 3
22-Sep-2008 20:00:24 +0000: 2


I've used sort -u to filter the list of timestamps down to a list of unique tokens, hoping that I could use grep along the lines of


grep -c -f <file containing patterns> <file>

but this just produces a single line containing the grand total of matching lines, not a count per pattern.
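A quick demonstration of that behavior (file names here are made up for illustration): grep -c reports one combined total per input file, not one count per pattern.

```shell
# grep -c -f counts every line matched by ANY pattern, as one total.
printf 'a\nb\na\n' > data.txt      # three lines, two unique tokens
printf 'a\nb\n'    > patterns.txt  # the sort -u output
grep -c -f patterns.txt data.txt
# → 3
```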


I know this can be done in a single line by stringing a few utilities together ... but I can't think of which ones. Anyone know?


Answered by The Archetypal Paul

I think you're looking for


uniq --count

-c, --count prefix lines by the number of occurrences

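A minimal sketch of that answer applied to the sample timestamps. One caveat: uniq only merges adjacent duplicate lines, so the input must be sorted first.

```shell
# uniq -c prefixes each unique line with its occurrence count;
# sort first so that duplicate timestamps are adjacent.
printf '%s\n' \
  '22-Sep-2008 20:00:21 +0000' \
  '22-Sep-2008 20:00:22 +0000' \
  '22-Sep-2008 20:00:22 +0000' \
  '22-Sep-2008 20:00:24 +0000' |
  sort | uniq -c
```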

Answered by David

Using AWK with associative arrays might be another solution to something like this.

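One possible shape of that idea, sketched: index an associative array by the whole line, increment on each occurrence, and print the totals at the end. No pre-sorting is needed.

```shell
# Count occurrences of each distinct input line with an awk array.
# (for-in iteration order over the array is unspecified.)
printf '%s\n' x y x |
  awk '{count[$0]++} END {for (t in count) print t ": " count[t]}'
```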

Answered by Remo.D

Just in case you want the output in the format you originally specified (with the number of occurrences at the end):


uniq -c logfile | sed 's/ *\([0-9]*\) \(.*\)/\2: \1/'
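Spelled out on the sample data, with a sort added in front (uniq -c needs adjacent duplicates); the sed moves the count from the front of each line to the end:

```shell
# Count, then rewrite "  N line" as "line: N".
printf '%s\n' \
  '22-Sep-2008 20:00:22 +0000' \
  '22-Sep-2008 20:00:21 +0000' \
  '22-Sep-2008 20:00:22 +0000' |
  sort | uniq -c | sed 's/ *\([0-9]*\) \(.*\)/\2: \1/'
# → 22-Sep-2008 20:00:21 +0000: 1
#   22-Sep-2008 20:00:22 +0000: 2
```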

Answered by Tom

Using awk:


cat file.txt | awk '{count[$1 " " $2]++;} \
                    END {for(w in count){print w ": " count[w]};}'

回答by Bity

Tom's solution:


awk '{count[$1 " " $2]++;} END {for(w in count){print w ": " count[w]};}' file.txt

works more generally.

更一般地工作。

My file was not sorted:


name1 
name2 
name3 
name2 
name2 
name3 
name1

Therefore the duplicate lines weren't adjacent, and uniq on its own does not work, as it gives:


1 name1 
1 name2 
1 name3 
2 name2 
1 name3 
1 name1

With the awk script, however, I get:


name1:2 
name2:3 
name3:2
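The difference can be reproduced in a few lines (file name hypothetical): plain uniq -c only merges adjacent runs, while the awk version aggregates across the whole file regardless of order.

```shell
printf '%s\n' name1 name2 name2 name1 > names.txt  # deliberately unsorted
uniq -c names.txt   # splits name1 into two separate counts of 1
awk '{count[$0]++} END {for (w in count) print w ": " count[w]}' names.txt
# awk totals: name1: 2 and name2: 2 (in unspecified order)
```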

Answered by Clyde

Maybe use xargs? I can't put it all together in my head on the spot here, but use xargs on your sort -u output so that for each unique second you can grep the original file and do a wc -l to get the count.

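A rough sketch of that idea, written as a shell loop rather than literal xargs (file name hypothetical). Note it makes one grep pass over the file per unique second, so sort | uniq -c is far cheaper on large logs.

```shell
# For each unique timestamp, count whole-line matches in the file
# (-F: fixed string, -x: match the entire line, -c: print the count).
printf '%s\n' \
  '22-Sep-2008 20:00:21 +0000' \
  '22-Sep-2008 20:00:22 +0000' \
  '22-Sep-2008 20:00:22 +0000' > times.txt   # hypothetical input
sort -u times.txt | while IFS= read -r ts; do
  printf '%s: %d\n' "$ts" "$(grep -cFx -- "$ts" times.txt)"
done
# → 22-Sep-2008 20:00:21 +0000: 1
#   22-Sep-2008 20:00:22 +0000: 2
```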