bash: using grep to count how many times a word is repeated in a file

Note: this page is derived from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/21054875/

Date: 2020-09-18 09:13:38  Source: igfitidea

use grep to count the number of times a word got repeated in a file

bash, shell, grep

Asked by linbianxiaocao

The problem is like this:

For instance, I have a file "a.xml" that contains just one line:

<queue><item><cause><item>

I want to find how many times <item> occurs; in this case it is 2.

However, if I run:

grep -c "<item>" a.xml 

It will only give me 1 because grep stops as soon as it matches the first <item>.

So my question is: how can I use a simple shell/bash command to return the number of times <item> occurs?

It looks simple, but I just cannot find a good way to do it. Any ideas?

Answered by MillaresRoo

You may try something like:

grep -o "<item>" a.xml | wc -l
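The difference between the two commands can be checked with a small sketch. It assumes a scratch file created with mktemp (the variable name tmpfile is made up for illustration) that holds the one-line sample from the question, and a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). grep -c reports the number of matching lines, whereas grep -o prints each match on its own line so that wc -l can count them.

```shell
# Scratch file holding the sample line from the question (tmpfile is a hypothetical name).
tmpfile=$(mktemp)
printf '%s\n' '<queue><item><cause><item>' > "$tmpfile"

# grep -c counts matching LINES, so one line containing two matches still gives 1.
line_count=$(grep -c '<item>' "$tmpfile")

# grep -o prints every match on its own line; wc -l then counts those lines.
# tr strips the padding that some wc implementations add around the number.
match_count=$(grep -o '<item>' "$tmpfile" | wc -l | tr -d ' ')

echo "matching lines: $line_count"
echo "occurrences: $match_count"

rm -f "$tmpfile"
```

For the sample line this should report one matching line and two occurrences.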

Answered by anubhava

Using awk you can do that in a single command:

awk -F '<item>' '{print NF-1}' a.xml

Online Demo: http://ideone.com/vheDgq

Or, to get the total count for the whole file, use:

awk -F '<item>' '{s+=NF-1}END{print s}' a.xml
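One caveat may be worth checking before relying on the total: on an empty line awk sets NF to 0, so NF-1 evaluates to -1 and lowers the sum. The sketch below is not part of the original answer; it assumes a made-up input consisting of the sample line followed by one empty line, and guards the sum with an NF pattern.

```shell
# Hypothetical input: the sample line (two matches) followed by an empty line.
# Unguarded: the empty line contributes NF-1 = -1, so the total is 2 + (-1) = 1.
unguarded=$(printf '%s\n' '<queue><item><cause><item>' '' |
  awk -F '<item>' '{s+=NF-1} END{print s}')

# Guarded: the NF pattern skips lines that have no fields at all,
# and s+0 prints 0 rather than an empty string when nothing was counted.
guarded=$(printf '%s\n' '<queue><item><cause><item>' '' |
  awk -F '<item>' 'NF {s+=NF-1} END{print s+0}')

echo "unguarded total: $unguarded"
echo "guarded total: $guarded"
```

If the file is known to contain no empty lines, the shorter form gives the same result.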

Answered by John1024

If you are just looking to count '<item>' alone, then I like MillaresRoo's grep -o solution. If you are looking to count items more generally, then consider:

$ sed 's/></>\n</g' a.xml | sort | uniq -c
      1 <cause>
      2 <item>
      1 <queue>

Or, showing the input explicitly on the command line:

$ echo '<queue><item><cause><item>' | sed 's/></>\n</g' | sort | uniq -c
      1 <cause>
      2 <item>
      1 <queue>
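The sed commands above depend on two things: tags must sit directly next to each other as "><", and "\n" in the replacement must be treated as a newline, which GNU sed does but POSIX does not guarantee. A possible alternative, offered only as a sketch and assuming a grep with -o support, lets grep emit one tag per line using the pattern '<[^>]*>' (a "<", any characters other than ">", then ">"). The variable names tally and item_count are made up for illustration.

```shell
# Tally every tag on the sample line from the question.
# grep -o prints each match of '<[^>]*>' on its own line; sort groups equal tags
# together so that uniq -c can count them.
tally=$(printf '%s\n' '<queue><item><cause><item>' |
  grep -o '<[^>]*>' | sort | uniq -c)
echo "$tally"

# Pull the count for a single tag out of the tally (column 1 is the count, column 2 the tag).
item_count=$(echo "$tally" | awk '$2 == "<item>" {print $1}')
echo "item: $item_count"
```

This variant also counts tags that are separated by text or whitespace, which the "><" substitution would leave on one line.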