Linux: How to count the occurrences of a word in all the files of a directory?

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/6135065/

Date: 2020-08-04 00:57:20  Source: igfitidea

How to count occurrences of a word in all the files of a directory?

Tags: linux, unix, count, find, grep

Asked by Ashish Sharma

I'm trying to count a particular word occurrence in a whole directory. Is this possible?

Say, for example, there is a directory with 100 files, any of which may contain the word “aaa”. How would I count the occurrences of “aaa” across all the files in that directory?

I tried something like:

 zegrep "xception" `find . -name '*auth*application*' | wc -l 

But it's not working.

Accepted answer by Carlos Campderrós

grep -roh aaa . | wc -w

This greps recursively through all files and directories in the current directory, searching for aaa, and outputs only the matches rather than the entire line. Then wc counts how many words there are.
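
As a small illustration of the command above, the following sketch builds a throwaway directory and counts the matches in it. The directory layout and file contents are made up for the demonstration, and it assumes a grep that supports -r, -o and -h (GNU grep does). It uses wc -l instead of wc -w; the two agree as long as the pattern contains no whitespace, because -o prints one match per line.

```shell
#!/bin/sh
# Hypothetical sample data: a temporary directory with one nested subdirectory.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'aaa bbb aaa\n' > "$demo_dir/one.txt"
printf 'nothing here\naaa\n' > "$demo_dir/sub/two.txt"

# -r: recurse into subdirectories
# -o: print each match on its own line instead of the whole line
# -h: do not prefix matches with the file name
# tr strips the padding that some wc implementations add around the number.
count=$(grep -roh aaa "$demo_dir" | wc -l | tr -d ' ')
echo "occurrences of aaa: $count"

# Clean up the sample data.
rm -rf "$demo_dir"
```

Because each match lands on its own line, a line such as "aaa bbb aaa" contributes two to the total rather than one.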

Answer by jcomeau_ictx

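Concatenate the files and grep the output:

cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

Note that -c counts matching lines, so a line that contains the word several times is counted only once.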

If you want 'exceptional' to match as well, drop the '\<' and '\>' around the word.
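
To make the effect of the word-boundary markers concrete, here is a minimal sketch on a made-up two-line sample file. It assumes GNU grep, where \< and \> are supported as word-boundary extensions, and it uses plain grep rather than zegrep since the sample is not compressed.

```shell
#!/bin/sh
# Hypothetical sample file with a whole-word hit, a longer word, and a capitalised hit.
sample=$(mktemp)
printf 'exception exceptional\nan Exception occurred\n' > "$sample"

# Whole-word matches only: 'exceptional' is skipped.
whole=$(grep -oi '\<exception\>' "$sample" | wc -l | tr -d ' ')

# Substring matches: 'exceptional' is counted too.
loose=$(grep -oi 'exception' "$sample" | wc -l | tr -d ' ')

# -c counts matching lines, not individual matches.
lines=$(grep -ic 'exception' "$sample")

echo "whole words: $whole, substrings: $loose, matching lines: $lines"
rm -f "$sample"
```

The first line of the sample holds two substring matches but counts as a single line under -c, which is why the last two numbers differ.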

Answer by paxdiablo

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4

The sed converts spaces to newlines (you may want to include other whitespace characters as well, such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.

Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.

If you want a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
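
One of the edge cases mentioned above is whitespace other than a single space. The sketch below is a variant, not the answer's own method: it swaps the sed step for tr so that any run of spaces, tabs or newlines becomes one line break, and lets grep do the counting. The function name count_word and the sample file are hypothetical.

```shell
#!/bin/sh
# count_word WORD FILE...
# Prints how many whitespace-separated tokens in the files equal WORD exactly.
count_word() {
    word=$1
    shift
    # tr -s squeezes every run of whitespace into a single newline,
    # so each token ends up on its own line.
    # grep -x matches whole lines only, -F treats WORD as a literal string,
    # and -c prints the number of matching lines.
    cat -- "$@" | tr -s '[:space:]' '\n' | grep -cxF -- "$word"
}

# Hypothetical sample: tokens separated by a tab, a space and newlines.
sample=$(mktemp)
printf 'aaa\taaa bbb\naaab\n' > "$sample"
count_word aaa "$sample"
rm -f "$sample"
```

For a whole tree, the same function could be fed from find, for example find . -name '*.txt' -exec cat {} + | count_word aaa /dev/stdin, assuming /dev/stdin is available on the system.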

Answer by Vijay

find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l

Answer by Fredrik Pihl

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

Should correctly handle filenames with spaces in them.
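
The following sketch checks that claim on a made-up directory containing a file name with a space. It uses the {} + form of -exec, which passes many files to each grep invocation instead of starting one grep per file; that is a variation on the command above, and -o and -h are assumed to be supported by the installed grep.

```shell
#!/bin/sh
# Hypothetical sample data, including a file name that contains a space.
demo_dir=$(mktemp -d)
printf 'aaa aaa\n' > "$demo_dir/my notes.txt"
printf 'aaa\n' > "$demo_dir/plain.txt"

# find hands each path to grep as a single argument, so the space is harmless.
# -o prints one match per line; -h suppresses the file-name prefix.
total=$(find "$demo_dir" -type f -exec grep -oh aaa {} + | wc -l | tr -d ' ')
echo "total occurrences: $total"

rm -rf "$demo_dir"
```

A plain find | xargs pipeline without -print0 and -0 would instead split "my notes.txt" into two arguments.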

Answer by tim

There's also a grep regex syntax for matching words only:

# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l

For a different word-matching regex syntax, see:

man re_format | less -p '\[\[:<:\]\]'

Answer by Sheharyar

Let's use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (and its subdirectories), you can do this:

$ find . -type f | xargs cat | wordfrequency | grep yourword

Source: AWK-ward Ruby
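
Filtering the frequency table with grep yourword also shows every longer word that contains it. As a sketch of one way around that, the hypothetical helper below (count_exact is not part of the original answer) reuses the same field splitting and case folding but counts a single word inside awk.

```shell
#!/bin/sh
# count_exact WORD
# Reads text on standard input and prints how often WORD occurs as a whole word,
# ignoring case. Words are runs of ASCII letters, as in wordfrequency above.
count_exact() {
    awk -v target="$1" '
        BEGIN { FS = "[^a-zA-Z]+"; target = tolower(target) }
        {
            # Compare every field on the line with the target word.
            for (i = 1; i <= NF; i++)
                if (tolower($i) == target) n++
        }
        # n + 0 prints 0 instead of an empty line when nothing matched.
        END { print n + 0 }
    '
}

# "sword" contains "word" but is not counted.
printf 'Word word, sword; WORD\n' | count_exact word
```

For a directory tree it could be combined with find in the same way as above, for example find . -type f -exec cat {} + | count_exact yourword.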

Answer by Parag Tyagi -morpheus-

Use grep in its simplest form. Try grep --help for more info.

  1. To count the lines that contain a word in a particular file (grep -c counts matching lines, not individual occurrences):

    grep -c <word> <file_name>
    

    Example:

    grep -c 'aaa' abc_report.csv
    

    Output:

    445
    

  2. To count the matching lines per file in the whole directory:

    grep -c -R <word>
    

    Example:

    grep -c -R 'aaa'
    

    Output:

    abc_report.csv:445
    lmn_report.csv:129
    pqr_report.csv:445
    my_folder/xyz_report.csv:408
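
The per-file output above still has to be added up to get a single number. A minimal sketch of that step, on made-up sample files, is shown below; it also contrasts the line-based total from -c with an occurrence-based total from -o. The file names are hypothetical, and -R and -o are assumed to be supported by the installed grep.

```shell
#!/bin/sh
# Hypothetical sample data: one file with matches, one without.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my_folder"
printf 'aaa aaa\naaa\n' > "$demo_dir/a.csv"
printf 'bbb\n' > "$demo_dir/my_folder/b.csv"

# grep -c -R prints path:count for every file; the count is the last
# colon-separated field, so awk sums $NF.
line_total=$(grep -c -R aaa "$demo_dir" | awk -F: '{ sum += $NF } END { print sum + 0 }')

# grep -o prints every match on its own line, so wc -l counts occurrences.
occurrence_total=$(grep -o -R aaa "$demo_dir" | wc -l | tr -d ' ')

echo "matching lines: $line_total, occurrences: $occurrence_total"
rm -rf "$demo_dir"
```

The two totals differ whenever a line contains the word more than once, as the first line of a.csv does here.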
    