Linux: How to count occurrences of a word in all the files of a directory?

Question

Asked by Ashish Sharma

I'm trying to count a particular word occurrence in a whole directory. Is this possible?

Say, for example, there is a directory with 100 files, any of which may contain the word “aaa”. How would I count the number of occurrences of “aaa” across all the files under that directory?

I tried something like:

 zegrep "xception" `find . -name '*auth*application*'` | wc -l

But it's not working.

Answer 1

Accepted answer by Carlos Campderrós

grep -roh aaa . | wc -w

This greps recursively through all files and directories under the current directory for aaa and outputs only the matches, not the entire line. Then just use wc to count how many words there are.
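A minimal sketch of the command above on throwaway data, assuming a grep that supports -r, -o and -h (GNU and BSD grep both do). The directory and file names are made up for illustration. Because -o prints each match on its own line, wc -l would give the same result as wc -w for a single-word pattern.

```shell
# Build a small scratch tree (hypothetical file names).
demo=$(mktemp -d)
mkdir "$demo/sub"
printf 'aaa bbb aaa\n' > "$demo/one.txt"
printf 'ccc\naaa\n' > "$demo/sub/two.txt"

# -r: recurse, -o: print only the matched text, -h: drop the file name prefix.
count=$(grep -roh aaa "$demo" | wc -w)
echo "$count"

rm -rf "$demo"
```

Here the first file contributes two matches and the second one, so the sketch prints 3.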

Answer 2

Answered by jcomeau_ictx

cat the files together and grep the output: cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'

If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
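To make the effect of the word-boundary markers concrete, here is a small sketch on an invented sample file, assuming GNU grep. One thing worth knowing: -c reports the number of matching lines rather than individual matches, so a line containing the word twice counts once, whereas -o piped to wc -l counts every match.

```shell
sample=$(mktemp)
printf 'exception exception\nexceptional case\nno match here\n' > "$sample"

# With word boundaries: only the standalone word matches (line 1).
lines_bounded=$(grep -ic '\<exception\>' "$sample")

# Without boundaries: "exceptional" matches too (lines 1 and 2).
lines_unbounded=$(grep -ic 'exception' "$sample")

# -o prints each match separately, so both hits on line 1 are counted.
matches_bounded=$(grep -io '\<exception\>' "$sample" | wc -l)

echo "$lines_bounded $lines_unbounded $matches_bounded"
rm -f "$sample"
```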

Answer 3

Answered by paxdiablo

How about starting with:

cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l

as in the following transcript:

pax$ cat file1
this is a file number 1

pax$ cat file2
And this file is file number 2,
a slightly larger file

pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4

The sed converts spaces to newlines (you may want to include other whitespace characters as well, such as tabs, with sed 's/[ \t]/\n/g'). The grep keeps only the lines that consist of the desired word, then wc counts those lines for you.

Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
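One such edge case is whitespace other than single spaces. As a sketch of an alternative (not part of the original answer), tr can split on any run of whitespace, which also avoids relying on sed interpreting \n in the replacement text. The sample file is invented for illustration.

```shell
sample=$(mktemp)
# Words separated by a tab, double spaces and newlines.
printf 'aaa\tbbb  aaa\naaab aaa\n' > "$sample"

# tr -s turns each run of whitespace into a single newline,
# leaving one word per line; grep -c then counts exact matches.
count=$(tr -s '[:space:]' '\n' < "$sample" | grep -c '^aaa$')
echo "$count"
rm -f "$sample"
```

The word aaab is not counted because the pattern is anchored at both ends. Words followed by punctuation (such as "aaa,") would still be missed by this approach.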

If you want a whole tree (not just a single directory level), you can use something like:

( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l

Answer 4

Answered by Vijay

find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l

Answer 5

Answered by Fredrik Pihl

Another solution based on find and grep.

find . -type f -exec grep -o aaa {} \; | wc -l

Should correctly handle filenames with spaces in them.
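As a quick check of that claim, the sketch below runs a variant of the command against a scratch directory containing a file name with a space. The file names are hypothetical. It ends -exec with + rather than \;, which batches many files into each grep call, and adds -h so that file names stay out of the output once grep receives several files at a time.

```shell
demo=$(mktemp -d)
printf 'aaa aaa\n' > "$demo/with space.txt"
printf 'aaa\n' > "$demo/plain.txt"

# find passes each path to grep as a single argument,
# so the space in the file name is harmless.
count=$(find "$demo" -type f -exec grep -oh aaa {} + | wc -l)
echo "$count"
rm -rf "$demo"
```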

Answer 6

Answered by tim

There's also a grep regex syntax for matching words only:

# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l

For a different word-matching regex syntax, see:

man re_format | less -p '\[\[:<:\]\]'

Answer 7

Answered by Sheharyar

Let's use AWK!

$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency

This lists the frequency of each word occurring in the provided file. If you want to see only the occurrences of your word, you can do this:

$ cat your_file.txt | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (non-recursively), you can do this:

$ cat * | wordfrequency | grep yourword

To find occurrences of your word across all files in a directory (and its subdirectories), you can do this:

$ find . -type f | xargs cat | wordfrequency | grep yourword
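Filtering the frequency list with grep yourword also matches any longer word that merely contains yourword. If only a single word matters, a narrower sketch along the same lines is to have awk compare each field against the target directly. The variable name target and the sample file are assumptions made for this illustration.

```shell
sample=$(mktemp)
printf 'Aaa, bbb aaa.\nccc AAA aaab\n' > "$sample"

# Split each line on runs of non-letters, lower-case every field,
# and count only the fields equal to the target word.
count=$(awk -v target="aaa" '
  BEGIN { FS = "[^a-zA-Z]+" }
  { for (i = 1; i <= NF; i++) if (tolower($i) == target) n++ }
  END { print n + 0 }
' "$sample")
echo "$count"
rm -f "$sample"
```

The n + 0 in the END block makes awk print 0 rather than an empty line when the word never appears.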

Source: AWK-ward Ruby

Answer 8

Answered by Parag Tyagi -morpheus-

Use grep in its simplest form. Try grep --help for more info.

To count the lines that contain a word in a particular file (grep -c counts matching lines, not individual occurrences):

grep -c <word> <file_name>

Example:

grep -c 'aaa' abc_report.csv

Output:

To get a per-file count of matching lines for the whole directory, recursively:

grep -c -R <word>

Example:

grep -c -R 'aaa'

Output:

abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
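If a single total is wanted rather than one line per file, the per-file counts can be summed. The sketch below assumes GNU grep for -R and uses made-up file names. Since grep -c prints path:count, the count is taken as the last colon-separated field. Keep in mind that this total is a number of matching lines, not of individual occurrences.

```shell
demo=$(mktemp -d)
mkdir "$demo/my_folder"
printf 'aaa\naaa aaa\nbbb\n' > "$demo/a.csv"
printf 'aaa\n' > "$demo/my_folder/b.csv"

# Sum the last field of each path:count line.
total_lines=$(grep -c -R 'aaa' "$demo" | awk -F: '{ sum += $NF } END { print sum + 0 }')
echo "$total_lines"
rm -rf "$demo"
```

The sample data contains four occurrences of aaa but only three matching lines, so the sketch prints 3.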
