How to count occurrences of a word in all the files of a directory?
Source: http://stackoverflow.com/questions/6135065/
Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by Ashish Sharma
I'm trying to count a particular word occurrence in a whole directory. Is this possible?
For example, say there is a directory with 100 files, any of which may contain the word “aaa”. How would I count the number of occurrences of “aaa” in all the files under that directory?
I tried something like:
zegrep "xception" `find . -name '*auth*application*'` | wc -l
But it's not working.
Accepted answer by Carlos Campderrós
grep -roh aaa . | wc -w
This greps recursively through all files and directories in the current directory searching for aaa, and outputs only the matches, not the entire line. Then just use wc to count how many words there are.
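As a rough illustration of that pipeline, the sketch below builds a scratch directory with mktemp and counts the matches in it. The file names and contents are made up for the example. Because grep -o prints one match per line, wc -l gives the same number as wc -w here.

```shell
#!/bin/sh
# Scratch directory with known contents (hypothetical sample data).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
printf 'aaa bbb aaa\n' > "$demo_dir/one.txt"       # two matches on one line
printf 'ccc\nxaaay\n'  > "$demo_dir/sub/two.txt"   # one match inside a longer word

# -r: recurse, -o: print each match on its own line, -h: omit file names.
occurrences=$(grep -roh aaa "$demo_dir" | wc -l)
echo "occurrences of aaa: $occurrences"

rm -rf "$demo_dir"
```

Note that the match inside xaaay is counted too, since the pattern is not restricted to whole words.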
Answer by jcomeau_ictx
Concatenate the files with cat and grep the output:
cat $(find /usr/share/doc/ -name '*.txt') | zegrep -ic '\<exception\>'
If you want 'exceptional' to match as well, don't use the '\<' and '\>' around the word.
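One thing worth keeping in mind is that -c counts matching lines rather than individual occurrences. The sketch below compares the two on a made-up two-line sample. It uses plain grep instead of zegrep and assumes a grep that supports the \< and \> word anchors (GNU grep does).

```shell
#!/bin/sh
# Hypothetical sample text: two lines, three whole-word matches.
sample='Exception exceptional
exception exception'

# -c counts matching lines; -i ignores case.
matching_lines=$(printf '%s\n' "$sample" | grep -ic '\<exception\>')

# -o prints every match on its own line, so wc -l counts occurrences.
whole_words=$(printf '%s\n' "$sample" | grep -io '\<exception\>' | wc -l)

# Without the word anchors, the start of "exceptional" is matched too.
substrings=$(printf '%s\n' "$sample" | grep -io 'exception' | wc -l)

echo "lines: $matching_lines, whole words: $whole_words, substrings: $substrings"
```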
Answer by paxdiablo
How about starting with:
cat * | sed 's/ /\n/g' | grep '^aaa$' | wc -l
as in the following transcript:
pax$ cat file1
this is a file number 1
pax$ cat file2
And this file is file number 2,
a slightly larger file
pax$ cat file[12] | sed 's/ /\n/g' | grep 'file$' | wc -l
4
The sed converts spaces to newlines (you may want to include other whitespace characters as well, such as tabs, with sed 's/[ \t]/\n/g'). The grep just gets those lines that have the desired word, then the wc counts those lines for you.
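If the sed in use does not understand \n in the replacement text, tr can do the word splitting instead. This is a swapped-in alternative rather than the answer's own command. The sketch turns every run of whitespace into a single newline and then counts the lines that are exactly the word; the input text is made up.

```shell
#!/bin/sh
# Hypothetical input mixing a tab, single spaces, and a double space.
count=$(printf 'aaa\tbbb aaa\nxaaa  aaa\n' \
    | tr -s '[:space:]' '\n' \
    | grep -c '^aaa$')
echo "whole-word matches: $count"
```

Here grep -c is enough because each word already sits on its own line, and xaaa is not counted because of the ^ and $ anchors.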
Now there may be edge cases where this script doesn't work but it should be okay for the vast majority of situations.
If you want a whole tree (not just a single directory level), you can use something like:
( find . -name '*.txt' -exec cat {} ';' ) | sed 's/ /\n/g' | grep '^aaa$' | wc -l
Answer by Vijay
find . -type f | xargs perl -p -e 's/ /\n/g' | grep aaa | wc -l
Answer by Fredrik Pihl
Another solution based on find and grep.
find . -type f -exec grep -o aaa {} \; | wc -l
This should correctly handle filenames with spaces in them.
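A small sketch of the same idea on a scratch directory, with one file name that contains a space. It uses the {} + form of -exec, which hands many file names to a single grep call instead of starting one grep per file. That is a variation on the command above, not what the answer wrote, and -h is added so grep does not prefix matches with file names.

```shell
#!/bin/sh
# Scratch directory; the file names are made up for the example.
demo_dir=$(mktemp -d)
printf 'aaa aaa\n' > "$demo_dir/my report.txt"   # name contains a space
printf 'aaa\n'     > "$demo_dir/plain.txt"

# find passes each name as a separate argument, so spaces are not split.
total=$(find "$demo_dir" -type f -exec grep -oh aaa {} + | wc -l)
echo "occurrences of aaa: $total"

rm -rf "$demo_dir"
```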
Answer by tim
There's also a grep regex syntax for matching words only:
# based on Carlos Campderrós solution posted in this thread
man grep | less -p '\<'
grep -roh '\<aaa\>' . | wc -l
For a different word matching regex syntax see:
man re_format | less -p '\[\[:<:\]\]'
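Many grep implementations, including GNU and BSD grep, also have a -w option that is similar in effect to wrapping the pattern in word anchors. The sketch below compares it with a plain substring search on a made-up one-line file.

```shell
#!/bin/sh
# Scratch directory with one hypothetical file.
demo_dir=$(mktemp -d)
printf 'aaa aaab baaa aaa\n' > "$demo_dir/words.txt"

# -w restricts matches to whole words.
whole_words=$(grep -rowh aaa "$demo_dir" | wc -l)

# Without -w, the aaa inside aaab and baaa is counted as well.
substrings=$(grep -roh aaa "$demo_dir" | wc -l)

echo "whole words: $whole_words, substrings: $substrings"
rm -rf "$demo_dir"
```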
Answer by Sheharyar
Let's use AWK!
$ function wordfrequency() { awk 'BEGIN { FS="[^a-zA-Z]+" } { for (i=1; i<=NF; i++) { word = tolower($i); words[word]++ } } END { for (w in words) printf("%3d %s\n", words[w], w) } ' | sort -rn; }
$ cat your_file.txt | wordfrequency
This lists the frequency of each word occurring in the provided file. If you want to see the occurrences of your word, you can just do this:
$ cat your_file.txt | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (non-recursively), you can do this:
$ cat * | wordfrequency | grep yourword
To find occurrences of your word across all files in a directory (and its sub-directories), you can do this:
$ find . -type f | xargs cat | wordfrequency | grep yourword
Source: AWK-ward Ruby
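Filtering the frequency table with grep yourword also lets through other words that merely contain it. A possible variation, sketched below, has awk compare each field against the target directly and print a single number. The function name count_word and the sample input are hypothetical, not part of the answer.

```shell
#!/bin/sh
# count_word WORD: read text on stdin, print how often WORD occurs
# as a whole word (case-insensitive). Hypothetical helper for illustration.
count_word() {
    awk -v target="$1" '
        BEGIN { FS = "[^a-zA-Z]+"; target = tolower(target) }
        { for (i = 1; i <= NF; i++) if (tolower($i) == target) n++ }
        END { print n + 0 }'
}

# Made-up input: "Aaa" and "aaa" count, "aaab" does not.
result=$(printf 'Aaa bbb, aaa.\naaab aaa\n' | count_word aaa)
echo "aaa occurs $result times"
```

It can be fed the same way as wordfrequency, for example with find . -type f | xargs cat | count_word yourword.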
Answer by Parag Tyagi -morpheus-
Use grep in its simplest way. Try grep --help for more info.
To get the count of a word in a particular file (grep -c counts matching lines, so a line containing the word twice is counted once):
grep -c <word> <file_name>
Example:
grep -c 'aaa' abc_report.csv
Output:
445
To get the count of a word in the whole directory:
grep -c -R <word>
Example:
grep -c -R 'aaa'
Output:
abc_report.csv:445
lmn_report.csv:129
pqr_report.csv:445
my_folder/xyz_report.csv:408
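The recursive form prints one count per file. To get a single total, the per-file counts can be summed, for example with awk. The sketch below does this on a scratch directory; the file names echo the ones above but the contents are made up, and the directory is passed explicitly rather than relying on grep defaulting to the current one.

```shell
#!/bin/sh
# Scratch directory with hypothetical report files.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/my_folder"
printf 'aaa,aaa\naaa\n' > "$demo_dir/abc_report.csv"            # 2 matching lines
printf 'bbb\n'          > "$demo_dir/lmn_report.csv"            # 0 matching lines
printf 'aaa\n'          > "$demo_dir/my_folder/xyz_report.csv"  # 1 matching line

# Each output line looks like path:count; $NF is the count after the last colon.
total_lines=$(grep -c -R 'aaa' "$demo_dir" \
    | awk -F: '{ sum += $NF } END { print sum + 0 }')
echo "matching lines in total: $total_lines"

rm -rf "$demo_dir"
```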