How can I count the number of words in a directory recursively?
Original URL: http://stackoverflow.com/questions/35559648/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Alistair Colling
I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.
Can anyone help me find out a quick way to do this?
bash or vim would be good!
Thanks
Answer by karakfa
Use find to scan the directory tree, and wc will do the rest:
$ find path -type f -print0 | xargs -0 wc -w | tail -1
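The last line gives the total. If the file list is too long for a single command line, xargs runs wc more than once, and the last line is then only the total of the final batch.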
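With GNU coreutils (an assumption: --files0-from is a GNU wc option that BSD and macOS wc lack), wc can read the NUL-terminated file list from find directly. It then runs exactly once, so the last line really is the grand total no matter how many files there are. A minimal sketch; count_words_files0 is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# count_words_files0 DIR (hypothetical helper name)
# Prints the total number of words in all regular files under DIR.
# Assumes GNU wc: --files0-from=- reads NUL-terminated names from stdin,
# so wc is invoked once regardless of how many files there are.
count_words_files0() {
    # tail keeps the grand-total line (or the only file's line when
    # there is a single file); awk keeps just the number.
    find "$1" -type f -print0 |
        wc -w --files0-from=- |
        tail -n 1 |
        awk '{ print $1 }'
}
```

Usage: count_words_files0 path. The final awk also covers the single-file case, where wc prints no total line.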
Answer by janos
You could find and print all the content and pipe it to wc:
find path -type f -exec cat {} \; -exec echo \; | wc -w
Note: the -exec echo \; is needed in case a file doesn't end with a newline character; otherwise the last word of one file and the first word of the next would not be separated.
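A small demonstration of the missing-newline problem, assuming a POSIX printf and mktemp; words_concat and words_separated are hypothetical helper names:

```shell
#!/usr/bin/env bash
# words_concat FILE...: concatenate the files directly, then count words.
words_concat() {
    cat "$@" | wc -w | tr -d ' '
}

# words_separated FILE...: print a newline after each file, as
# -exec echo \; does in the find command above.
words_separated() {
    local f
    for f in "$@"; do
        cat "$f"
        echo
    done | wc -w | tr -d ' '
}

dir=$(mktemp -d)
printf 'alpha' > "$dir/a.txt"               # no trailing newline
printf 'beta gamma\n' > "$dir/b.txt"
words_concat "$dir/a.txt" "$dir/b.txt"      # "alphabeta gamma" counts as 2
words_separated "$dir/a.txt" "$dir/b.txt"   # alpha, beta, gamma: 3
rm -r -- "$dir"
```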
Or you could run wc on each file found and use awk to aggregate the counts:
find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'
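Running wc once per file with \; starts one process per file. A sketch of a batched variant using -exec ... {} +, assuming wc's usual output (one "count name" line per file, plus a "count total" line whenever a single call receives several files). It sums the per-file lines and skips the totals; because find prefixes every path with the starting directory, a per-file line can never be just a number followed by the word total. File names containing newlines would still confuse it. count_words_batched is a hypothetical name:

```shell
#!/usr/bin/env bash
# count_words_batched DIR (hypothetical helper name)
# "-exec ... {} +" passes many files to each wc call.
count_words_batched() {
    find "$1" -type f -exec wc -w {} + |
        awk '
            # A total line has exactly two fields: the count and "total".
            NF == 2 && $2 == "total" { next }
            # Every other line starts with a per-file word count.
            { sum += $1 }
            END { print sum + 0 }
        '
}
```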
Answer by rubicks
tl;dr
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
Explanation:
The command find . -type f -exec wc -w {} + runs wc -w on all the files (recursively) contained in . (the current working directory). find executes wc as few times as possible, but as many times as necessary to comply with ARG_MAX, the system's limit on command-line length. When the number of files (and/or the lengths of their paths) exceeds ARG_MAX, find invokes wc -w more than once, giving multiple total lines:
$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
8264577 total
654892 total
1109527 total
149522 total
174922 total
181897 total
1229726 total
2305504 total
1196390 total
5509702 total
9886665 total
Isolate these partial sums by printing only the first whitespace-delimited field of each total line:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665
Use paste to join the partial sums with a + delimiter, giving an infix summation:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665
Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:
$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
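The partial-sums pipeline has two edge cases: /total/ also matches any path containing "total" (a file named subtotal.txt would be counted twice), and a wc call that receives a single file prints no total line, so that file's words are dropped. A sketch that keeps the partial-sums idea while guarding against both: /dev/null (zero words) is passed as an extra operand so every batch has at least two operands and therefore ends with a total line, and only lines made of exactly a count and "total" are kept. It sums in awk instead of paste and bc, a swap made only to avoid depending on bc being installed. count_words_totals is a hypothetical name:

```shell
#!/usr/bin/env bash
# count_words_totals DIR (hypothetical helper name)
count_words_totals() {
    # /dev/null contributes 0 words but guarantees each wc call has
    # two or more operands, so every batch ends with a total line.
    find "$1" -type f -exec wc -w /dev/null {} + |
        awk 'NF == 2 && $2 == "total" { sum += $1 } END { print sum + 0 }'
}
```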
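Answer by miken32
#!/usr/bin/env bash
# globstar (bash 4+) makes ** match files in all subdirectories;
# nullglob makes the loop run zero times if nothing matches.
shopt -s globstar nullglob
count=0
for f in **/*.txt
do
    # Keep only the first field (the count) of wc's "count filename" output
    words=$(wc -w "$f" | awk '{print $1}')
    count=$((count + words))
done
echo "$count"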
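globstar requires bash 4 or later, and macOS still ships bash 3.2 as /bin/bash. A sketch of the same *.txt loop without globstar, using find -print0 with a NUL-delimited read so that file names containing spaces or newlines are handled; count_txt_words is a hypothetical name:

```shell
#!/usr/bin/env bash
# count_txt_words DIR (hypothetical helper name)
# Sums the words of every *.txt file under DIR without globstar,
# so it also runs on bash 3.2.
count_txt_words() {
    local count=0 f words
    # -print0 and read -d '' both use NUL as the separator.
    while IFS= read -r -d '' f; do
        # Reading from stdin makes wc print only the number.
        words=$(wc -w < "$f")
        count=$((count + words))
    done < <(find "$1" -type f -name '*.txt' -print0)
    echo "$count"
}
```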
Answer by Yeikel
Assuming you don't need to recursively count the words and that you want to include all the files in the current directory, you can use a simple approach such as:
If you want to count the words for only a specific extension in the current directory, you could try:
cat *.txt | wc -w