bash: How to recursively count the number of words in a directory?

Question

Asked by Alistair Colling

I'm trying to calculate the number of words written in a project. There are a few levels of folders and lots of text files within them.

Can anyone help me find out a quick way to do this?

bash or vim would be good!

Thanks

Answer 1

Answered by karakfa

Use find to scan the directory tree, and wc will do the rest:

$ find path -type f | xargs wc -w | tail -1

The last line gives the total.
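
One caveat with piping find into xargs is that, by default, xargs splits its input on whitespace, so a file name containing a space is passed as several arguments. A minimal sketch of a NUL-delimited variant follows. It assumes a find and xargs that support -print0 and -0 (the GNU and BSD versions do, but these options are not in POSIX), and count_words_nul is a hypothetical helper name, not part of the answer.

```shell
# Sketch: total word count under a directory, safe for names with spaces.
# count_words_nul is a hypothetical name; $1 is the directory (default: .).
count_words_nul() {
    # -print0 / -0 delimit file names with NUL instead of whitespace,
    # so "my notes.txt" reaches wc as a single argument.
    # awk keeps only the number from the last line of wc output.
    find "${1:-.}" -type f -print0 | xargs -0 wc -w | tail -1 | awk '{print $1}'
}
```

Calling count_words_nul path prints just the number. As with the command above, the last line is the overall total only when xargs manages to run wc a single time.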

Answer 2

Answered by janos

You could find and print all the content and pipe to wc:

find path -type f -exec cat {} \; -exec echo \; | wc -w

Note: the -exec echo \; is needed in case a file doesn't end with a newline character. Without it, the last word of one file and the first word of the next would not be separated.
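
The effect of a missing trailing newline can be reproduced on a throwaway directory. This is only a sketch: the demo directory and the file names a.txt and b.txt are made up for illustration.

```shell
# Hypothetical demo files, written without a trailing newline.
demo=$(mktemp -d)
printf 'alpha' > "$demo/a.txt"
printf 'beta'  > "$demo/b.txt"

# Plain concatenation fuses the two words into "alphabeta".
# tr strips the padding some wc implementations add.
merged=$(cat "$demo/a.txt" "$demo/b.txt" | wc -w | tr -d ' ')

# Emitting a newline after each file keeps the words apart.
separated=$(find "$demo" -type f -exec cat {} \; -exec echo \; | wc -w | tr -d ' ')

echo "merged=$merged separated=$separated"   # prints: merged=1 separated=2
```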

Or you could run wc on each file with find and use awk to aggregate the counts:

find . -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum }'
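
For illustration, the same aggregation can be wrapped in a small function. This is a sketch under the assumption that every line of wc output starts with the count; sum_words is a made-up name, and the + 0 is an addition so that an empty directory prints 0 rather than an empty line.

```shell
# Sketch: sum the per-file counts printed by wc -w.
# sum_words is a hypothetical name; $1 is the directory (default: .).
sum_words() {
    # -exec ... \; runs wc once per file, so each output line is
    # "<count> <path>" and no "total" line is ever produced.
    # awk adds up the first field of every line.
    find "${1:-.}" -type f -exec wc -w {} \; | awk '{ sum += $1 } END { print sum + 0 }'
}
```

Running wc once per file is slower on large trees than batching the files, but it keeps the awk step simple because there are no total lines to filter out.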

Answer 3

Answered by rubicks

tl;dr

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc

Explanation:

find . -type f -exec wc -w {} + runs wc -w on all the files contained (recursively) in . (the current working directory). find executes wc as few times as possible, but as many times as necessary to comply with ARG_MAX, the system command length limit. When the number of files (and/or the combined length of their paths) exceeds ARG_MAX, find invokes wc -w more than once, giving multiple total lines:

$ find . -type f -exec wc -w {} + | awk '/total/{print $0}'
  8264577 total
  654892 total
 1109527 total
 149522 total
 174922 total
 181897 total
 1229726 total
 2305504 total
 1196390 total
 5509702 total
  9886665 total

Isolate these partial sums by printing only the first whitespace-delimited field of each total line:

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}'
8264577
654892
1109527
149522
174922
181897
1229726
2305504
1196390
5509702
9886665

paste the partial sums with a + delimiter to give an infix summation:

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+
8264577+654892+1109527+149522+174922+181897+1229726+2305504+1196390+5509702+9886665

Evaluate the infix summation using bc, which supports both infix expressions and arbitrary precision:

$ find . -type f -exec wc -w {} + | awk '/total/{print $1}' | paste -sd+ | bc
30663324
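
The multiple-total behaviour is hard to observe without a very large tree, so here is a sketch that imitates it on four tiny files. The assumptions are loud ones: xargs -n 2 stands in for an ARG_MAX split by forcing two wc runs, the demo directory and file names are made up, -print0 and -0 are GNU/BSD extensions, and shell arithmetic is swapped in for bc so the example has no extra dependency.

```shell
# Hypothetical demo tree: four files with two words each.
demo=$(mktemp -d)
for i in 1 2 3 4; do printf 'w1 w2\n' > "$demo/f$i.txt"; done

# -n 2 limits each wc run to two files, so wc runs twice and
# prints two "total" lines, as it would after an ARG_MAX split.
partials=$(find "$demo" -type f -print0 | xargs -0 -n 2 wc -w | awk '/total$/{print $1}')

# Join the partial sums into an infix expression.
expr=$(printf '%s\n' "$partials" | paste -sd+ -)
echo "$expr"        # prints: 4+4

# Shell arithmetic evaluates the expression here in place of bc.
echo "$(($expr))"   # prints: 8
```

The explicit - operand tells paste to read standard input, which some implementations require. For totals beyond the shell's integer range, bc remains the safer evaluator.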

Answer 4

Answered by miken32

If there's one thing I've learned from all the bash questions on SO, it's that a filename with a space will mess you up. This script will work even if you have whitespace in the file names.

#!/usr/bin/env bash

# Let ** match files in all subdirectories.
shopt -s globstar
count=0
for f in **/*.txt
do
    # wc prints "<count> <name>"; keep only the count.
    words=$(wc -w "$f" | awk '{print $1}')
    count=$((count + words))
done
echo "$count"

Answer 5

Answered by Yeikel

Assuming you don't need to count recursively and that you want to include all the files in the current directory, you can use a simple approach such as:

wc -l *

10  000292_0
500 000297_0
510 total

If you want to count the words for only a specific extension in the current directory, you could try:

cat *.txt | wc -l

Note that -l counts lines; use -w instead to count words.
