Linux: How to count lines of code including sub-directories
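Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, licensed under CC BY-SA 4.0. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/316590/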
Question by speciousfool
Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:
cat * | wc -l
However, if there are sub-directories, this doesn't work. For this to work, cat would need a recursive mode. I suspect this might be a job for xargs, but I wonder if there is a more elegant solution?
Accepted answer by philant
First, you do not need to use cat to count lines. This is an antipattern called Useless Use of Cat (UUoC). To count lines in files in the current directory, use wc:
wc -l *
Then the find command recurses into the sub-directories:
find . -name "*.c" -exec wc -l {} \;
. is the name of the top directory to start searching from
-name "*.c" is the pattern of the files you're interested in
-exec gives a command to be executed
{} is replaced by each file that find finds, and is passed to the command (here wc -l)
\; indicates the end of the command
This command produces a list of all the files found, each with its line count. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to wc -l.
find . -name "*.c" -print | xargs wc -l
EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, together with xargs --null (short form -0), so that the file names are exchanged as null-terminated strings.
find . -name "*.c" -print0 | xargs -0 wc -l
The Unix philosophy is to have tools that do one thing only, and do it well.
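A quick way to see the difference the EDIT describes is to try both pipelines on a throwaway directory. The sketch below assumes a POSIX shell with find, xargs and mktemp available; the file name my file.c and the variable names are made up for the demonstration, and cat is used so that each pipeline yields a single number.

```shell
# Throwaway directory holding one 2-line file whose name contains a space.
demo_dir=$(mktemp -d)
printf 'one\ntwo\n' > "$demo_dir/my file.c"

# -print: xargs splits "./my file.c" into "./my" and "file.c"; cat cannot open
# either one, so no lines reach wc.
with_print=$(cd "$demo_dir" && find . -name "*.c" -print | xargs cat 2>/dev/null | wc -l)

# -print0 with xargs -0: the name arrives intact and both lines are counted.
with_print0=$(cd "$demo_dir" && find . -name "*.c" -print0 | xargs -0 cat | wc -l)

printf 'with -print:  %s\n' "$with_print"
printf 'with -print0: %s\n' "$with_print0"
rm -rf "$demo_dir"
```

The first pipeline should report 0 lines and the second 2.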
Answer by chromakode
Try using the find command, which recurses into directories by default:
find . -type f -execdir cat {} \; | wc -l
Answer by Ken
I think you're probably stuck with xargs
find -name '*php' | xargs cat | wc -l
chromakode's method gives the same result but is much, much slower. If you use xargs, your cat-ing and wc-ing can start as soon as find starts finding.
Good explanation at Linux: xargs vs. exec {}
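Much of the speed difference comes from the number of processes started: -execdir cat {} \; launches one cat per file, while xargs (like find's own -exec ... {} + form) packs many file names into each command. The hypothetical helper below counts how many times the command actually runs; it uses -exec ... {} + as a stand-in for xargs-style batching.

```shell
# count_invocations DIR MODE  (hypothetical helper)
#   MODE "each":  -exec ... {} \;  runs the command once per file
#   MODE "batch": -exec ... {} +   packs many paths into each run
# The inner sh prints one line per run, so wc -l counts the runs.
count_invocations() {
  dir=$1
  mode=$2
  if [ "$mode" = each ]; then
    find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l
  else
    find "$dir" -type f -exec sh -c 'echo run' sh {} + | wc -l
  fi
}
```

On a directory containing three files, count_invocations DIR each should print 3, while count_invocations DIR batch should print 1.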
Answer by Kent Fredric
If you want a code-golfing answer:
grep '' -R . | wc -l
The problem with just using wc -l on its own is that it can't descend into sub-directories, and the one-liners using
find . -exec wc -l {} \;
won't give you a total line count, because it runs wc once for every file (lol!), and
find . -exec wc -l {} +
will get confused as soon as find hits the ~200k character argument limit (see the footnote below) and calls wc multiple times instead, each time giving you only a partial summary.
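The partial-summary problem affects wc because every wc run prints its own total. cat has no such problem: its output is just the concatenated file contents, so splitting the list across several cat runs still feeds one stream into a single wc -l. A minimal sketch, assuming a find that supports -exec ... {} + (POSIX find does); total_lines is a hypothetical name:

```shell
# total_lines DIR [PATTERN]  (hypothetical helper)
# find packs paths into as few cat runs as the argument limit allows;
# all of their output goes to one wc -l, so there is exactly one total.
total_lines() {
  dir="${1:-.}"
  pattern="${2:-*}"
  find "$dir" -type f -name "$pattern" -exec cat {} + | wc -l
}
```

For example, total_lines . '*.c' gives one number for every .c file under the current directory.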
Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.
For the cost of 1 extra command character, you can ignore binary files completely:
grep '' -IR . | wc -l
If you want to count lines in binary files too:
grep '' -aR . | wc -l
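One difference worth keeping in mind when comparing the grep trick with the wc-based answers: wc -l counts newline characters, while grep '' outputs every line, including a last line that has no trailing newline. A file that doesn't end in a newline is therefore counted one line higher by grep. The sketch below shows this on a made-up file last.c and also uses --include, a GNU grep option, to restrict the recursive search to one extension.

```shell
demo_dir=$(mktemp -d)
printf 'int a;\nint b;' > "$demo_dir/last.c"   # second line has no trailing newline

wc_count=$(wc -l < "$demo_dir/last.c")                         # counts newlines: 1
grep_count=$(grep -R --include='*.c' '' "$demo_dir" | wc -l)   # counts lines: 2

printf 'wc -l: %s, grep: %s\n' "$wc_count" "$grep_count"
rm -rf "$demo_dir"
```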
Footnote on the limits:
The docs are a bit vague as to whether it's a string-size limit or a number-of-tokens limit.
cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} +
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435
This implies it's going to chunk very, very easily.
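For comparison with the figures above, getconf ARG_MAX reports the system's limit, in bytes, on the arguments plus environment passed to a new program. The sketch below prints that value and defines a hypothetical batch_sizes helper that takes the same measurement as the Perl one-liner (how many paths each -exec ... {} + run receives), using sh instead of perl.

```shell
# System limit (bytes) on arguments + environment passed to a new program.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max"

# batch_sizes DIR  (hypothetical helper)
# Each -exec ... {} + run starts one sh, which prints how many paths it got.
batch_sizes() {
  find "${1:-.}" -type f -exec sh -c 'echo "$#"' sh {} +
}
```

Running batch_sizes /usr/include prints one line per batch, so the number of lines shows how many separate runs find needed.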
Answer by Aaron Digulla
The correct way is:
find . -name "*.c" -print0 | xargs -0 cat | wc -l
You must use -print0 because there are only two invalid characters in Unix file names: the null byte and "/" (slash). So, for example, "xxx\npasswd" is a valid name. In reality, though, you're more likely to encounter names with spaces in them. Without -print0, the commands above would treat each word as a separate file.
You might also want to use "-type f" instead of -name to limit the search to files.
Answer by Idelic
Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:
wc -l --files0-from=<(find . -name \*.c -print0)
This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
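If you only want the grand total and prefer to avoid Bash's process substitution, GNU wc also accepts - as the --files0-from argument, meaning "read the NUL-separated names from standard input". A sketch under those assumptions (GNU coreutils, the hypothetical function name c_total, and an added -type f to skip directories):

```shell
# c_total DIR  (hypothetical helper; needs GNU wc for --files0-from)
# The last line of wc's output is the total when several files were counted,
# or the single file's own line otherwise; either way its first field is the count.
c_total() {
  find "${1:-.}" -type f -name '*.c' -print0 |
    wc -l --files0-from=- |
    tail -n 1 |
    awk '{print $1}'
}
```

tail -n 1 relies on the summary being the last output line, which can break if a file name itself contains a newline.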
Answer by abalmos
If you want to generate only a total line count, not a line count for each file, something like
find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'
works well. This saves you the need to do further text filtering in a script.
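The \; form above starts one wc per file. Switching to + batches the files, but then each wc run that receives more than one file appends its own total line, which the awk script would add a second time. A sketch that keeps the batching and skips those lines; it assumes wc's summary label is the English word total, so it forces the C locale (GNU wc translates the label in other locales), and sum_lines is a hypothetical name:

```shell
# sum_lines DIR  (hypothetical helper)
# Per-file lines look like "N DIR/path", so their second field starts with the
# search directory and is never exactly "total"; summary lines are skipped.
sum_lines() {
  LC_ALL=C find "${1:-.}" -type f -exec wc -l {} + |
    awk '$2 != "total" { total += $1 } END { print total + 0 }'
}
```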
Answer by Dave Pitts
I like to use find and head together for a "recursive cat" on all the files in a project directory, for example:
find . -name "*rb" -print0 | xargs -0 head -10000
The advantage is that head will add the file name and path as a header:
==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"
package "mysql-server-5.5"
...
==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...
==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH
For the complete example, please see my blog post:
http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/
Note that I used head -10000; clearly, if I have files over 10,000 lines, this is going to truncate the output. I could use head -100000 instead, but for "informal project/directory browsing" this approach works very well for me.
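If truncation is a concern, tail -n +1 (output starting at line 1, i.e. the whole file) prints the same ==> name <== headers as head whenever it is given more than one file, without any line cap. A sketch assuming the same find/xargs pipeline; cat_with_headers is a hypothetical name:

```shell
# cat_with_headers DIR PATTERN  (hypothetical helper)
# Prints every matching file in full, each preceded by a "==> path <==" header
# (tail adds the headers when it receives two or more files).
cat_with_headers() {
  find "${1:-.}" -type f -name "${2:-*}" -print0 | xargs -0 tail -n +1
}
```

For example, cat_with_headers . '*rb'. As with head, a lone matching file gets no header; GNU head and tail accept -v to force one.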
Answer by curran
Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single-line comments that use "//".
# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"
countLines(){
  # $total is the total lines of code counted
  total=0
  # -mindepth 1 excludes the current directory (".")
  # read one path per line so that paths containing spaces stay intact
  while IFS= read -r file; do
    # First sed: drop every line that contains // (i.e. lines with // comments)
    # Second sed: don't count blank lines
    # $numLines is the lines of code
    numLines=$(sed '/\/\//d' "$file" | sed '/^\s*$/d' | wc -l)
    total=$((total + numLines))
    echo " " $numLines "$file"
  done < <(find . -mindepth 1 -name "*.*" | grep -v "$excluded")
  echo " " $total in total
}
echo Source code files:
countLines
echo Unit tests:
cd spec
countLines
Here's what the output looks like for my project:
Source code files:
2 ./buildDocs.sh
24 ./countLines.sh
15 ./css/dashboard.css
53 ./data/un_population/provenance/preprocess.js
19 ./index.html
5 ./server/server.js
2 ./server/startServer.sh
24 ./SpecRunner.html
34 ./src/computeLayout.js
60 ./src/configDiff.js
18 ./src/dashboardMirror.js
37 ./src/dashboardScaffold.js
14 ./src/data.js
68 ./src/dummyVis.js
27 ./src/layout.js
28 ./src/links.js
5 ./src/main.js
52 ./src/processActions.js
86 ./src/timeline.js
73 ./src/udc.js
18 ./src/wire.js
664 in total
Unit tests:
230 ./ComputeLayoutSpec.js
134 ./ConfigDiffSpec.js
134 ./ProcessActionsSpec.js
84 ./UDCSpec.js
149 ./WireSpec.js
731 in total
Enjoy! --Curran
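One caveat about the script above: its first sed drops any line that contains //, so a code line such as url = "http://example.com"; or a statement with a trailing // comment is not counted. If you only want to skip lines that are entirely // comments (plus blank lines), a grep-based variant is one option. count_code_lines is a hypothetical helper, not part of the original script:

```shell
# count_code_lines FILE  (hypothetical helper)
# Counts lines that are neither blank/whitespace-only nor a whole-line // comment.
# Unlike sed '/\/\//d', a line such as  url = "http://x";  is still counted.
count_code_lines() {
  grep -Ev '^[[:space:]]*(//|$)' "$1" | wc -l
}
```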
Answer by SD.
find . -name "*.h" -print | xargs wc -l
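The -print | xargs form here has the same problem with spaces in file names that the accepted answer's EDIT describes. A sketch that counts .c and .h files together, assuming a find with the standard \( ... -o ... \) grouping and an xargs that supports -0; c_and_h_lines is a hypothetical name:

```shell
# c_and_h_lines DIR  (hypothetical helper)
# \( ... -o ... \) groups the two -name tests; -print0 / xargs -0 keeps names
# with spaces intact, and cat feeding a single wc -l yields one total.
c_and_h_lines() {
  find "${1:-.}" -type f \( -name '*.c' -o -name '*.h' \) -print0 |
    xargs -0 cat | wc -l
}
```

c_and_h_lines . prints a single number, the combined line count of all .c and .h files.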