Linux: How to count lines of code including sub-directories

Question

Asked by speciousfool

Suppose I want to count the lines of code in a project. If all of the files are in the same directory I can execute:

cat * | wc -l

However, if there are sub-directories, this doesn't work. For this to work cat would have to have a recursive mode. I suspect this might be a job for xargs, but I wonder if there is a more elegant solution?

Answer 1

Accepted answer by philant

First, you do not need to use cat to count lines. This is an antipattern called Useless Use of Cat (UUoC). To count lines in files in the current directory, use wc:

wc -l *

Then the find command recurses into the sub-directories:

find . -name "*.c" -exec wc -l {} \;

. is the name of the top directory to start searching from
-name "*.c" is the pattern of the files you're interested in
-exec gives a command to be executed
{} is the result of the find command to be passed to the command (here wc -l)
\; indicates the end of the command

This command produces a list of all files found, each with its line count. If you want the sum for all the files found, you can use find to list the files (with the -print option) and then use xargs to pass this list as arguments to wc -l.

find . -name "*.c" -print | xargs wc -l

EDIT to address Robert Gamble's comment (thanks): if you have spaces or newlines (!) in file names, then you have to use the -print0 option instead of -print, and xargs -0 (--null in GNU xargs), so that the file names are passed as null-terminated strings.

find . -name "*.c" -print0 | xargs -0 wc -l
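The difference between -print and -print0 can be seen on a small scratch tree. This is a minimal sketch, not part of the original answer: the directory and file names are made up, and the names are piped through cat only so that each variant yields a single number to compare.

```shell
# Hypothetical scratch directory, used only for this demonstration
demo=$(mktemp -d)
printf 'a\nb\n' > "$demo/plain.c"
printf 'c\nd\ne\n' > "$demo/with space.c"

# -print: xargs splits "with space.c" into two words, cat cannot open
# either of them, and the three lines of that file go missing
unsafe=$(find "$demo" -name "*.c" -print | xargs cat 2>/dev/null | wc -l)

# -print0 / -0: every name is passed whole, so all five lines are counted
safe=$(find "$demo" -name "*.c" -print0 | xargs -0 cat | wc -l)

echo "unsafe: $unsafe, safe: $safe"
rm -rf "$demo"
```

The unsafe variant silently undercounts here only because the errors are redirected; without 2>/dev/null cat would report the two names it could not open.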

The Unix philosophy is to have tools that do one thing only, and do it well.

Answer 2

Answered by chromakode

Try using the find command, which recurses directories by default:

find . -type f -execdir cat {} \; | wc -l

Answer 3

Answered by Ken

I think you're probably stuck with xargs:

find -name '*php' | xargs cat | wc -l

chromakode's method gives the same result but is much, much slower. If you use xargs, your cat-ing and wc-ing can start as soon as find starts finding.

Good explanation at Linux: xargs vs. exec {}

Answer 4

Answered by Kent Fredric

If you want a code-golfing answer:

grep '' -R . | wc -l

The problem with just using wc -l on its own is that it can't descend into directories, and the one-liners using

find . -exec wc -l {} \;

won't give you a total line count because it runs wc once for every file (lol!), and

find . -exec wc -l {} +

will get confused as soon as find hits the ~200k character argument limit for parameters (see the footnote on limits) and instead calls wc multiple times, each time only giving you a partial summary.
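The partial-summary effect is easy to reproduce without a huge tree. The following is a minimal sketch, not from the original answer: it uses xargs -n 2 to force a new wc for every two files, which imitates what happens once the argument list grows too long. The scratch directory and file names are made up, and LC_ALL=C is set on the assumption that the summary line should read "total" regardless of locale.

```shell
# Hypothetical scratch directory with four two-line files
demo=$(mktemp -d)
for i in 1 2 3 4; do printf 'x\ny\n' > "$demo/f$i.txt"; done

# -n 2 makes xargs run wc once per pair of files, so wc prints
# one partial "total" line per invocation
chunked=$(find "$demo" -type f -print0 | LC_ALL=C xargs -0 -n 2 wc -l)
echo "$chunked"

# Count the partial "total" lines, then add up the per-file lines only
totals=$(echo "$chunked" | awk '$2 == "total"' | wc -l)
sum=$(echo "$chunked" | awk '$2 != "total" {s += $1} END {print s + 0}')
echo "partial totals: $totals, real sum: $sum"
rm -rf "$demo"
```

Skipping the "total" lines and summing the rest gives the right answer no matter how many times wc was started. The paths printed by find here are absolute, so no file name can be mistaken for a summary line.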

Additionally, the above grep trick will not add more than 1 line to the output when it encounters a binary file, which could be circumstantially beneficial.

For the cost of 1 extra command character, you can ignore binary files completely:

 grep '' -IR . | wc -l

If you want to run line counts on binary files too

 grep '' -aR . | wc -l

Footnote on limits:

The docs are a bit vague as to whether it's a string size limit or a number-of-tokens limit.

cd /usr/include;
find -type f -exec perl -e 'printf qq[%s => %s\n], scalar @ARGV, length join q[ ], @ARGV' {} + 
# 4066 => 130974
# 3399 => 130955
# 3155 => 130978
# 2762 => 130991
# 3923 => 130959
# 3642 => 130989
# 4145 => 130993
# 4382 => 130989
# 4406 => 130973
# 4190 => 131000
# 4603 => 130988
# 3060 => 95435

This implies it's going to chunk very, very easily.

Answer 5

Answered by Aaron Digulla

The correct way is:

find . -name "*.c" -print0 | xargs -0 cat | wc -l

You must use -print0 because there are only two invalid characters in Unix filenames: The null byte and "/" (slash). So for example "xxx\npasswd" is a valid name. In reality, you're more likely to encounter names with spaces in them, though. The commands above would count each word as a separate file.

You might also want to use "-type f" instead of -name to limit the search to files.
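The two tests can also be combined rather than swapped. This is a minimal sketch under that assumption (the answer itself only suggests -type f instead of -name); the scratch directory, including a directory deliberately named like a C file, is made up for illustration.

```shell
# Hypothetical scratch tree: a directory whose name also matches *.c
demo=$(mktemp -d)
mkdir "$demo/legacy.c"
printf 'int x;\nint y;\n' > "$demo/legacy.c/main.c"

# -name alone matches both the directory and the file inside it
matches_all=$(find "$demo" -name "*.c" | wc -l)

# Adding -type f keeps only regular files, so cat is never handed a directory
matches_files=$(find "$demo" -type f -name "*.c" | wc -l)
lines=$(find "$demo" -type f -name "*.c" -print0 | xargs -0 cat | wc -l)

echo "matches: $matches_all without -type f, $matches_files with; lines: $lines"
rm -rf "$demo"
```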

Answer 6

Answered by Idelic

Using cat or grep in the solutions above is wasteful if you can use relatively recent GNU tools, including Bash:

wc -l --files0-from=<(find . -name \*.c -print0)

This handles file names with spaces, arbitrary recursion and any number of matching files, even if they exceed the command line length limit.
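If only the grand total is wanted, the same GNU wc option can read the name list from standard input, which also works in shells without process substitution. This is a minimal sketch assuming GNU wc (--files0-from is not available in every wc); the scratch directory and file names are made up.

```shell
# Hypothetical scratch directory; one name contains a space on purpose
demo=$(mktemp -d)
printf '1\n2\n' > "$demo/a b.c"
printf '3\n' > "$demo/c.c"

# --files0-from=- makes GNU wc read NUL-separated names from stdin;
# the last output line carries the total, and awk keeps just the number
total=$(find "$demo" -name "*.c" -print0 | wc -l --files0-from=- | tail -n 1 | awk '{print $1}')

echo "total: $total"
rm -rf "$demo"
```

Because wc is started only once, there is exactly one summary line to pick up, however many files are found.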

Answer 7

Answered by abalmos

If you want to generate only a total line count and not a line count for each file, something like:

find . -type f -exec wc -l {} \; | awk '{total += $1} END{print total}'

works well. This saves you the need to do further text filtering in a script.

Answer 8

Answered by Dave Pitts

I like to use find and head together as "a recursive cat" on all the files in a project directory, for example:

find . -name "*rb" -print0 | xargs -0 head -10000

The advantage is that head will add the filename and path before each file's contents:

==> ./recipes/default.rb <==
DOWNLOAD_DIR = '/tmp/downloads'
MYSQL_DOWNLOAD_URL = 'http://cdn.mysql.com/Downloads/MySQL-5.6/mysql-5.6.10-debian6.0-x86_64.deb'
MYSQL_DOWNLOAD_FILE = "#{DOWNLOAD_DIR}/mysql-5.6.10-debian6.0-x86_64.deb"

package "mysql-server-5.5"
...

==> ./templates/default/my.cnf.erb <==
#
# The MySQL database server configuration file.
#
...

==> ./templates/default/mysql56.sh.erb <==
PATH=/opt/mysql/server-5.6/bin:$PATH

For the complete example, please see my blog post:

http://haildata.net/2013/04/using-cat-recursively-with-nicely-formatted-output-including-headers/

Note that I used 'head -10000'. Clearly, if I have files over 10,000 lines this is going to truncate the output. I could use 'head -100000' instead, but for "informal project/directory browsing" this approach works very well for me.
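One way to avoid picking a line limit at all is to swap head for tail -n +1, which prints each file from its first line and, like head, adds a "==> name <==" header when given more than one file. This is a different tool from the one used in the answer, shown here as a minimal sketch; the scratch directory and file names are made up.

```shell
# Hypothetical scratch directory with two small Ruby files
demo=$(mktemp -d)
printf 'one\ntwo\n' > "$demo/a.rb"
printf 'three\n' > "$demo/b.rb"

# tail -n +1 starts at line 1, so nothing is truncated; with several
# files on its command line it prints a header before each one
out=$(find "$demo" -name "*rb" -print0 | xargs -0 tail -n +1)
printf '%s\n' "$out"

# Count the headers to confirm both files were labelled
headers=$(printf '%s\n' "$out" | grep -c '^==> .* <==$')
rm -rf "$demo"
```

If xargs happens to hand tail a single file, no header is printed; GNU tail's -v option forces one.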

Answer 9

Answered by curran

Here's a Bash script that counts the lines of code in a project. It traverses a source tree recursively, and it excludes blank lines and single line comments that use "//".

# $excluded is a regex for paths to exclude from line counting
excluded="spec\|node_modules\|README\|lib\|docs\|csv\|XLS\|json\|png"

countLines(){
  # $total is the total lines of code counted
  total=0
  # -mindepth excludes the current directory ("."); -type f skips directories
  # Reading names line by line keeps file names with spaces intact
  while IFS= read -r file; do
    # First sed: drop every line that contains // (single-line comments)
    # Second sed: don't count blank lines
    # $numLines is the lines of code
    numLines=$(sed '/\/\//d' "$file" | sed '/^\s*$/d' | wc -l)
    total=$((total + numLines))
    echo "  " $numLines "$file"
  done < <(find . -mindepth 1 -type f -name "*.*" | grep -v "$excluded")
  echo "  " $total in total
}

echo Source code files:
countLines
echo Unit tests:
cd spec
countLines

Here's what the output looks like for my project:

Source code files:
   2 ./buildDocs.sh
   24 ./countLines.sh
   15 ./css/dashboard.css
   53 ./data/un_population/provenance/preprocess.js
   19 ./index.html
   5 ./server/server.js
   2 ./server/startServer.sh
   24 ./SpecRunner.html
   34 ./src/computeLayout.js
   60 ./src/configDiff.js
   18 ./src/dashboardMirror.js
   37 ./src/dashboardScaffold.js
   14 ./src/data.js
   68 ./src/dummyVis.js
   27 ./src/layout.js
   28 ./src/links.js
   5 ./src/main.js
   52 ./src/processActions.js
   86 ./src/timeline.js
   73 ./src/udc.js
   18 ./src/wire.js
   664 in total
Unit tests:
   230 ./ComputeLayoutSpec.js
   134 ./ConfigDiffSpec.js
   134 ./ProcessActionsSpec.js
   84 ./UDCSpec.js
   149 ./WireSpec.js
   731 in total

Enjoy! --Curran

Answer 10

Answered by SD.

find . -name "*.h" -print | xargs wc -l
