How to count total lines changed by a specific author in a Git repository?
Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors: Stack Overflow.
Original: http://stackoverflow.com/questions/1265040/
Asked by Gav
Is there a command I can invoke which will count the lines changed by a specific author in a Git repository? I know that there must be ways to count the number of commits, as GitHub does this for their Impact graph.
Accepted answer by CB Bailey
The output of the following command should be reasonably easy to send to a script to add up the totals:
git log --author="<authorname>" --oneline --shortstat
This gives stats for all commits on the current HEAD. If you want to add up stats in other branches, you will have to supply them as arguments to git log.
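As one way to "add up the totals", here is a minimal Python sketch that sums the summary lines printed by --shortstat. The function name sum_shortstat and the sample text in the usage note are made up for illustration. The pattern assumes the usual English summary wording ("N files changed, N insertions(+), N deletions(-)"), where either the insertions or the deletions part may be missing.

```python
import re

# One --shortstat summary line. The insertions or deletions part may be
# absent, and the nouns are singular when the count is 1.
SHORTSTAT = re.compile(
    r"^\s*(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?\s*$"
)


def sum_shortstat(log_text):
    """Return (files, insertions, deletions) summed over all summary lines."""
    files = insertions = deletions = 0
    for line in log_text.splitlines():
        match = SHORTSTAT.match(line)
        if match is None:
            continue  # commit subject lines and blank lines are ignored
        files += int(match.group(1))
        insertions += int(match.group(2) or 0)
        deletions += int(match.group(3) or 0)
    return files, insertions, deletions
```

The function takes the captured text of the git log command, so it can be fed from a pipe or from subprocess output, whichever suits the script.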
For passing to a script, even the "oneline" format can be removed by using an empty log format, and as commented by Jakub Narębski, --numstat is another alternative. It generates per-file rather than per-line statistics but is even easier to parse.
git log --author="<authorname>" --pretty=tformat: --numstat
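To show why --numstat is easy to parse, here is a small Python sketch that sums its tab-separated columns. The name sum_numstat is hypothetical. The sketch assumes each data line has the form "added<TAB>removed<TAB>path" and skips binary files, which git reports with "-" in both numeric columns.

```python
def sum_numstat(log_text):
    """Return (added, removed, net) summed over `git log --numstat` output."""
    added = removed = 0
    for line in log_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # blank lines between commits
        # Binary files appear as "-\t-\tpath"; they carry no line counts.
        if not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        added += int(fields[0])
        removed += int(fields[1])
    return added, removed, added - removed
```

Passing the text produced by the command above gives the three totals in one call; the net figure is simply added minus removed.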
Answer by Alex
This gives some statistics about the author; modify as required.
Using Gawk:
git log --author="_Your_Name_Here_" --pretty=tformat: --numstat \
| gawk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s removed lines: %s total lines: %s\n", add, subs, loc }' -
Using Awk on Mac OSX:
git log --author="_Your_Name_Here_" --pretty=tformat: --numstat | awk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s, removed lines: %s, total lines: %s\n", add, subs, loc }' -
EDIT (2017)
There is a new package on GitHub that looks slick and has bash as its only dependency (tested on Linux). It is better suited to direct use than to scripts.
It's git-quick-stats (the arzzen/git-quick-stats repository on GitHub).
Copy git-quick-stats to a folder and add the folder to your PATH.
mkdir ~/source
cd ~/source
git clone git@github.com:arzzen/git-quick-stats.git
mkdir ~/bin
ln -s ~/source/git-quick-stats/git-quick-stats ~/bin/git-quick-stats
chmod +x ~/bin/git-quick-stats
export PATH=${PATH}:~/bin
Usage:
git-quick-stats
Answer by Dan
In case anyone wants to see the stats for every user in their codebase, a couple of my coworkers recently came up with this horrific one-liner:
git log --shortstat --pretty="%cE" | sed 's/\(.*\)@.*/\1/' | grep -v "^$" | awk 'BEGIN { line=""; } !/^ / { if (line=="" || !match(line, $0)) {line = $0 "," line }} /^ / { print line " # " $0; line=""}' | sort | sed -E 's/# //;s/ files? changed,//;s/([0-9]+) ([0-9]+ deletion)/\1 0 insertions\(+\), \2/;s/\(\+\)$/\(\+\), 0 deletions\(-\)/;s/insertions?\(\+\), //;s/ deletions?\(-\)//' | awk 'BEGIN {name=""; files=0; insertions=0; deletions=0;} {if ($1 != name && name != "") { print name ": " files " files changed, " insertions " insertions(+), " deletions " deletions(-), " insertions-deletions " net"; files=0; insertions=0; deletions=0; name=$1; } name=$1; files+=$2; insertions+=$3; deletions+=$4} END {print name ": " files " files changed, " insertions " insertions(+), " deletions " deletions(-), " insertions-deletions " net";}'
(Takes a few minutes to crunch through our repo, which has around 10-15k commits.)
Answer by mmrobins
I found the following to be useful to see who had the most lines that were currently in the code base:
git ls-files -z | xargs -0n1 git blame -w | ruby -n -e '$_ =~ /^.*\((.*?)\s[\d]{4}/; puts $1.strip' | sort -f | uniq -c | sort -n
The other answers have mostly focused on lines changed in commits, but if commits don't survive and are overwritten, they may just have been churn. The above incantation also gets you all committers sorted by lines instead of just one at a time. You can add some options to git blame (-C -M) to get some better numbers that take file movement and line movement between files into account, but the command might run a lot longer if you do.
Also, if you're looking for lines changed in all commits for all committers, the following little script is helpful:
Answer by Jakub Narębski
To count the number of commits by a given author (or all authors) on a given branch you can use git-shortlog; see especially its --numbered and --summary options, e.g. when run on the git repository:
$ git shortlog v1.6.4 --numbered --summary
  6904  Junio C Hamano
  1320  Shawn O. Pearce
  1065  Linus Torvalds
   692  Johannes Schindelin
   443  Eric Wong
Answer by Jared Burrows
After looking at Alex's and Gerty3000's answers, I have tried to shorten the one-liner:
Basically, using git log --numstat and not keeping track of the number of files changed.
Git version 2.1.0 on Mac OSX:
git log --format='%aN' | sort -u | while read name; do echo -en "$name\t"; git log --author="$name" --pretty=tformat: --numstat | awk '{ add += $1; subs += $2; loc += $1 - $2 } END { printf "added lines: %s, removed lines: %s, total lines: %s\n", add, subs, loc }' -; done
Example:
Jared Burrows added lines: 6826, removed lines: 2825, total lines: 4001
Answer by Erik Zivkovic
The answer from AaronM using the shell one-liner is good, but there is yet another bug: if the amount of whitespace between the user name and the date varies, the extra spaces end up in the captured user names. The corrupted user names produce multiple rows per user, and you have to sum them up yourself.
This small change fixed the issue for me:
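git ls-files -z | xargs -0n1 git blame -w --show-email | perl -n -e '/^.*?\((.*?)\s+[\d]{4}/; print $1,"\n"' | sort -f | uniq -c | sort -n
Notice the + after \s, which consumes all whitespace between the name and the date.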
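The effect of that + can be checked in isolation. The sketch below is only an illustration: the two sample lines are invented to imitate default git blame output with different amounts of padding after the name, and the helper count_names is hypothetical. It compares a pattern with a single \s against one with \s+.

```python
import re
from collections import Counter

# The two patterns differ only in "\s" versus "\s+" before the year.
SINGLE_SPACE = re.compile(r"^.*?\((.*?)\s[\d]{4}")
ANY_SPACES = re.compile(r"^.*?\((.*?)\s+[\d]{4}")

# Invented blame-like lines: same author, different padding before the date.
SAMPLE = [
    "a1b2c3d4 (Al   2019-01-23 10:00:00 +0100 1) x = 1",
    "e5f6a7b8 (Al 2019-01-24 11:00:00 +0100 2) y = 2",
]


def count_names(blame_lines, pattern):
    """Count lines per captured name, without stripping the capture."""
    counts = Counter()
    for line in blame_lines:
        match = pattern.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

With SINGLE_SPACE the first line yields the name with two trailing spaces, so the same author is split into two keys; with ANY_SPACES both lines collapse into one key.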
I am adding this answer as much for my own remembrance as for helping anyone else, since this is at least the second time I have googled the subject :)
- Edit 2019-01-23: Added --show-email to git blame -w to aggregate on email instead, since some people use different Name formats on different computers, and sometimes two people with the same name are working in the same git repository.
Answer by kccqzy
Here's a short one-liner that produces stats for all authors. It's much faster than Dan's solution above at https://stackoverflow.com/a/20414465/1102119 (mine has time complexity O(N) instead of O(NM), where N is the number of commits and M the number of authors).
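The O(N) figure comes from reading the log once and accumulating totals per author in a map, rather than running one git log per author. Here is a minimal Python sketch of that single pass. The name per_author_totals is hypothetical, and the sketch assumes the log prints each author name on its own line, followed directly by that commit's numstat lines, with a blank line between commits.

```python
from collections import defaultdict


def per_author_totals(log_text):
    """Return {author: [added, removed]} from one pass over the log text."""
    totals = defaultdict(lambda: [0, 0])
    author = None
    for line in log_text.splitlines():
        if not line.strip():
            author = None  # blank line: next non-blank line names an author
            continue
        if author is None:
            author = line
            continue
        fields = line.split("\t")
        # Skip binary files, which are reported as "-\t-\tpath".
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            totals[author][0] += int(fields[0])
            totals[author][1] += int(fields[1])
    return dict(totals)
```

Each log line is visited once and each update is a dictionary lookup, so the cost grows with the size of the log and not with the number of authors.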
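git log --no-merges --pretty=format:%an --numstat | awk '/./ && !author { author = $0; next } author { ins[author] += $1; del[author] += $2 } /^$/ { author = ""; next } END { for (a in ins) { printf "%10d %10d %10d %s\n", ins[a] - del[a], ins[a], del[a], a } }' | sort -rn
Answer by Stéphane Gourichon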
@mmrobins @AaronM @ErikZ @JamesMishra provided variants that all have a problem in common: they ask git to produce a mixture of info not intended for script consumption, including line contents from the repository on the same line, then match the mess with a regexp.
This is a problem when some lines aren't valid UTF-8 text, and also when some lines happen to match the regexp (this happened here).
Here's a modified line that doesn't have these problems. It requests git to output data cleanly on separate lines, which makes it easy to filter what we want robustly:
git ls-files -z | xargs -0n1 git blame -w --line-porcelain | grep -a "^author " | sort -f | uniq -c | sort -n
You can grep for other strings, like author-mail, committer, etc.
Perhaps first do export LC_ALL=C (assuming bash) to force byte-level processing (this also happens to speed up grep tremendously compared with UTF-8-based locales).
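The same byte-level idea can be sketched in Python by working on bytes instead of decoded text. The function name count_authors is made up for this example. It relies on the porcelain format of git blame, in which header lines such as "author ..." start at column 0 while the file's own content lines are prefixed with a TAB.

```python
from collections import Counter


def count_authors(porcelain_bytes):
    """Count blamed lines per author in `git blame --line-porcelain` output."""
    prefix = b"author "
    counts = Counter()
    for line in porcelain_bytes.split(b"\n"):
        # Content lines begin with a TAB, so source text that happens to
        # contain "author " cannot be mistaken for a header line.
        if line.startswith(prefix):
            counts[line[len(prefix):]] += 1
    return counts
```

Because nothing is decoded, content that is not valid UTF-8 cannot raise an error, and the keys stay as bytes until the caller chooses how to display them.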
Answer by AaronM
A solution was given with ruby in the middle. Since perl is a little more widely available by default, here is an alternative using perl to count current lines by author.
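git ls-files -z | xargs -0n1 git blame -w | perl -n -e '/^.*\((.*?)\s*[\d]{4}/; print $1,"\n"' | sort -f | uniq -c | sort -n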