Original question: http://stackoverflow.com/questions/11981716/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to quickly find all git repos under a directory
Asked by Mike Slinn
The following bash script is slow when scanning for .git directories because it looks at every directory. If I have a collection of large repositories it takes a long time for find to churn through every directory, looking for .git. It would go much faster if it would prune the directories within repos, once a .git directory is found. Any ideas on how to do that, or is there another way to write a bash script that accomplishes the same thing?
#!/bin/bash
# Update all git directories below current directory or specified directory
HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'
DIR=.
if [ "$1" != "" ]; then DIR=$1; fi
cd $DIR>/dev/null; echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"; cd ->/dev/null
for d in `find . -name .git -type d`; do
  cd $d/.. > /dev/null
  echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
  git pull
  cd - > /dev/null
done
Specifically, how would you use these options? For this problem, you cannot assume that the collection of repos is all in the same directory; they might be within nested directories.
top
repo1
dirA
dirB
dirC
repo1
Accepted answer by Mike Slinn
Here is an optimized solution:
#!/bin/bash
# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore
HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'
function update {
  local d="$1"
  if [ -d "$d" ]; then
    if [ -e "$d/.ignore" ]; then
      echo -e "\n${HIGHLIGHT}Ignoring $d${NORMAL}"
    else
      cd "$d" > /dev/null
      if [ -d ".git" ]; then
        echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
        git pull
      else
        scan *
      fi
      cd .. > /dev/null
    fi
  fi
  #echo "Exiting update: pwd=`pwd`"
}

function scan {
  #echo "`pwd`"
  for x in "$@"; do
    update "$x"
  done
}

if [ "$1" != "" ]; then cd "$1" > /dev/null; fi
echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
scan *
Answer by Clayton Stanley
Check out Dennis' answer in this post about find's -prune option:
How to use '-prune' option of 'find' in sh?
find . -name .git -type d -prune
This will speed things up a bit, since find won't descend into .git directories, but it still descends into the git repositories themselves, looking for other .git folders. And that could be a costly operation.
What would be cool is if there was some sort of find lookahead pruning mechanism, where if a folder has a subfolder called .git, then prune on that folder...
That said, I'm betting your bottleneck is in the network operation 'git pull', and not in the find command, as others have posted in the comments.
Answer by vaab
I've taken the time to copy-paste the script in your question and compare it to the script in your own answer. Here are some interesting results:
Please note that:
- I've disabled the git pull calls by prefixing them with an echo.
- I've also removed the color things.
- I've also removed the .ignore file testing in the bash solution.
- And removed the unnecessary > /dev/null here and there.
- removed pwd calls in both.
- added -prune, which is obviously lacking in the find example.
- used "while" instead of "for", which was also counterproductive in the find example.
- considerably untangled the second example to get to the point.
- added a test on the bash solution to NOT follow symlinks, to avoid cycles and behave like the find solution.
- added shopt to allow * to expand to dotted directory names as well, to match the find solution's functionality.
Thus, we are comparing the find-based solution:
#!/bin/bash
find . -name .git -type d -prune | while IFS= read -r d; do
  cd "$d/.."
  echo "$PWD >" git pull
  cd "$OLDPWD"
done
With the bash shell builtin solution:
#!/bin/bash
shopt -s dotglob
update() {
  for d in "$@"; do
    test -d "$d" -a \! -L "$d" || continue
    cd "$d"
    if [ -d ".git" ]; then
      echo "$PWD >" git pull
    else
      update *
    fi
    cd ..
  done
}
update *
Note: builtins (function and the for loop) are immune to the OS ARG_MAX limit on launching processes, so the * won't break even on very large directories.
Technical differences between solutions:
The find-based solution uses C functions to crawl the repositories. It:
- has to load a new process for the find command.
- will avoid ".git" content but will crawl the workdir of git repositories, and lose some time in those (and eventually find more matching elements).
- will have to chdir through several levels of sub-directories for each match and go back.
- will have to chdir once in the find command and once in the bash part.
The bash-based solution uses builtins (so a near-C implementation, but interpreted) to crawl the repositories. Note that it:
- will use only one process.
- will avoid the sub-directories of a git workdir.
- will only perform chdir one level at a time.
- will only perform chdir once for both looking and performing the command.
Actual speed results between solutions:
I have a working development collection of git repositories on which I launched the scripts:
- find solution: ~0.080s (bash chdir takes ~0.010s)
- bash solution: ~0.017s
I have to admit that I wasn't prepared to see such a win from bash builtins. It became more apparent and normal after analyzing what's going on. To add insult to injury, if you change the shell from /bin/bash to /bin/sh (you must comment out the shopt line, and be prepared that it won't parse dotted directories), you'll fall to ~0.008s. Beat that!
Note that you can be more clever with the find solution by using:
find . -type d \( -exec /usr/bin/test -d "{}/.git" -a "{}" != "." \; -print -prune \
-o -name .git -prune \)
which effectively avoids crawling the sub-directories of a found git repository, at the price of spawning a process for each directory crawled. The final find solution I came up with took around ~0.030s, which is more than twice as fast as the previous find version, but remains about 2 times slower than the bash solution.
Note that /usr/bin/test is important to avoid a search in $PATH, which costs time, and I needed -o -name .git -prune and -a "{}" != "." because my main repository was itself a git subrepository.
In conclusion, I won't be using the bash builtin solution because it has too many corner cases for me (and my first test hit one of the limitations). It was important for me to explain why it could be (much) faster in some cases, but the find solution seems much more robust and consistent to me.
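One of the corner cases mentioned above is that plain /bin/sh has no shopt, so * skips dotted directories. The following is a minimal sketch of one possible workaround, not the answer's own code: it uses three glob patterns instead of dotglob, and it passes paths around instead of calling cd. The function name walk_repos is made up for illustration.

```shell
# walk_repos is a hypothetical name; it is not from the answer above.
# It prints every directory below "$1" that contains a .git directory,
# using only shell builtins and without changing the working directory.
walk_repos() {
    local d    # "local" is not strictly POSIX, but bash and dash accept it
    # Together the three patterns cover plain names, ".x" names and "..x"
    # names, and never match "." or "..". A pattern that matches nothing
    # stays literal and is rejected by the -d test below.
    for d in "$1"/* "$1"/.[!.]* "$1"/..?*; do
        [ -d "$d" ] && [ ! -L "$d" ] || continue   # real directories only
        case $d in */.git) continue ;; esac        # never enter a .git dir
        if [ -d "$d/.git" ]; then
            printf '%s\n' "$d"                     # repo found: do not descend
        else
            walk_repos "$d"
        fi
    done
}
```

Something like walk_repos "$HOME/src" would then list one repository per line. Paths that contain newlines would still confuse any line-based consumer of that output.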
Answer by CharlieB
The answers above all rely on finding a ".git" directory. However, not all git repos have one (e.g. bare repos). The following command loops through all directories and asks git whether it considers each one to be a git directory. If so, it prunes the sub-directories off the tree and continues.
find . -type d -exec sh -c 'cd "{}"; git rev-parse --git-dir 2> /dev/null 1>&2' \; -prune -print
It's a lot slower than other solutions because it's executing a command in each directory, but it doesn't rely on a particular repository structure. Could be useful for finding bare git repositories for example.
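A slightly more defensive variant of the same idea might look like the sketch below. It is an illustration rather than the answer's own command: the directory is passed to sh as a positional parameter instead of being embedded in the script text, so names containing quotes or $ are not interpreted by the shell, and && keeps git from running in the wrong place if cd fails. The wrapper name find_git_dirs is hypothetical.

```shell
# find_git_dirs is a hypothetical wrapper name. It prints each directory
# under "$1" (default: .) that git itself recognises, bare or not, and
# prunes below every match.
find_git_dirs() {
    find "${1:-.}" -type d \
        -exec sh -c 'cd "$1" && git rev-parse --git-dir >/dev/null 2>&1' sh {} \; \
        -prune -print
}
```

One caveat: git rev-parse --git-dir also succeeds in any sub-directory of a working tree, so if the starting directory is itself inside a repository, the starting directory is reported and nothing below it is searched.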
Answer by user5696355
For Windows, you can put the following into a batch file called gitlist.bat and put it on your PATH.
@echo off
if {%1}=={} goto :usage
for /r %1 /d %%I in (.) do echo %%I | find ".git\."
goto :eof
:usage
echo usage: gitlist ^<path^>
Answer by Clayton Stanley
Check out the answer using the locate command: Is there any way to list up git repositories in terminal?
The advantages of using locate instead of a custom script are:
- The search is indexed, so it scales
- It does not require the use (and maintenance) of a custom bash script
The disadvantages of using locate are:
- The db that locate uses is updated weekly, so freshly-created git repositories won't show up
Going the locate route, here's how to list all git repositories under a directory, for OS X:
Enable locate indexing (will be different on Linux):
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.locate.plist
Run this command after indexing completes (might need some tweaking for Linux):
repoBasePath=$HOME
locate '.git' | egrep '\.git$' | egrep "^$repoBasePath" | xargs -I {} dirname "{}"
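The filtering half of that pipeline can be tried without a locate database. The sketch below is an assumption-laden illustration, not part of the answer: repos_under is a made-up helper that reads path names on stdin and matches the base directory literally with a case pattern, rather than treating it as a regular expression.

```shell
# repos_under is a hypothetical helper. It reads path names on stdin (for
# example the output of locate) and prints the parent directory of every
# path that ends in "/.git" and lies below the base directory "$1".
repos_under() {
    local base="${1%/}" path     # drop one trailing slash from the base
    grep '/\.git$' | while IFS= read -r path; do
        case $path in
            "$base"/*) dirname "$path" ;;   # quoted, so matched literally
        esac
    done
}
```

It could then be used as locate '.git' | repos_under "$HOME". Bare repositories whose directory name merely ends in .git are not matched by this filter.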
Answer by Greg Barrett
I list all git repositories anywhere in the current directory using:
find . -type d -execdir test -d {}/.git \; -prune -print
This is fast since it stops recursing once it finds a git repository. (Although it does not handle bare repositories.) Of course, you can change the . to whatever directory you want. If you need, you can change the -print to -print0 for null-separated values.
To also ignore directories containing a .ignore file:
find . -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \)
I've added this alias to my ~/.gitconfig file:
[alias]
repos = !"find -type d -execdir test -d {}/.git \; -prune -print"
Then I just need to execute:
git repos
To get a complete listing of all the git repositories anywhere in my current directory.
Answer by Mike Slinn
This answer combines the partial answer provided by @Greg Barrett with my optimized answer above.
#!/bin/bash
# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore
HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'
export PATH=${PATH/':./:'/:}
export PATH=${PATH/':./bin:'/:}
#echo "$PATH"
DIRS="$( find "$@" -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \) )"
echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
for d in $DIRS; do
  cd "$d" > /dev/null
  echo -e "\n${HIGHLIGHT}Updating `pwd`$NORMAL"
  git pull
  cd - > /dev/null
done
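The loop above splits $DIRS on whitespace, so a repository path containing a space would be broken apart. A possible variant that reads the find output line by line is sketched below. It is an illustration under assumptions, not the author's script: for_each_repo is a made-up name, the command to run is taken from the arguments, and -exec is swapped in for -execdir (GNU find refuses -execdir when PATH contains relative entries, which is presumably why the script edits PATH).

```shell
# for_each_repo is a hypothetical helper: for_each_repo ROOT COMMAND [ARGS...]
# It runs COMMAND inside every repo found under ROOT, skipping directories
# that contain a .ignore file, as in the script above.
for_each_repo() {
    local root="$1" repo
    shift
    # -exec is swapped in for -execdir here, so PATH needs no clean-up.
    find "$root" -type d \( \
            -exec test -e '{}/.ignore' \; -prune -o \
            -exec test -d '{}/.git' \; -prune -print \
        \) |
    while IFS= read -r repo; do
        # The subshell keeps each cd local; stdin is detached so COMMAND
        # cannot swallow the remaining path names.
        ( cd "$repo" && "$@" ) </dev/null
    done
}
```

For example, for_each_repo . git pull would update every repository below the current directory, and for_each_repo . pwd is a harmless dry run. Paths containing newlines are still not handled by this line-based loop.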