bash: How to quickly find all git repos under a directory

Question

Asked by Mike Slinn

The following bash script is slow when scanning for .git directories because it looks at every directory. If I have a collection of large repositories it takes a long time for find to churn through every directory, looking for .git. It would go much faster if it would prune the directories within repos, once a .git directory is found. Any ideas on how to do that, or is there another way to write a bash script that accomplishes the same thing?

#!/bin/bash

# Update all git directories below current directory or specified directory

HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'

DIR=.
if [ "$1" != "" ]; then DIR=$1; fi
cd "$DIR" > /dev/null; echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"; cd - > /dev/null

for d in $(find "$DIR" -name .git -type d); do
  cd "$d/.." > /dev/null
  echo -e "\n${HIGHLIGHT}Updating $(pwd)$NORMAL"
  git pull
  cd - > /dev/null
done

Specifically, how would you use these options? For this problem, you cannot assume that the collection of repos is all in the same directory; they might be within nested directories.

top
  repo1
  dirA

  dirB
     dirC
        repo1

Answer 1

Accepted answer by Mike Slinn

Here is an optimized solution:

#!/bin/bash
# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore

HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'

function update {
  local d="$1"
  if [ -d "$d" ]; then
    if [ -e "$d/.ignore" ]; then
      echo -e "\n${HIGHLIGHT}Ignoring $d${NORMAL}"
    else
      # Bail out if we cannot enter $d, otherwise "cd .." would climb too far
      cd "$d" > /dev/null || return
      if [ -d ".git" ]; then
        echo -e "\n${HIGHLIGHT}Updating $(pwd)$NORMAL"
        git pull
      else
        scan *
      fi
      cd .. > /dev/null
    fi
  fi
  #echo "Exiting update: pwd=`pwd`"
}

function scan {
  #echo "`pwd`"
  local x
  for x in "$@"; do
    update "$x"
  done
}

if [ "$1" != "" ]; then cd "$1" > /dev/null || exit 1; fi
echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
scan *
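
A variation that may be easier to reuse: the same recursive walk, pruned at the first .git directory and honouring .ignore, but printing repository paths instead of running git pull, so discovery can be piped into other commands. This is a minimal sketch; the function name list_repos is hypothetical, it skips symbolic links (as in vaab's version further down), and like the answer above it does not look inside hidden directories.

```shell
#!/bin/bash
# Hypothetical helper: print every repository below the given directories,
# using the same prune-at-first-.git recursion as the answer above.
shopt -s nullglob   # an empty directory expands "$d"/* to nothing, not a literal '*'

list_repos() {
  local d
  for d in "$@"; do
    # Only real directories; skipping symlinks avoids cycles.
    [ -d "$d" ] && [ ! -L "$d" ] || continue
    if [ -e "$d/.ignore" ]; then
      continue                    # skip, as the answer does for .ignore
    elif [ -d "$d/.git" ]; then
      printf '%s\n' "$d"          # a repository: report it and do not descend
    else
      list_repos "$d"/*           # plain directory: recurse one level deeper
    fi
  done
}
```

Usage: list_repos ./* lists the repositories below the current directory, one per line.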

Answer 2

Answer by Clayton Stanley

Check out Dennis' answer in this post about find's -prune option:

How to use '-prune' option of 'find' in sh?

find . -name .git -type d -prune

Will speed things up a bit, as find won't descend into .git directories, but it still does descend into git repositories, looking for other .git folders. And that 'could' be a costly operation.
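
A small demonstration of that limitation, using a throwaway directory tree as an assumption: repo1 contains a nested repository in its working tree (for example a vendored project). Pruning .git keeps find out of repo1/.git, but find still walks repo1's working tree and so reports the nested .git as well. The function name demo_prune is hypothetical.

```shell
#!/bin/bash
# Hypothetical demo: -prune on .git stops find from entering .git itself,
# but not from walking the rest of each repository's working tree.
demo_prune() {
  local root
  root=$(mktemp -d)
  mkdir -p "$root/repo1/.git" "$root/repo1/src/vendor/lib/.git"
  # Prints ./repo1/.git and also ./repo1/src/vendor/lib/.git,
  # which shows that find descended into repo1's working tree.
  (cd "$root" && find . -name .git -type d -prune)
  rm -rf "$root"
}

demo_prune
```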

What would be cool is if there was some sort of find lookahead pruning mechanism, where if a folder has a subfolder called .git, then prune on that folder...

That said, I'm betting your bottleneck is in the network operation 'git pull', and not in the find command, as others have posted in the comments.

Answer 3

Answer by vaab

I've taken the time to copy-paste the script in your question and compare it to the script in your own answer. Here are some interesting results:

Please note that:

I've disabled the git pull by prefixing it with an echo.
I've also removed the color codes.
I've also removed the .ignore file test in the bash solution.
And removed the unnecessary > /dev/null here and there.
Removed the pwd calls in both.
Added -prune, which was obviously lacking in the find example.
Used "while" instead of "for", which was also counterproductive in the find example.
Considerably untangled the second example to get to the point.
Added a test in the bash solution to NOT follow symlinks, to avoid cycles and to behave like the find solution.
Added shopt to allow * to expand to dotted directory names too, to match the find solution's functionality.

Thus, we are comparing the find-based solution:

#!/bin/bash

find . -name .git -type d -prune | while IFS= read -r d; do
   cd "$d/.."
   echo "$PWD >" git pull
   cd "$OLDPWD"
done

with the bash shell builtin solution:

#!/bin/bash

shopt -s dotglob

update() {
    for d in "$@"; do
        test -d "$d" -a \! -L "$d" || continue
        cd "$d" || continue
        if [ -d ".git" ]; then
            echo "$PWD >" git pull
        else
            update *
        fi
        cd ..
    done
}

update *

Note: builtins (function and for) are immune to the OS ARG_MAX limit for launching processes. So the * won't break, even on very large directories.
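
To see why, here is a minimal sketch: the word list below is roughly 2 MB, which on a typical Linux system exceeds the kernel's argument-size limit (ARG_MAX) for execve(), yet passing it to a shell function works because no new process is started. The function name count_args is hypothetical, and the exact limit depends on the OS and stack size.

```shell
#!/bin/bash
# Hypothetical helper: report how many arguments it received.
count_args() { echo "$#"; }

# About 300000 words (~2 MB of text). The function runs inside the current
# shell without execve(), so ARG_MAX does not apply and this prints 300000.
count_args $(seq 1 300000)
```

By contrast, /bin/echo $(seq 1 300000) > /dev/null usually fails with "Argument list too long" on Linux, because that call has to go through execve().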

Technical differences between solutions:

The find-based solution uses compiled C code to crawl the repositories; it:

has to start a new process for the find command.
will avoid ".git" content but will crawl the workdir of git repositories, and lose some time in those (and possibly find more matching elements).
will have to chdir through several levels of subdirectories for each match and back again.
will have to chdir once in the find command and once in the bash part.

The bash-based solution uses builtins (a near-C implementation, but interpreted) to crawl the repositories; note that it:

will use only one process.
will avoid git workdir subdirectories.
will only chdir one level at a time.
will only chdir once, both to look and to run the command.

Actual speed results between solutions:

I have a working development collection of git repositories on which I ran the scripts:

find solution: ~0.080s (bash chdir takes ~0.010s)
bash solution: ~0.017s

I have to admit that I wasn't prepared to see such a win from bash builtins. It became more apparent and normal after analysing what's going on. To add insult to injury, if you change the shell from /bin/bash to /bin/sh (you must comment out the shopt line, and be prepared that it won't descend into dotted directories), you'll fall to ~0.008s. Beat that!
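
The shopt line matters because, by default, * does not match names that start with a dot. A small sketch (glob_names is a hypothetical helper) that prints what * expands to in a directory with dotglob turned off or on:

```shell
#!/bin/bash
# Hypothetical helper: print what * expands to inside directory $1.
# $2 is "-s" to enable dotglob or "-u" to disable it; the subshell keeps
# both the cd and the shopt change from leaking into the caller.
glob_names() {
  ( cd "$1" && shopt "$2" dotglob && printf '%s\n' * )
}
```

With a directory holding .config and src, glob_names DIR -u prints only src, while glob_names DIR -s prints both. In bash, even with dotglob set, * does not match . or .., so the recursive update function cannot loop back on itself that way.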

Note that you can be more clever with the find solution by using:

find . -type d \( -exec /usr/bin/test -d "{}/.git" -a "{}" != "." \; -print -prune \
       -o -name .git -prune \)

which effectively stops find from crawling into a git repository once it has been found, at the price of spawning a process for each directory crawled. The final find solution I came up with ran in ~0.030s, which is more than twice as fast as the previous find version, but still about twice as slow as the bash solution.

Note that using /usr/bin/test is important to avoid a search in $PATH, which costs time, and I needed -o -name .git -prune and -a "{}" != "." because my main repository was itself a git subrepository.

In conclusion, I won't be using the bash builtin solution because it has too many corner cases for me (and my first test hit one of its limitations). But it was important for me to explain why it can be (much) faster in some cases; the find solution, however, seems much more robust and consistent to me.

Answer 4

Answer by CharlieB

The answers above all rely on finding a ".git" directory. However, not all git repos have one (e.g. bare repos). The following command loops through all directories and asks git whether each one belongs to a git repository (either a work tree or a bare repository). If so, it prunes the subdirectories off the tree and continues.

find . -type d -exec sh -c 'cd "$1" && git rev-parse --git-dir > /dev/null 2>&1' sh {} \; -prune -print

It's a lot slower than other solutions because it's executing a command in each directory, but it doesn't rely on a particular repository structure. Could be useful for finding bare git repositories for example.
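
If spawning git for every directory is too slow, a cheaper approximation is to look at the on-disk layout instead: a bare repository (like a .git directory) contains a HEAD file plus objects and refs directories. The sketch below is a heuristic built on that assumption, not git's own detection logic, and find_repos is a hypothetical name. It prints both work-tree repositories and bare ones, pruning at each hit. Embedding {} inside an argument, as in '{}/.git', works with GNU and BSD find but is not guaranteed by POSIX.

```shell
#!/bin/bash
# Hypothetical helper: list work-tree and bare repositories below $1 (default .).
# Heuristic assumption: a bare repository has a HEAD file plus objects/ and refs/.
find_repos() {
  find "${1:-.}" -type d \( \
       -exec test -d '{}/.git' \; -print -prune \
    -o -exec test -f '{}/HEAD' -a -d '{}/objects' -a -d '{}/refs' \; -print -prune \
  \)
}
```

It still runs test once per directory, but test is far cheaper to start than sh plus git.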

Answer 5

Answer by user5696355

For windows, you can put the following into a batch file called gitlist.bat and put it on your PATH.

@echo off
if {%1}=={} goto :usage
for /r %1 /d %%I in (.) do echo %%I | find ".git\."
goto :eof
:usage
echo usage: gitlist ^<path^>

Answer 6

Answer by Clayton Stanley

Check out the answer using the locate command: Is there any way to list up git repositories in terminal?

The advantages of using locate instead of a custom script are:

The search is indexed, so it scales
It does not require the use (and maintenance) of a custom bash script

The disadvantages of using locate are:

The database that locate uses is only refreshed periodically (weekly by default on OS X), so freshly created git repositories won't show up until the next update

Going the locate route, here's how to list all git repositories under a directory, for OS X:

Enable locate indexing (will be different on Linux):

sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.locate.plist

Run this command after indexing completes (might need some tweaking for Linux):

repoBasePath=$HOME
locate '/.git' | grep -E '/\.git$' | grep "^$repoBasePath/" | xargs -I {} dirname "{}"
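
The filtering half of that pipeline can be sketched as a function that reads paths on standard input, which also makes it testable without a locate database. This version requires each path to end in /.git exactly and does the prefix check with a quoted shell pattern instead of a regular expression, so a base path containing regex metacharacters is matched literally. repos_from_paths is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical filter: read candidate paths on stdin, keep those that end in
# "/.git" and lie under the base directory $1, and print each repository
# directory (the parent of its .git).
repos_from_paths() {
  local base=${1%/} p
  grep -E '/\.git$' | while IFS= read -r p; do
    case $p in
      "$base"/*) dirname -- "$p" ;;   # quoted, so $base is matched literally
    esac
  done
}
```

Typical use, assuming the locate database is populated: locate /.git | repos_from_paths "$HOME"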

Answer 7

Answer by Greg Barrett

I list all git repositories anywhere in the current directory using:

find . -type d -execdir test -d {}/.git \; -prune -print

This is fast since it stops recursing once it finds a git repository. (Although it does not handle bare repositories.) Of course, you can change the . to whatever directory you want. If you need to, you can change -print to -print0 for null-separated output.

To also ignore directories containing a .ignore file:

find . -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print \)
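
Wrapped as functions, the two commands above can be tried on a scratch tree. With -execdir, find runs test from the parent directory and substitutes ./NAME for {}, so test -d {}/.git checks whether the directory being visited contains a .git directory before find descends into it; -prune then stops the walk at that repository. The function names are hypothetical. Note that GNU find refuses to run -execdir while PATH contains a relative entry such as . or an empty entry.

```shell
#!/bin/bash
# Hypothetical wrappers around the two find commands above.
# List repositories below $1 (default .), stopping at the first .git found.
list_git_repos() {
  find "${1:-.}" -type d -execdir test -d {}/.git \; -prune -print
}

# Same, but also skip (and prune) any directory that contains a .ignore file.
list_git_repos_ignoring() {
  find "${1:-.}" -type d \( -execdir test -e {}/.ignore \; -prune \) \
    -o \( -execdir test -d {}/.git \; -prune -print \)
}
```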

I've added this alias to my ~/.gitconfig file:

[alias]
  repos = !"find . -type d -execdir test -d {}/.git \\; -prune -print"

Then I just need to execute:

git repos

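To get a complete listing of all the git repositories anywhere in my current directory. (When invoked inside a repository, git runs shell aliases from the repository's top-level directory, so the listing then starts there.)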

Answer 8

Answer by Mike Slinn

This answer combines the partial answer provided by @Greg Barrett with my optimized answer above.

#!/bin/bash

# Update all git directories below current directory or specified directory
# Skips directories that contain a file called .ignore

HIGHLIGHT="\e[01;34m"
NORMAL='\e[00m'

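# GNU find refuses -execdir while PATH contains a relative directory,
# so drop "./" and "./bin" entries sitting between two other entries
export PATH=${PATH/':./:'/:}
export PATH=${PATH/':./bin:'/:}
#echo "$PATH"

echo -e "${HIGHLIGHT}Scanning ${PWD}${NORMAL}"
# NUL-separated paths on file descriptor 3: names with spaces survive,
# and git pull keeps the terminal on its standard input
while IFS= read -r -d '' d <&3; do
  cd "$d" > /dev/null || continue
  echo -e "\n${HIGHLIGHT}Updating $(pwd)$NORMAL"
  git pull
  cd - > /dev/null
done 3< <(find "${@:-.}" -type d \( -execdir test -e {}/.ignore \; -prune \) -o \( -execdir test -d {}/.git \; -prune -print0 \))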
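
The two PATH substitutions in this script exist because GNU find refuses -execdir while PATH contains a relative directory, but they only catch entries spelled ./ or ./bin that sit between two other entries. A more general sketch, assuming any entry that does not start with / should be dropped (including an empty entry, which also means the current directory); absolute_path_only is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: print a PATH value with every relative entry removed.
# Empty entries ("::", or a leading/trailing ":") also mean the current
# directory, so they are dropped as well.
absolute_path_only() {
  local IFS=: entry out=
  local -a parts
  read -r -a parts <<< "$1"     # split on ":" without pathname expansion
  for entry in "${parts[@]}"; do
    case $entry in
      /*) out=${out:+$out:}$entry ;;   # keep absolute directories only
    esac
  done
  printf '%s\n' "$out"
}
```

Usage: PATH=$(absolute_path_only "$PATH") before running find with -execdir.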
