Original: http://stackoverflow.com/questions/33682123/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Dockerfile strategies for Git
Question by Hemerson Varela
What is the best strategy to clone a private Git repository into a Docker container using a Dockerfile? Pros/Cons?
I know that I can add commands to a Dockerfile to clone my private repository into a Docker container, but I would like to know which different approaches people have used in this case.
It's not covered in the Dockerfile best practices guide.
Answer by Hemerson Varela
From Ryan Baumann's blog post “Git strategies for Docker”
There are different strategies for getting your Git source code into a Docker build. Many of them interact differently with Docker's caching mechanisms, and some may be better suited than others to your project and how you intend to use Docker.
RUN git clone
If you're like me, this is the approach that first springs to mind when you see the commands available to you in a Dockerfile. The trouble with this is that it can interact in several unintuitive ways with Docker's build caching mechanisms. For example, if you make an update to your git repository, and then re-run the docker build which has a RUN git clone command, you may or may not get the new commit(s) depending on if the preceding Dockerfile commands have invalidated the cache.
One way to get around this is to use docker build --no-cache, but then any time-intensive commands preceding the clone will have to run again too. Another issue is that you (or someone you've distributed your Dockerfile to) may unexpectedly come back to a broken build later on when the upstream git repository updates.
A two-birds-one-stone approach to this while still using RUN git clone is to put it on one line with a specific revision checkout, e.g.:
RUN git clone https://github.com/example/example.git && cd example && git checkout 0123abcdef
Then updating the revision to check out in the Dockerfile will invalidate the cache at that line and cause the clone/checkout to run.
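As a minimal sketch of the same idea, the revision can also be passed in as a build argument. GIT_REF is a hypothetical name, the URL and default commit are the placeholders from the example above, and base-image stands for any image that has git installed. When the value passed with --build-arg differs from the previous build, Docker gets a cache miss at the first RUN after the ARG, so only the clone and later steps are rebuilt:

```dockerfile
# base-image is a placeholder for any image with git installed
FROM base-image
# Hypothetical build argument naming the tag or commit to build
ARG GIT_REF=0123abcdef
# Clone and check out in a single RUN so the cached layer corresponds to one revision
RUN git clone https://github.com/example/example.git /src/example \
 && git -C /src/example checkout "$GIT_REF"
```

A different revision can then be built with docker build --build-arg GIT_REF=<tag-or-commit> . without editing the Dockerfile; the trade-off is that the revision actually built is no longer visible in the Dockerfile itself unless you keep the default up to date.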
One possible drawback to this approach in general is that you have to have git installed in your container.
RUN curl or ADD a tag/commit tarball URL
This avoids having to have git installed in your container environment, and can benefit from being explicit about when the cache will break (i.e. if the tag/revision is part of the URL, that URL change will bust the cache). Note that if you use the Dockerfile ADD command to copy from a remote URL, the file will be downloaded every time you run the build, and the HTTP Last-Modified header will also be used to invalidate the cache.
You can see this approach used in the golang Dockerfile.
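A sketch of the tarball variant, assuming the repository is hosted on GitHub, which serves source archives at https://github.com/OWNER/REPO/archive/REF.tar.gz (example/example and the commit are placeholders, and base-image must provide curl and tar). Because the commit is part of the URL, changing it changes the RUN line and therefore breaks the cache exactly there:

```dockerfile
# base-image is a placeholder; it needs curl and tar but not git
FROM base-image
# Download the archive for one commit and unpack it, dropping the archive's top-level directory
RUN mkdir -p /src/example \
 && curl -fsSL https://github.com/example/example/archive/0123abcdef.tar.gz \
  | tar -xz -C /src/example --strip-components=1
```

If you fetch the same URL with ADD instead, keep in mind that archives downloaded from a remote URL are not extracted automatically, so a separate tar step is still needed. For a private repository the download also needs credentials, which brings back the secret-handling concerns discussed in the other answers.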
Git submodules inside Dockerfile repository
If you keep your Dockerfile and Docker build in a separate repository from your source code, or your Docker build requires multiple source repositories, using git submodules (or git subtrees) in this repository may be a valid way to get your source repos into your build context. This avoids some concerns with Docker caching and upstream updating, as you lock the upstream revision in your submodule/subtree specification. Updating them will break your Docker cache as it changes the build context.
Note that this only gets the files into your Docker build context, you still need to use ADD commands in your Dockerfile to copy those paths to where you expect them in the container.
You can see this approach used here.
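On the Dockerfile side, a minimal sketch might look like the following, assuming the source repository is registered as a submodule at vendor/example (a hypothetical path) and that git submodule update --init has been run before docker build, so the files are actually present in the build context. COPY is used here; for a local path it behaves like the ADD mentioned above:

```dockerfile
# base-image is a placeholder
FROM base-image
# vendor/example is a hypothetical submodule path in the build context;
# its revision is whatever commit the parent repository has pinned
COPY vendor/example /src/example
WORKDIR /src/example
```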
Dockerfile inside git repository
Here, you just have your Dockerfile in the same git repository alongside the code you want to build/test/deploy, so it automatically gets sent as part of the build context, so you can e.g. ADD . /project to copy the context into the container. The advantage to this is that you can test changes without having to potentially commit/push them to get them into a test docker build; the disadvantage is that every time you modify any files in your working directory it will invalidate the cache at the ADD command. Sending the build context for a large source/data directory can also be time-consuming. So if you use this approach, you may also want to make judicious use of the .dockerignore file, including doing things like ignoring everything in your .gitignore and possibly the .git directory itself.
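A minimal sketch of this layout, assuming the Dockerfile sits at the repository root and that .git and any build output are listed in .dockerignore (base-image is a placeholder):

```dockerfile
# base-image is a placeholder
FROM base-image
WORKDIR /project
# Copies the whole build context except paths matched by .dockerignore;
# editing any copied file invalidates the cache from this instruction onward
ADD . /project
```

Instructions that do not depend on the sources are best placed before the ADD line so that they keep hitting the cache.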
Volume mapping
If you're using Docker to set up a dev/test environment that you want to share among a wide variety of source repos on your host machine, mounting a host directory as a data volume may be a viable strategy. This lets you specify which directories to include at docker run time and avoids concerns about docker build caching, but none of this will be shared with other users of your Dockerfile or container image.
Answer by VonC
You generally have two approaches:
- referencing a vault from which you get the secret data needed to access what you want to put in your image (here, the SSH keys for your private repo)
Update 2018: see "How to keep your container secrets secure", which includes:
- Use volume mounts to pass secrets to a container at runtime
- Have a plan for rotating secrets
- Make sure your secrets are encrypted
- or a squashing technique (not recommended, see comment)
For the second approach, see "Pulling Git into a Docker image without leaving SSH keys behind"
- Add the private key to the Dockerfile
- Add it to the ssh-agent
- Run the commands that require SSH authentication
- Remove the private key
Dockerfile:
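# ADD can only read files inside the build context, so the key (mykey) must be copied there first
ADD mykey /tmp/mykey
# Start the agent, load the key, run the SSH-authenticated command (bundle install or similar) and
# delete the key in ONE RUN: each RUN gets a fresh shell, so an agent from an earlier RUN is gone
RUN chmod 600 /tmp/mykey && eval "$(ssh-agent -s)" && ssh-add /tmp/mykey && \
    bundle install && \
    rm /tmp/mykey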
Let's build the image now:
$ docker build -t original .
Squash the layers:
docker save original | sudo docker-squash -t squashed | docker load
Answer by BMitch
There are several strategies I can think of:
Option A: Single stage inside the Dockerfile:
ADD ssh-private-key /root/.ssh/id_rsa
RUN git clone git@host:repo/path.git
This has several significant downsides:
- Your private key is inside the docker image.
- On later builds, the step will be reused from the cache of a previous build even when your repo changes, unless you break the cache on an earlier step. That's because the RUN line itself is unchanged.
Option B: Multi-stage inside the Dockerfile:
FROM base-image as clone
ADD ssh-private-key /root/.ssh/id_rsa
RUN git clone git@host:repo/path.git
RUN rm -rf /path/.git
FROM base-image as build
COPY --from=clone /path /path
...
By using a multi-stage build, your ssh credentials are now only on the build host, as long as you never push your "clone" stage layers anywhere. This is slightly better, but it still has caching issues (see the tip at the end). Because of the rm step, the later COPY --from will no longer copy the .git files. Since the build image (or a later stage) should be all you ship, being inefficient on the layers in the clone stage is less of a concern.
Option C: From your CI server:
Typically, the Dockerfile is in the code repo, and people tend to clone this first, before running the build (though it is possible to skip this by using a git repo as a build context). Therefore you'll often see CI servers perform the clone and update rather than the Dockerfile itself. The resulting Dockerfile is then just:
COPY path /path
This has several advantages:
- The credentials never get added to the docker image layers.
- Updating the repo doesn't rerun the clone from scratch: the previous clone is already there, and you can run a git pull instead, which is much faster.
- When copying files into the image, you can list .git in .dockerignore to exclude all of the git internals. Therefore you only add the final state of the repo to your docker image, resulting in a much smaller image.
Admittedly, this option is saying "don't do that" to your question, but it's also the most popular option I've seen from people facing this challenge, for good reason.
Option D: With BuildKit:
BuildKit has several experimental features that may be useful. These require newer versions of Docker that may not be on every build host, and the syntax to inject the options is not backwards compatible. The two main options are secret or ssh credential injection, and cache directories. Both of these can inject a file or directory into a build step without saving it into the resulting image layers. Here's what that could look like (this is untested):
# syntax=docker/dockerfile:experimental
FROM base-image
ARG CACHE_BUST
# /path must exist before tar can extract into it, hence the mkdir -p
RUN --mount=type=cache,target=/git-cache,id=git-cache,sharing=locked \
    --mount=type=secret,id=ssh,target=/root/.ssh/id_rsa \
    if [ ! -d /git-cache/path/.git ]; then \
      git clone git@host:repo/path.git /git-cache/path; \
    else \
      (cd /git-cache/path && git pull --force); \
    fi; \
    mkdir -p /path; \
    tar -cC /git-cache/path --exclude .git . | tar -xC /path
And then the build would look like:
DOCKER_BUILDKIT=1 docker build \
--secret id=ssh,src=$HOME/.ssh/id_rsa \
--build-arg "CACHE_BUST=$(date +%s)" \
-t img:tag \
.
This is fairly convoluted, but has a few advantages:
- The cache directory keeps the git repo from the last build, so each build only pulls the changes instead of making a full clone.
- The tar command is basically a copy that excludes the .git directory from the final image, making your image smaller. This copy is needed because the cache directory is not saved into the resulting image layers.
- The ssh credentials are injected as a secret that behaves like a single-file, read-only volume mount for that specific RUN step, and the contents of that secret are not saved to the resulting image layer.
To read more about BuildKit's experimental features, see: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Tip: Cache busting a specific line:
To bust the docker build cache on a specific line, you can inject a build arg that changes on every build right before the RUN line that you want to rerun. In the BuildKit example, there was:
要在特定行上破坏 docker 构建缓存,您可以在要重新运行的 RUN 行之前注入一个构建参数,该参数在每个构建中都会发生变化。在 BuildKit 示例中,有:
ARG CACHE_BUST
before the RUN line that I did not want to cache, and the build included:
--build-arg "CACHE_BUST=$(date +%s)"
to inject a unique value for each build. This ensures the build always runs that step, even though the command is otherwise unchanged. The build arg is injected as an environment variable into the RUN step, so Docker sees that the command has changed and cannot reuse it from the cache.
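To make the placement concrete, here is a sketch (base-image is a placeholder, assumed to be Debian-based so that apt-get is available, and the clone URL is the placeholder from the first answer). The package installation comes before the ARG and stays cached; a new CACHE_BUST value causes a cache miss at the first RUN after the ARG, so only the clone reruns:

```dockerfile
# base-image is a placeholder, assumed Debian-based
FROM base-image
# Slow but stable step: it comes before CACHE_BUST, so it stays cached
RUN apt-get update && apt-get install -y --no-install-recommends git ca-certificates
# Changing this value (e.g. with --build-arg "CACHE_BUST=$(date +%s)") busts the cache from here on
ARG CACHE_BUST
RUN git clone https://github.com/example/example.git /src/example
```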
Ideally, you would clone a specific tag or commit id, which allows you to cache builds that use that same git clone from previous builds. However, if you are cloning master, this cache busting technique will be needed.