Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/49037742/

Date: 2020-09-08 15:49:35, source: igfitidea

Why does it take ages to install Pandas on Alpine Linux

Tags: pandas, numpy, docker, alpine

Question by moku

I've noticed that installing Pandas and Numpy (its dependency) in a Docker container using Alpine as the base OS takes much longer than with CentOS or Debian. I created a little test below to demonstrate the time difference. Aside from the few seconds Alpine takes to update and download the build dependencies needed to install Pandas and Numpy, why does setup.py take around 70x more time than on the Debian install?

Is there any way to speed up the install using Alpine as the base image or is there another base image of comparable size to Alpine that is better to use for packages like Pandas and Numpy?

Dockerfile.debian

FROM python:3.6.4-slim-jessie

RUN pip install pandas

Build Debian image with Pandas & Numpy:

[PandasDockerTest] time docker build -t debian-pandas -f Dockerfile.debian . --no-cache
    Sending build context to Docker daemon  3.072kB
    Step 1/2 : FROM python:3.6.4-slim-jessie
     ---> 43431c5410f3
    Step 2/2 : RUN pip install pandas
     ---> Running in 2e4c030f8051
    Collecting pandas
      Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
    Collecting numpy>=1.9.0 (from pandas)
      Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
    Collecting pytz>=2011k (from pandas)
      Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
    Collecting python-dateutil>=2 (from pandas)
      Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
    Collecting six>=1.5 (from python-dateutil>=2->pandas)
      Downloading six-1.11.0-py2.py3-none-any.whl
    Installing collected packages: numpy, pytz, six, python-dateutil, pandas
    Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
    Removing intermediate container 2e4c030f8051
     ---> a71e1c314897
    Successfully built a71e1c314897
    Successfully tagged debian-pandas:latest
    docker build -t debian-pandas -f Dockerfile.debian . --no-cache  0.07s user 0.06s system 0% cpu 13.605 total

Dockerfile.alpine

FROM python:3.6.4-alpine3.7

RUN apk --update add --no-cache g++

RUN pip install pandas

Build Alpine image with Pandas & Numpy:

[PandasDockerTest] time docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache
Sending build context to Docker daemon   16.9kB
Step 1/3 : FROM python:3.6.4-alpine3.7
 ---> 4b00a94b6f26
Step 2/3 : RUN apk --update add --no-cache g++
 ---> Running in 4b0c32551e3f
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/17) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)
(2/17) Installing libgcc (6.4.0-r5)
(3/17) Installing libstdc++ (6.4.0-r5)
(4/17) Installing binutils-libs (2.28-r3)
(5/17) Installing binutils (2.28-r3)
(6/17) Installing gmp (6.1.2-r1)
(7/17) Installing isl (0.18-r0)
(8/17) Installing libgomp (6.4.0-r5)
(9/17) Installing libatomic (6.4.0-r5)
(10/17) Installing pkgconf (1.3.10-r0)
(11/17) Installing mpfr3 (3.1.5-r1)
(12/17) Installing mpc1 (1.0.3-r1)
(13/17) Installing gcc (6.4.0-r5)
(14/17) Installing musl-dev (1.1.18-r3)
(15/17) Installing libc-dev (0.7.1-r0)
(16/17) Installing g++ (6.4.0-r5)
(17/17) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)
Executing busybox-1.27.2-r7.trigger
OK: 184 MiB in 50 packages
Removing intermediate container 4b0c32551e3f
 ---> be26c3bf4e42
Step 3/3 : RUN pip install pandas
 ---> Running in 36f6024e5e2d
Collecting pandas
  Downloading pandas-0.22.0.tar.gz (11.3MB)
Collecting python-dateutil>=2 (from pandas)
  Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pytz>=2011k (from pandas)
  Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting numpy>=1.9.0 (from pandas)
  Downloading numpy-1.14.1.zip (4.9MB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
  Downloading six-1.11.0-py2.py3-none-any.whl
Building wheels for collected packages: pandas, numpy
  Running setup.py bdist_wheel for pandas: started
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: still running...
  Running setup.py bdist_wheel for pandas: finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/e8/ed/46/0596b51014f3cc49259e52dff9824e1c6fe352048a2656fc92
  Running setup.py bdist_wheel for numpy: started
  Running setup.py bdist_wheel for numpy: still running...
  Running setup.py bdist_wheel for numpy: still running...
  Running setup.py bdist_wheel for numpy: still running...
  Running setup.py bdist_wheel for numpy: finished with status 'done'
  Stored in directory: /root/.cache/pip/wheels/9d/cd/e1/4d418b16ea662e512349ef193ed9d9ff473af715110798c984
Successfully built pandas numpy
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 36f6024e5e2d
 ---> a93c59e6a106
Successfully built a93c59e6a106
Successfully tagged alpine-pandas:latest
docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache  0.54s user 0.33s system 0% cpu 16:08.47 total

Accepted answer by nickgryg

On Debian-based images, python pip installs the packages from prebuilt .whl (wheel) files:

  Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
  Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)

WHL format was developed as a quicker and more reliable method of installing Python software than re-building from source code every time. WHL files only have to be moved to the correct location on the target system to be installed, whereas a source distribution requires a build step before installation.
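
To make the difference concrete, here is a minimal Python sketch (not pip's actual code) that takes the file names from the logs above and tells a wheel apart from a source distribution. It assumes the PEP 427 wheel naming convention {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl and treats .tar.gz/.zip as source distributions; describe_distribution is a hypothetical helper name.

```python
import re

# Wheel file names follow the PEP 427 convention:
#   {name}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl
WHEEL_RE = re.compile(
    r"^(?P<name>[^-]+)-(?P<version>[^-]+)(?:-(?P<build>\d[^-]*))?"
    r"-(?P<python>[^-]+)-(?P<abi>[^-]+)-(?P<platform>[^-]+)\.whl$"
)

# Source-distribution extensions seen in the Alpine log above (assumption:
# other archive formats are ignored in this sketch).
SDIST_SUFFIXES = (".tar.gz", ".zip")


def describe_distribution(filename):
    """Hypothetical helper: classify a downloaded file as wheel or sdist."""
    match = WHEEL_RE.match(filename)
    if match:
        # A wheel is already built: installing it means unpacking files.
        return {"kind": "wheel", "needs_build": False, **match.groupdict()}
    if filename.endswith(SDIST_SUFFIXES):
        # A source distribution must run setup.py (compiling C code) first.
        return {"kind": "sdist", "needs_build": True}
    raise ValueError(f"unrecognised distribution file: {filename}")


if __name__ == "__main__":
    for name in (
        "pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl",  # Debian log
        "pandas-0.22.0.tar.gz",                             # Alpine log
        "numpy-1.14.1.zip",                                 # Alpine log
    ):
        print(name, describe_distribution(name))
```

The platform tag parsed from the Debian wheel, manylinux1_x86_64, is what ties that file to glibc-based Linux systems.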

The pandas and numpy wheel packages are not supported on images based on the Alpine platform (the manylinux1 wheels on PyPI target glibc, which Alpine does not use). That's why, when we install them with python pip during the build, they are always compiled from source on Alpine:

  Downloading pandas-0.22.0.tar.gz (11.3MB)
  Downloading numpy-1.14.1.zip (4.9MB)

and we can see the following inside the container while the image is building:

/ # ps aux
PID   USER     TIME   COMMAND
    1 root       0:00 /bin/sh -c pip install pandas
    7 root       0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
   21 root       0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
  496 root       0:00 sh
  660 root       0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
  661 root       0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
  662 root       0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
  663 root       0:00 ps aux

If we modify the Dockerfile a little:

FROM python:3.6.4-alpine3.7
RUN apk add --no-cache g++ wget
RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl

we get the following error:

Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
 ---> Running in 0faea63e2bda
pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
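
A simplified sketch of why pip refuses this file: pip only installs a wheel whose tags are among those the running interpreter supports. Before the musllinux tags of PEP 656 existed, pip on Alpine (musl) offered generic tags such as linux_x86_64 and any, but never manylinux1_x86_64, which promises glibc. The two tag sets below are illustrative assumptions rather than pip's real lists, is_supported_wheel is a hypothetical helper, and this sketch checks only the platform tag (real pip also checks the Python and ABI tags).

```python
# Illustrative (not exhaustive) platform tags each image's pip would accept.
DEBIAN_PLATFORMS = {"manylinux1_x86_64", "linux_x86_64", "any"}
ALPINE_PLATFORMS = {"linux_x86_64", "any"}  # musl: no manylinux tags


def is_supported_wheel(filename, supported_platforms):
    """Hypothetical check: is the wheel's platform tag acceptable here?"""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: " + filename)
    # The platform tag is the last dash-separated field before ".whl".
    platform_tag = filename[: -len(".whl")].rsplit("-", 1)[-1]
    # Compressed tag sets such as "manylinux1_x86_64.manylinux2010_x86_64"
    # are dot-separated; one matching tag is enough.
    return any(tag in supported_platforms for tag in platform_tag.split("."))


wheel = "pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl"
print(is_supported_wheel(wheel, DEBIAN_PLATFORMS))  # True
print(is_supported_wheel(wheel, ALPINE_PLATFORMS))  # False
```

The False result corresponds to the "is not a supported wheel on this platform" error in the log; a pure-Python wheel tagged any would still be accepted on Alpine.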

Unfortunately, the only way to install pandas on an Alpine image is to wait until the build finishes.

Of course, if you want to use the Alpine image with pandas in CI, for example, the best way to do so is to compile it once, push the resulting image to a registry and use it as the base image for your needs.

EDIT: If you want to use the Alpine image with pandas, you can pull my nickgryg/alpine-pandas Docker image. It is a Python image with pandas precompiled for the Alpine platform. It should save you time.

Answer by jtlz2

ANSWER: AS OF 25/10/2019, FOR PYTHON 3, IT STILL DOESN'T!

Here is a complete working Dockerfile:

FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing

The build is very sensitive to the exact Python and Alpine version numbers. Getting these wrong seems to provoke Max Levy's error so:libpython3.7m.so.1.0 (missing), but the above does now work for me.
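
To see which shared libpython the running interpreter corresponds to, and compare it with the so:libpython... dependency that apk reports, a small diagnostic like the one below may help. It is a sketch under assumptions: the soname pattern is inferred from the error message above (Python 3.7 with the "m" ABI flag), expected_libpython_soname is a hypothetical helper, and sysconfig's INSTSONAME may be None or a static-library name on builds without a shared libpython.

```python
import sys
import sysconfig


def expected_libpython_soname(version_info, abiflags):
    """Hypothetical helper: rebuild a soname like 'libpython3.7m.so.1.0'.

    Assumes the naming scheme visible in the apk error message above.
    """
    major, minor = version_info[0], version_info[1]
    return f"libpython{major}.{minor}{abiflags}.so.1.0"


if __name__ == "__main__":
    # sys.abiflags exists on POSIX builds ('m' on 3.7, '' from 3.8 onwards).
    flags = getattr(sys, "abiflags", "")
    print("expected:", expected_libpython_soname(sys.version_info, flags))
    # INSTSONAME is the library name recorded when Python was built.
    print("sysconfig INSTSONAME:", sysconfig.get_config_var("INSTSONAME"))
```

If the soname the apk packages depend on differs from the one your interpreter's version implies (for example, a repository built against 3.8 while the image runs 3.7), that mismatch is a plausible source of the "(missing)" error.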

My updated Dockerfile is available at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b


[Earlier Update:]

ANSWER: IT DOESN'T!

In any Alpine Dockerfile you can simply do*

RUN apk add py2-numpy@community py2-scipy@community py-pandas@edge

This is because numpy, scipy and now pandas are all available prebuilt on Alpine:

https://pkgs.alpinelinux.org/packages?name=*numpy

https://pkgs.alpinelinux.org/packages?name=*scipy&branch=edge

https://pkgs.alpinelinux.org/packages?name=*pandas&branch=edge

One way to avoid rebuilding every time, or using a Docker layer, is to use a prebuilt, native Alpine Linux .apk package, e.g.

https://github.com/sgerrand/alpine-pkg-py-pandas

https://github.com/nbgallery/apks

You can build these .apks once and use them wherever in your Dockerfile you like :)

This also saves you from having to bake everything else into the Docker image ahead of time, i.e. it gives you the flexibility to pre-build any Docker image you like.

PS: I have put a Dockerfile stub at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b that shows roughly how to build the image. It includes the important steps (*):

RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk update
RUN apk add --update --no-cache libgfortran

Answer by stefanitsky

ATTENTION
See @jtlz2's answer for the latest update.

OUTDATED

The py3-pandas and py3-numpy packages have moved to the Alpine testing repository, so you can install them by adding these lines to your Dockerfile:

RUN echo "http://dl-8.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
  && apk update \
  && apk add py3-numpy py3-pandas

Hope it helps someone!

Alpine package links:
- py3-pandas
- py3-numpy

Alpine repositories documentation.

Answer by ThisGuyCantEven

Just going to bring some of these answers together in one answer and add a detail I think was missed. The reason certain Python libraries, particularly optimized math and data libraries, take so long to build on Alpine is that the pip wheels for these libraries include binaries precompiled from C/C++ and linked against glibc, a common implementation of the C standard library. Debian, Fedora and CentOS all (typically) use glibc, but Alpine, in order to stay lightweight, uses musl-libc instead. C/C++ binaries built on a glibc system will not work on a system without glibc, and the same goes for musl.
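
A minimal sketch of the glibc/musl distinction from Python's side: platform.libc_ver() reports ('glibc', version) on glibc systems, and the manylinux1 specification (PEP 513) assumes glibc 2.5 or newer. What libc_ver() reports on musl varies by Python version, so this sketch simply treats anything other than "glibc" as incompatible; glibc_version and can_use_manylinux1 are hypothetical helper names.

```python
import platform

# PEP 513: manylinux1 wheels assume glibc 2.5 or newer is available.
MANYLINUX1_MIN_GLIBC = (2, 5)


def glibc_version(libc_ver):
    """Return glibc (major, minor) from a platform.libc_ver() tuple, else None."""
    name, version = libc_ver
    parts = version.split(".")
    if name != "glibc" or len(parts) < 2:
        return None  # not glibc (e.g. musl on Alpine) or no usable version
    try:
        return int(parts[0]), int(parts[1])
    except ValueError:
        return None


def can_use_manylinux1(libc_ver):
    """True if a manylinux1 wheel's glibc assumption holds for libc_ver."""
    version = glibc_version(libc_ver)
    return version is not None and version >= MANYLINUX1_MIN_GLIBC


if __name__ == "__main__":
    current = platform.libc_ver()  # ('glibc', '<version>') on glibc systems
    print(current, "manylinux1 compatible:", can_use_manylinux1(current))
```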

pip first looks for a wheel with compatible binaries. If it can't find one, it tries to compile the binaries from the C/C++ source and link them against musl. In many cases, this won't even work unless you have the Python headers from python3-dev and build tools such as a compiler (gcc/g++) and make.
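
Before letting pip fall back to a source build on Alpine, a quick pre-flight check along these lines can tell you whether the prerequisites mentioned above are present. It is a sketch: the tool names checked (gcc, g++, make) are assumptions based on this thread, and missing_build_prerequisites is a hypothetical helper that only looks for executables on PATH and for Python.h in the interpreter's include directory.

```python
import os
import shutil
import sysconfig


def missing_build_prerequisites(tools=("gcc", "g++", "make")):
    """Hypothetical pre-flight check for compiling C extensions from source.

    Returns a list of human-readable problems; an empty list means the
    basics for a source build appear to be present.
    """
    problems = []
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append(f"missing build tool: {tool}")
    # Python.h lives in the interpreter's "include" directory; distro
    # Pythons usually ship it in a separate package (e.g. python3-dev).
    header = os.path.join(sysconfig.get_paths()["include"], "Python.h")
    if not os.path.exists(header):
        problems.append(f"missing Python headers: {header}")
    return problems


if __name__ == "__main__":
    issues = missing_build_prerequisites()
    print("\n".join(issues) if issues else "source builds should be possible")
```

Even with everything present, the check says nothing about build time: it only tells you whether pip's source fallback can start at all.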

The silver lining, as others have mentioned, is that the community provides apk packages with the proper binaries; using these will save you the (sometimes lengthy) process of building the binaries.

Answer by Flávio Henrique

Real, honest advice: switch to a Debian-based image and all your problems will be gone.

Alpine doesn't work well for Python applications.

Here is an example of my Dockerfile:

FROM python:3.7.6-buster

RUN pip install pandas==1.0.0
# scikit-learn is the real PyPI name; the "sklearn" alias package is deprecated
RUN pip install scikit-learn
RUN pip install Django==3.0.2
RUN pip install cx_Oracle==7.3.0
RUN pip install excel
RUN pip install djangorestframework==3.11.0

python:3.7.6-buster is more appropriate in this case; in addition, you don't need any extra dependencies in the OS.

A useful and recent article, https://pythonspeed.com/articles/alpine-docker-python/, says:

Don't use Alpine Linux for Python images

Unless you want massively slower build times, larger images, more work, and the potential for obscure bugs, you'll want to avoid Alpine Linux as a base image. For some recommendations on what you should use, see my article on choosing a good base image.
