Note: this page reproduces a popular Stack Overflow question and its answers. The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors on Stack Overflow.
Original question: http://stackoverflow.com/questions/49037742/
Why does it take ages to install Pandas on Alpine Linux
Asked by moku
I've noticed that installing Pandas and Numpy (its dependency) in a Docker container takes much longer with Alpine as the base OS than with CentOS or Debian. I created a little test below to demonstrate the time difference. Aside from the few seconds Alpine takes to update and download the build dependencies needed to install Pandas and Numpy, why does the setup.py step take around 70x more time than the Debian install?
Is there any way to speed up the install using Alpine as the base image or is there another base image of comparable size to Alpine that is better to use for packages like Pandas and Numpy?
Dockerfile.debian
FROM python:3.6.4-slim-jessie
RUN pip install pandas
Build Debian image with Pandas & Numpy:
[PandasDockerTest] time docker build -t debian-pandas -f Dockerfile.debian . --no-cache
Sending build context to Docker daemon 3.072kB
Step 1/2 : FROM python:3.6.4-slim-jessie
---> 43431c5410f3
Step 2/2 : RUN pip install pandas
---> Running in 2e4c030f8051
Collecting pandas
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Installing collected packages: numpy, pytz, six, python-dateutil, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 2e4c030f8051
---> a71e1c314897
Successfully built a71e1c314897
Successfully tagged debian-pandas:latest
docker build -t debian-pandas -f Dockerfile.debian . --no-cache 0.07s user 0.06s system 0% cpu 13.605 total
Dockerfile.alpine
FROM python:3.6.4-alpine3.7
RUN apk --update add --no-cache g++
RUN pip install pandas
Build Alpine image with Pandas & Numpy:
[PandasDockerTest] time docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache
Sending build context to Docker daemon 16.9kB
Step 1/3 : FROM python:3.6.4-alpine3.7
---> 4b00a94b6f26
Step 2/3 : RUN apk --update add --no-cache g++
---> Running in 4b0c32551e3f
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.7/community/x86_64/APKINDEX.tar.gz
(1/17) Upgrading musl (1.1.18-r2 -> 1.1.18-r3)
(2/17) Installing libgcc (6.4.0-r5)
(3/17) Installing libstdc++ (6.4.0-r5)
(4/17) Installing binutils-libs (2.28-r3)
(5/17) Installing binutils (2.28-r3)
(6/17) Installing gmp (6.1.2-r1)
(7/17) Installing isl (0.18-r0)
(8/17) Installing libgomp (6.4.0-r5)
(9/17) Installing libatomic (6.4.0-r5)
(10/17) Installing pkgconf (1.3.10-r0)
(11/17) Installing mpfr3 (3.1.5-r1)
(12/17) Installing mpc1 (1.0.3-r1)
(13/17) Installing gcc (6.4.0-r5)
(14/17) Installing musl-dev (1.1.18-r3)
(15/17) Installing libc-dev (0.7.1-r0)
(16/17) Installing g++ (6.4.0-r5)
(17/17) Upgrading musl-utils (1.1.18-r2 -> 1.1.18-r3)
Executing busybox-1.27.2-r7.trigger
OK: 184 MiB in 50 packages
Removing intermediate container 4b0c32551e3f
---> be26c3bf4e42
Step 3/3 : RUN pip install pandas
---> Running in 36f6024e5e2d
Collecting pandas
Downloading pandas-0.22.0.tar.gz (11.3MB)
Collecting python-dateutil>=2 (from pandas)
Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pytz>=2011k (from pandas)
Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting numpy>=1.9.0 (from pandas)
Downloading numpy-1.14.1.zip (4.9MB)
Collecting six>=1.5 (from python-dateutil>=2->pandas)
Downloading six-1.11.0-py2.py3-none-any.whl
Building wheels for collected packages: pandas, numpy
Running setup.py bdist_wheel for pandas: started
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: still running...
Running setup.py bdist_wheel for pandas: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/e8/ed/46/0596b51014f3cc49259e52dff9824e1c6fe352048a2656fc92
Running setup.py bdist_wheel for numpy: started
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: still running...
Running setup.py bdist_wheel for numpy: finished with status 'done'
Stored in directory: /root/.cache/pip/wheels/9d/cd/e1/4d418b16ea662e512349ef193ed9d9ff473af715110798c984
Successfully built pandas numpy
Installing collected packages: six, python-dateutil, pytz, numpy, pandas
Successfully installed numpy-1.14.1 pandas-0.22.0 python-dateutil-2.6.1 pytz-2018.3 six-1.11.0
Removing intermediate container 36f6024e5e2d
---> a93c59e6a106
Successfully built a93c59e6a106
Successfully tagged alpine-pandas:latest
docker build -t alpine-pandas -f Dockerfile.alpine . --no-cache 0.54s user 0.33s system 0% cpu 16:08.47 total
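As a rough sanity check on the 70x figure, the two wall-clock totals reported by `time` in the logs above can be compared directly. This is only a sketch: the helper name `to_seconds` is made up for this illustration, and the ratio covers the whole build (including the apk step on Alpine), not just setup.py.

```python
def to_seconds(total: str) -> float:
    """Convert a `time` total such as '16:08.47' or '13.605' to seconds."""
    seconds = 0.0
    for part in total.split(":"):
        # Each colon-separated field is worth 60x the one after it.
        seconds = seconds * 60 + float(part)
    return seconds


debian_total = to_seconds("13.605")    # total from the Debian build log
alpine_total = to_seconds("16:08.47")  # total from the Alpine build log
ratio = alpine_total / debian_total

print(f"Debian: {debian_total:.1f}s, Alpine: {alpine_total:.1f}s, ratio: {ratio:.0f}x")
```

The Alpine total works out to roughly 968 seconds against about 14 seconds for Debian, which is consistent with the "around 70x" estimate.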
Accepted answer by nickgryg
Debian-based images only need python pip to install packages in .whl format:
Downloading pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl (26.2MB)
Downloading numpy-1.14.1-cp36-cp36m-manylinux1_x86_64.whl (12.2MB)
The WHL format was developed as a quicker and more reliable method of installing Python software than re-building from source code every time. WHL files only have to be moved to the correct location on the target system to be installed, whereas a source distribution requires a build step before installation.
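To make the wheel file names above easier to read, here is a minimal sketch that splits one into its tags. The name `parse_wheel_name` and the `WheelTags` tuple are made up for this illustration, and the sketch assumes a file name without the optional build tag; pip's own parsing is more thorough.

```python
from typing import NamedTuple


class WheelTags(NamedTuple):
    name: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_name(filename: str) -> WheelTags:
    """Split '<name>-<version>-<python>-<abi>-<platform>.whl' into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Hypothetical simplification: names with a build tag are rejected.
        raise ValueError(f"unexpected wheel name layout: {filename}")
    return WheelTags(*parts)


tags = parse_wheel_name("pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl")
print(tags.python_tag, tags.abi_tag, tags.platform_tag)
```

The platform tag, `manylinux1_x86_64`, is the part that matters for this question: it tells pip which kind of system the precompiled binaries inside the wheel were built for.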
The wheel packages for pandas and numpy are not supported in images based on the Alpine platform. That's why, when we install them using python pip during the build process, they are always compiled from the source files on Alpine:
Downloading pandas-0.22.0.tar.gz (11.3MB)
Downloading numpy-1.14.1.zip (4.9MB)
And we can see the following inside the container while the image is building:
/ # ps aux
PID USER TIME COMMAND
1 root 0:00 /bin/sh -c pip install pandas
7 root 0:04 {pip} /usr/local/bin/python /usr/local/bin/pip install pandas
21 root 0:07 /usr/local/bin/python -c import setuptools, tokenize;__file__='/tmp/pip-build-en29h0ak/pandas/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n
496 root 0:00 sh
660 root 0:00 /bin/sh -c gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/pri
661 root 0:00 gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -DTHREAD_STACK_SIZE=0x100000 -fPIC -Ibuild/src.linux-x86_64-3.6/numpy/core/src/private -Inump
662 root 0:00 /usr/libexec/gcc/x86_64-alpine-linux-musl/6.4.0/cc1 -quiet -I build/src.linux-x86_64-3.6/numpy/core/src/private -I numpy/core/include -I build/src.linux-x86_64-3.6/numpy/core/includ
663 root 0:00 ps aux
If we modify the Dockerfile a little:
FROM python:3.6.4-alpine3.7
RUN apk add --no-cache g++ wget
RUN wget https://pypi.python.org/packages/da/c6/0936bc5814b429fddb5d6252566fe73a3e40372e6ceaf87de3dec1326f28/pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
we get the following error:
Step 4/4 : RUN pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl
---> Running in 0faea63e2bda
pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl is not a supported wheel on this platform.
The command '/bin/sh -c pip install pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl' returned a non-zero code: 1
Unfortunately, the only way to install pandas on an Alpine image is to wait until the build finishes.
Of course, if you want to use the Alpine image with pandas in CI, for example, the best way to do so is to compile it once, push it to any registry and use it as a base image for your needs.
EDIT: If you want to use the Alpine image with pandas you can pull my nickgryg/alpine-pandas docker image. It is a python image with pre-compiled pandas on the Alpine platform. It should save you time.
Answer by jtlz2
ANSWER: AS OF 25/10/2019, FOR PYTHON 3, IT STILL DOESN'T!
Here is a complete working Dockerfile:
FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing
The build is very sensitive to the exact python and alpine version numbers - getting these wrong seems to provoke Max Levy's error so:libpython3.7m.so.1.0 (missing) - but the above does now work for me.
My updated Dockerfile is available at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b
[Earlier Update:]
ANSWER: IT DOESN'T!
In any Alpine Dockerfile you can simply do*
RUN apk add py2-numpy@community py2-scipy@community py-pandas@edge
This is because numpy, scipy and now pandas are all available prebuilt on alpine:
https://pkgs.alpinelinux.org/packages?name=*numpy
https://pkgs.alpinelinux.org/packages?name=*scipy&branch=edge
https://pkgs.alpinelinux.org/packages?name=*pandas&branch=edge
One way to avoid rebuilding every time, or using a Docker layer, is to use a prebuilt, native Alpine Linux/.apk package, e.g.
https://github.com/sgerrand/alpine-pkg-py-pandas
https://github.com/nbgallery/apks
You can build these .apks once and use them wherever in your Dockerfile you like :)
This also saves you having to bake everything else into the Docker image before the fact - i.e. the flexibility to pre-build any Docker image you like.
PS I have put a Dockerfile stub at https://gist.github.com/jtlz2/b0f4bc07ce2ff04bc193337f2327c13b that shows roughly how to build the image. These include the important steps (*):
RUN echo "@community http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk update
RUN apk add --update --no-cache libgfortran
Answer by stefanitsky
ATTENTION
Look at the @jtlz2 answer with the latest update
OUTDATED
The py3-pandas and py3-numpy packages have moved to the Alpine testing repository, so you can install them by adding these lines to your Dockerfile:
RUN echo "http://dl-8.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories \
&& apk update \
&& apk add py3-numpy py3-pandas
Hope it helps someone!
Alpine packages links:
- py3-pandas
- py3-numpy
Alpine repositories docs info.
Answer by ThisGuyCantEven
Just going to bring some of these answers together in one answer and add a detail I think was missed. The reason certain python libraries, particularly optimized math and data libraries, take so long to build on alpine is that the pip wheels for these libraries include binaries precompiled from c/c++ and linked against glibc, a common set of c standard libraries. Debian, Fedora and CentOS all (typically) use glibc, but alpine, in order to stay lightweight, uses musl-libc instead. c/c++ binaries built on a glibc system will not work on a system without glibc, and the same goes for musl.
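If you are not sure which C library an image uses, a quick check from inside the container can help. The sketch below relies on the standard library's `platform.libc_ver()`; the function name `describe_libc` is made up for this illustration, and treating an unrecognised result as "possibly musl" is an assumption rather than a reliable detection method.

```python
import platform


def describe_libc() -> str:
    """Report which C library the running interpreter appears to use."""
    name, version = platform.libc_ver()
    if name == "glibc":
        return f"glibc {version}"
    # Assumption: platform.libc_ver() mainly recognises glibc, so any other
    # result is reported as "not glibc" (musl on Alpine is one possibility).
    return "not glibc (possibly musl)"


print(describe_libc())
```

Running this in a Debian-based Python image should report a glibc version, while an Alpine-based image is expected to fall into the second branch.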
Pip looks first for a wheel with the correct binaries; if it can't find one, it tries to compile the binaries from the c/c++ source and links them against musl. In many cases, this won't even work unless you have the python headers from python3-dev or build tools like make.
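The fallback described above can be sketched roughly as follows. This is not pip's actual implementation: `pick_distribution` is a hypothetical helper, it looks only at the platform tag (ignoring the Python and ABI tags), and the tag lists passed in are assumed values for illustration.

```python
from typing import List, Optional


def pick_distribution(files: List[str], supported_platforms: List[str]) -> Optional[str]:
    """Prefer a wheel with a supported platform tag, else fall back to source."""
    for filename in files:
        if filename.endswith(".whl"):
            # The platform tag is the last dash-separated field of the name.
            platform_tag = filename[: -len(".whl")].split("-")[-1]
            if platform_tag == "any" or platform_tag in supported_platforms:
                return filename
    for filename in files:
        if filename.endswith((".tar.gz", ".zip")):
            return filename  # source distribution: needs a compile step
    return None


available = [
    "pandas-0.22.0-cp36-cp36m-manylinux1_x86_64.whl",
    "pandas-0.22.0.tar.gz",
]

# Assumed tag list for a glibc system such as Debian.
print(pick_distribution(available, ["manylinux1_x86_64", "linux_x86_64"]))
# Assumed tag list for a musl system such as Alpine.
print(pick_distribution(available, ["linux_x86_64"]))
```

With the first tag list the wheel is chosen and nothing has to be compiled; with the second, only the .tar.gz remains, which matches the downloads seen in the Alpine build log.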
Now the silver lining: as others have mentioned, there are apk packages with the proper binaries provided by the community. Using these will save you the (sometimes lengthy) process of building the binaries.
Answer by Flávio Henrique
Real honest advice here: switch to a Debian-based image and then all your problems will be gone.
Alpine for python applications doesn't work well.
Here is an example of my dockerfile:
FROM python:3.7.6-buster
RUN pip install pandas==1.0.0
RUN pip install scikit-learn
RUN pip install Django==3.0.2
RUN pip install cx_Oracle==7.3.0
RUN pip install excel
RUN pip install djangorestframework==3.11.0
The python:3.7.6-buster image is more appropriate in this case. In addition, you don't need any extra dependencies in the OS.
See this useful and recent article: https://pythonspeed.com/articles/alpine-docker-python/:
Don't use Alpine Linux for Python images
Unless you want massively slower build times, larger images, more work, and the potential for obscure bugs, you'll want to avoid Alpine Linux as a base image. For some recommendations on what you should use, see my article on choosing a good base image.