Installing pandas in docker Alpine

Note: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/54890328/

Time: 2020-09-14 06:20:17  Source: igfitidea

Tags: python, pandas, numpy, docker, alpine

Asked by 8-Bit Borges

I am having a really hard time trying to install a stable data science package configuration in docker. This should be easier with such mainstream, relevant tools.

The following is the Dockerfile that used to work, with a bit of a hack: removing pandas from the package core and installing it separately, specifying pandas<0.21.0 because, allegedly, higher versions conflict with numpy.

FROM alpine:3.6

ENV PACKAGES="\
    dumb-init \
    musl \
    libc6-compat \
    linux-headers \
    build-base \
    bash \
    git \
    ca-certificates \
    freetype \
    libgfortran \
    libgcc \
    libstdc++ \
    openblas \
    tcl \
    tk \
    libssl1.0 \
    "

ENV PYTHON_PACKAGES="\
    numpy \
    matplotlib \
    scipy \
    scikit-learn \
    nltk \
    " 

RUN apk add --no-cache --virtual build-dependencies python3 \
    && apk add --virtual build-runtime \
    build-base python3-dev openblas-dev freetype-dev pkgconfig gfortran \
    && ln -s /usr/include/locale.h /usr/include/xlocale.h \
    && python3 -m ensurepip \
    && rm -r /usr/lib/python*/ensurepip \
    && pip3 install --upgrade pip setuptools \
    && ln -sf /usr/bin/python3 /usr/bin/python \
    && ln -sf pip3 /usr/bin/pip \
    && rm -r /root/.cache \
    && pip install --no-cache-dir $PYTHON_PACKAGES \
    # pandas is installed separately and pinned below 0.21.0
    && pip3 install 'pandas<0.21.0' \
    && apk del build-runtime \
    && apk add --no-cache --virtual build-dependencies $PACKAGES \
    && rm -rf /var/cache/apk/*

# set working directory
WORKDIR /usr/src/app

# add and install requirements (packages other than the data science ones go here)
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt

# add entrypoint.sh
COPY ./entrypoint.sh /usr/src/app/entrypoint.sh

RUN chmod +x /usr/src/app/entrypoint.sh

# add app
COPY . /usr/src/app

# run server
CMD ["/usr/src/app/entrypoint.sh"]


The configuration above used to work. What happens now is that the build does go through, but pandas fails at import with the following error:

ImportError: Missing required dependencies ['numpy']

Since numpy 1.16.1 was installed, I don't know which numpy pandas is trying to find anymore...
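
One way to narrow this down is to ask the interpreter inside the container where it would load numpy from, without importing it. The following is a minimal sketch, not part of the original question; the function name `describe_module` is hypothetical, and it uses only the standard library:

```python
import importlib.util
import sys


def describe_module(name):
    """Report where the running interpreter would load `name` from, without importing it."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "python_version": "%d.%d" % sys.version_info[:2],
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
    }


if __name__ == "__main__":
    # Run this with the same `python` that fails at `import pandas`.
    for module_name in ("numpy", "pandas"):
        print(describe_module(module_name))
```

If `found` is false for numpy, or `interpreter` is not the Python that pip installed into, the mismatch is probably between interpreters rather than between package versions.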

Does anyone know how to obtain a stable solution for this?

NOTE: A solution consisting of a pull from a turnkey docker image for data science with at least the packages mentioned above, into the Dockerfile above, would also be very welcome.



EDIT 1:

If I move the installation of the data packages into requirements.txt, as suggested in the comments, like so:

requirements.txt

(...)
numpy==1.16.1 # or numpy==1.16.0
scikit-learn==0.20.2
scipy==1.2.1
nltk==3.4   
pandas==0.24.1 # or pandas== 0.23.4
matplotlib==3.0.2 
(...)

and Dockerfile:

# add and install requirements
COPY ./requirements.txt /usr/src/app/requirements.txt
RUN pip install -r requirements.txt

It breaks again at pandas, complaining about numpy.

Collecting numpy==1.16.1 (from -r requirements.txt (line 61))
  Downloading https://files.pythonhosted.org/packages/2b/26/07472b0de91851b6656cbc86e2f0d5d3a3128e7580f23295ef58b6862d6c/numpy-1.16.1.zip (5.1MB)
Collecting scikit-learn==0.20.2 (from -r requirements.txt (line 62))
  Downloading https://files.pythonhosted.org/packages/49/0e/8312ac2d7f38537361b943c8cde4b16dadcc9389760bb855323b67bac091/scikit-learn-0.20.2.tar.gz (10.3MB)
Collecting scipy==1.2.1 (from -r requirements.txt (line 63))
  Downloading https://files.pythonhosted.org/packages/a9/b4/5598a706697d1e2929eaf7fe68898ef4bea76e4950b9efbe1ef396b8813a/scipy-1.2.1.tar.gz (23.1MB)
Collecting nltk==3.4 (from -r requirements.txt (line 64))
  Downloading https://files.pythonhosted.org/packages/6f/ed/9c755d357d33bc1931e157f537721efb5b88d2c583fe593cc09603076cc3/nltk-3.4.zip (1.4MB)
Collecting pandas==0.24.1 (from -r requirements.txt (line 65))
  Downloading https://files.pythonhosted.org/packages/81/fd/b1f17f7dc914047cd1df9d6813b944ee446973baafe8106e4458bfb68884/pandas-0.24.1.tar.gz (11.8MB)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 359, in get_provider
        module = sys.modules[moduleOrReq]
    KeyError: 'numpy'

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 732, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/tmp/pip-install-_e5z6o6_/pandas/setup.py", line 475, in maybe_cythonize
        numpy_incl = pkg_resources.resource_filename('numpy', 'core/include')
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 1144, in resource_filename
        return get_provider(package_or_requirement).get_resource_filename(
      File "/usr/local/lib/python3.7/site-packages/pkg_resources/__init__.py", line 361, in get_provider
        __import__(moduleOrReq)
    ModuleNotFoundError: No module named 'numpy'

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-_e5z6o6_/pandas/
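
The traceback shows pandas' setup.py looking up numpy's include directory while pip is still collecting packages, so numpy apparently has to be installed before pandas is even inspected. A possible workaround, sketched here under that assumption, is to install in two passes. The helper name `split_requirements` is hypothetical, and the parsing is deliberately simple (it does not handle every requirements.txt feature):

```python
import re


def split_requirements(lines, first=("numpy",)):
    """Split requirement lines into (install_first, install_later).

    Comments and blank lines are dropped. `first` lists the package names
    that must already be installed before the remaining packages are built.
    """
    early, later = [], []
    for raw in lines:
        # Strip comments: a '#' at line start or preceded by whitespace.
        line = re.sub(r"(^|\s)#.*$", "", raw).strip()
        if not line:
            continue
        # The package name ends at the first version/marker/extras character.
        name = re.split(r"[<>=!~;\[ ]", line, maxsplit=1)[0].lower()
        (early if name in first else later).append(line)
    return early, later


if __name__ == "__main__":
    sample = [
        "numpy==1.16.1 # or numpy==1.16.0",
        "scipy==1.2.1",
        "pandas==0.24.1 # or pandas== 0.23.4",
    ]
    early, later = split_requirements(sample)
    print("pip install " + " ".join(early))
    print("pip install " + " ".join(later))
```

In a Dockerfile this would correspond to two separate `pip install` steps, with numpy in the first one.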


EDIT 2:

This seems like an open pandas issue. For more details please refer to:

pandas-dev github

"Unfortunately, this means that a requirements.txt file is insufficient for setting up a new environment with pandas installed (like in a docker container)".

  **ImportError**:

  IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

  Importing the multiarray numpy extension module failed.  Most
  likely you are trying to import a failed build of numpy.
  Here is how to proceed:
  - If you're working with a numpy git repository, try `git clean -xdf`
    (removes all files not under version control) and rebuild numpy.
  - If you are simply trying to use the numpy version that you have installed:
    your installation is broken - please reinstall numpy.
  - If you have already reinstalled and that did not fix the problem, then:
    1. Check that you are using the Python you expect (you're using /usr/local/bin/python),
       and that you have no directories in your PATH or PYTHONPATH that can
       interfere with the Python and numpy versions you're trying to use.
    2. If (1) looks fine, you can open a new issue at
       https://github.com/numpy/numpy/issues.  Please include details on:
       - how you installed Python
       - how you installed numpy
       - your operating system
       - whether or not you have multiple versions of Python installed
       - if you built from source, your compiler versions and ideally a build log


EDIT 3

requirements.txt -> https://pastebin.com/0icnx0iu



EDIT 4

As of 01/12/20, the accepted solution stopped working. Now the build breaks not at pandas but at scipy (after numpy), while building scipy's wheel. This is the log:

  ----------------------------------------
  ERROR: Failed building wheel for scipy
  Running setup.py clean for scipy
  ERROR: Command errored out with exit status 1:
   command: /usr/bin/python3.6 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"'; __file__='"'"'/tmp/pip-install-s6nahssd/scipy/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
       cwd: /tmp/pip-install-s6nahssd/scipy
  Complete output (9 lines):

  `setup.py clean` is not supported, use one of the following instead:

    - `git clean -xdf` (cleans all files)
    - `git clean -Xdf` (cleans all versioned files, doesn't touch
                        files that aren't checked into the git repo)

  Add `--force` to your command to use it anyway if you must (unsupported).

  ----------------------------------------
  ERROR: Failed cleaning build dir for scipy
Successfully built numpy
Failed to build scipy
ERROR: Could not build wheels for scipy which use PEP 517 and cannot be installed directly

From the error it seems that the build process is using python3.6, while I use FROM alpine:3.7.

Full log here -> https://pastebin.com/Tw4ubxSA

And this is the current Dockerfile:

https://pastebin.com/3SftEufx

Accepted answer by valiano

If you're not bound to Alpine 3.6, using Alpine 3.7 (or later) should work.

On Alpine 3.6, installing matplotlib failed for me with the following:

Collecting matplotlib
  Downloading https://files.pythonhosted.org/packages/26/04/8b381d5b166508cc258632b225adbafec49bbe69aa9a4fa1f1b461428313/matplotlib-3.0.3.tar.gz (36.6MB)
    Complete output from command python setup.py egg_info:
    Download error on https://pypi.org/simple/numpy/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    Couldn't find index page for 'numpy' (maybe misspelled?)
    Download error on https://pypi.org/simple/: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:833) -- Some packages may not be found!
    No local packages or working download links found for numpy>=1.10.0

However, on Alpine 3.7, it worked. This may be due to a numpy versioning issue (see here), but I'm not able to tell for sure. Past that problem, packages were built and installed successfully - taking a good while, about 30 minutes (since Alpine's musl-libc is not compatible with Python's wheel format, all packages installed with pip have to be built from source).
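
To see which C library a given image's interpreter reports, something like the following can be run inside the container. This is only a sketch: the function name `libc_summary` is hypothetical, and the expectation that `platform.libc_ver()` reports `glibc` on glibc-based images but usually comes back empty on musl-based Alpine is an assumption that may vary between Python versions.

```python
import platform
import sysconfig


def libc_summary():
    """Collect hints about the C library the running interpreter is linked against."""
    libc_name, libc_version = platform.libc_ver()
    return {
        "libc": libc_name,                # e.g. "glibc"; may be empty on musl
        "libc_version": libc_version,
        "platform_tag": sysconfig.get_platform(),
    }


if __name__ == "__main__":
    print(libc_summary())
```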

Note that one important change is needed: you should only remove the build-runtime virtual package (apk del build-runtime) after pip install. Also, if applicable, you could replace numpy 1.16.1 with 1.16.2, which is the shipped version (otherwise 1.16.2 will be uninstalled and 1.16.1 built from source, further increasing the build time) - I haven't tried this, though.

For reference, here's my slightly modified Dockerfile and docker build output.

Note:

Usually Alpine is chosen as the base to minimize the image size (Alpine is also otherwise very slick, but has compatibility issues with mainstream Linux apps due to glibc/musl). Having to build Python packages from source kind of defeats that purpose, since you get a very bloated image - 900MB before any cleanup - which also takes ages to build. The image could be greatly compacted by removing all intermediate compilation artifacts, build dependencies etc., but still.

If you can't get the Python package versions you need to work on Alpine, without having to build them from source, I would suggest trying other small and more compatible base images such as debian-slim, or even ubuntu.

Edit:

Following "Edit 3" with added requirements, here are the updated Dockerfile and Docker build output. The following packages were added to satisfy build dependencies:

postgresql-dev libffi-dev libressl-dev libxml2 libxml2-dev libxslt libxslt-dev libjpeg-turbo-dev zlib-dev

For packages that failed to build due to specific missing headers, I used Alpine's package contents search to locate the missing package. Specifically for cffi, the ffi.h header was missing, which needs the libffi-dev package: https://pkgs.alpinelinux.org/contents?file=ffi.h&path=&name=&branch=v3.7.
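
The search URL above follows a simple pattern, so the lookup for any other missing header can be generated the same way. A small sketch, assuming the query parameters stay as they appear in that URL (the function name `contents_search_url` is hypothetical):

```python
from urllib.parse import urlencode

# Base address taken from the search URL quoted in the answer.
CONTENTS_SEARCH = "https://pkgs.alpinelinux.org/contents"


def contents_search_url(filename, branch):
    """Build an Alpine package contents search URL for a missing file."""
    query = urlencode({"file": filename, "path": "", "name": "", "branch": branch})
    return CONTENTS_SEARCH + "?" + query


if __name__ == "__main__":
    print(contents_search_url("ffi.h", "v3.7"))
```

Calling it with `"ffi.h"` and `"v3.7"` reproduces the link from the answer; the header name from a failed build log can be substituted in the same way.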

Alternatively, when the cause of a package build failure is not very clear, the installation instructions of the specific package could be consulted, for example those of Pillow.

The new image size, before any compaction, is 1.04GB. For cutting it down a bit, you could remove the Python and pip caches:

RUN apk del build-runtime && \
    find -type d -name __pycache__ -prune -exec rm -rf {} \; && \
    rm -rf ~/.cache/pip

This will bring the image size down to 661MB when using docker build --squash.
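
For anyone who prefers to do the `__pycache__` cleanup from Python rather than with `find`, here is a rough equivalent. It is a sketch only: the function name `remove_pycache` is hypothetical, and unlike the `RUN` step above it does not touch the pip cache or the build-runtime packages.

```python
import os
import shutil


def remove_pycache(root):
    """Delete every __pycache__ directory under `root` and return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if "__pycache__" in dirnames:
            target = os.path.join(dirpath, "__pycache__")
            shutil.rmtree(target)
            # Do not descend into the directory just removed (like find's -prune).
            dirnames.remove("__pycache__")
            removed.append(target)
    return removed
```

The function takes the directory tree to clean as an explicit argument, for example the site-packages directory of the image's Python.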

Answer by Ram Krishnan

Try adding this to your requirements.txt file:

numpy==1.16.0
pandas==0.23.4

I've been facing the same error since yesterday and this change solved it for me.

Answer by jtlz2

An older Q&A, "Why does it take ages to install Pandas on Alpine Linux", is related.

If your aim is to get a stable solution without knowing the nuts and bolts, for Python 3 you can just build off the following (copied and pasted from my answer at https://stackoverflow.com/a/50443531/1021819):

FROM python:3.7-alpine
RUN echo "@testing http://dl-cdn.alpinelinux.org/alpine/edge/testing" >> /etc/apk/repositories
RUN apk add --update --no-cache py3-numpy py3-pandas@testing

If your goal is to understand how to achieve a stable build, the discussion there and the related images might help too...

Answer by codeslord

This may not be completely relevant, but since this is the first answer that pops up when searching for numpy/pandas installation failures on Alpine, I am adding this answer.

The following fix worked for me (but it takes longer to install pandas/numpy):

apk update
apk --no-cache add curl gcc g++
ln -s /usr/include/locale.h /usr/include/xlocale.h

Now try installing pandas/numpy.