Pandas in AWS Lambda gives numpy error
Original source: http://stackoverflow.com/questions/43877692/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Kingz
I've been trying to run code in AWS Lambda that imports pandas. Here is what I've done. I have a Python file containing the following simple code (this file has the Lambda handler):
import json
print('Loading function')
import pandas as pd
def lambda_handler(event, context):
    return "Welcome to Pandas usage in AWS Lambda"
- I zipped this Python file together with the numpy, pandas and pytz libraries as a deployment package (all of this was done on an Amazon EC2 Linux machine)
- Then I uploaded the package to S3
- I created a Lambda function (runtime=python3.6) and uploaded the deployment package from S3
But when I test the lambda function in AWS Lambda, I get the below error:
Unable to import module 'lambda_function': Missing required dependencies ['numpy']
I already have numpy in the zipped package, but I still get this error. I tried to follow the hints given at Pandas & AWS Lambda, but no luck.
Has anyone run into the same issue? I would appreciate any hints or suggestions to solve this problem.
Thanks
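A minimal diagnostic sketch, not part of the original post: older pandas releases report any failed numpy import as "Missing required dependencies", which can hide the underlying cause. Importing numpy on its own inside the handler may surface the real traceback. The helper name check_imports is hypothetical.

```python
import importlib
import traceback


def check_imports(module_names):
    """Try to import each module and report the outcome.

    Returns a dict mapping each module name to None on success, or to the
    formatted traceback text on failure.
    """
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception:  # compiled extensions can fail in several ways
            results[name] = traceback.format_exc()
    return results


def lambda_handler(event, context):
    # Import numpy first: its own traceback usually names the compiled
    # file that could not be loaded.
    return check_imports(["numpy", "pandas"])
```

Because the imports happen inside the handler, the function still loads, and the test output shows which module failed and why.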
Answered by pbegle
EDIT: I finally figured out how to run pandas and numpy in an AWS Lambda Python 3.6 runtime environment.
I have uploaded my deployment package to the following repo:
git clone https://github.com/pbegle/aws-lambda-py3.6-pandas-numpy.git
Simply add your lambda_function.py to the zip file by running:
zip -ur lambda.zip lambda_function.py
Upload the zip to S3 and use it as the source for your Lambda function.
ORIGINAL:
The only way I have gotten pandas to work in a Lambda function is by compiling the pandas (and numpy) libraries on an AWS Linux EC2 instance, following the steps from this blog post, and then using the Python 2.7 runtime for my Lambda function.
Answered by Ranadeep Guha
After doing a lot of research, I was able to make it work with Lambda layers.
Create or open a clean directory and follow the steps below:
Prerequisites: make sure you have Docker up and running.
- Create a requirements.txt file with the following:
pandas==0.23.4
pytz==2018.7
- Create a get_layer_packages.sh file with the following:
#!/bin/bash
export PKG_DIR="python"
rm -rf ${PKG_DIR} && mkdir -p ${PKG_DIR}
docker run --rm -v $(pwd):/foo -w /foo lambci/lambda:build-python3.6 \
    pip install -r requirements.txt --no-deps -t ${PKG_DIR}
- Run the following commands in the same directory:
chmod +x get_layer_packages.sh
./get_layer_packages.sh
zip -r pandas.zip .
- Upload the layer to an S3 bucket.
- Publish the layer to AWS by running the command below:
aws lambda publish-layer-version --layer-name pandas-layer --description "Description of your layer" --content S3Bucket=<bucket name>,S3Key=<layer-name>.zip --compatible-runtimes python3.6 python3.7
- Go to the Lambda console and upload your code as a zip file, or use the inline editor.
- Click on Layers > Add a layer, search for the layer (pandas-layer) among the compatible layers, and select the version.
- Also add the AWSLambda-Python36-SciPy1x layer, which is available by default, for importing numpy.
- Test the code. It should work now!
Thanks to this Medium article: https://medium.com/@qtangs/creating-new-aws-lambda-layer-for-python-pandas-library-348b126e9f3e
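For anyone who prefers to script the publish step, here is a rough Python equivalent of the aws lambda publish-layer-version command, assuming boto3 is installed and AWS credentials are configured. The helper names layer_publish_args and publish_layer are hypothetical, and the bucket and key values in the usage line are placeholders.

```python
def layer_publish_args(layer_name, bucket, key, runtimes, description=""):
    """Build keyword arguments for Lambda's PublishLayerVersion call."""
    return {
        "LayerName": layer_name,
        "Description": description,
        "Content": {"S3Bucket": bucket, "S3Key": key},
        "CompatibleRuntimes": list(runtimes),
    }


def publish_layer(layer_name, bucket, key, runtimes, description=""):
    # boto3 is imported lazily so the argument builder works without it.
    import boto3

    client = boto3.client("lambda")
    response = client.publish_layer_version(
        **layer_publish_args(layer_name, bucket, key, runtimes, description)
    )
    return response["LayerVersionArn"]
```

Usage would look like publish_layer("pandas-layer", "my-bucket", "pandas.zip", ["python3.6", "python3.7"]), with the bucket and key replaced by your own.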
Answered by chim
To include numpy in your Lambda zip, follow the instructions on this page in the AWS docs:
How do I add Python packages with compiled binaries to my deployment package and make the package compatible with AWS Lambda?
To paraphrase the instructions, using numpy as an example:
- Open the module's page at pypi.org: https://pypi.org/project/numpy/
- Choose Download files.
- Download:
For Python 2.7, module-name-version-cp27-cp27mu-manylinux1_x86_64.whl
e.g. numpy-1.15.2-cp27-cp27m-manylinux1_x86_64.whl
For Python 3.6, module-name-version-cp36-cp36m-manylinux1_x86_64.whl
e.g. numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl
- Uncompress the wheel file into the /path/to/project-dir folder. You can use the unzip command on the command line to do this, though there are other ways.
unzip numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl
Once the wheel file is uncompressed, your deployment package will be compatible with Lambda.
Hope that all makes sense ;)
Note: you should not include the whl file in the deployment package.
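The wheel naming convention above can be checked in code before unpacking. This is only a sketch: it handles the common five-part filename (no build tag), and the helper names parse_wheel_name, is_lambda_candidate and extract_wheel are hypothetical. A wheel is a zip archive, so the standard zipfile module can stand in for the unzip command.

```python
import zipfile


def parse_wheel_name(filename):
    """Split name-version-python-abi-platform.whl into its five tags."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) != 5:
        raise ValueError("unexpected wheel name: " + filename)
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, parts))


def is_lambda_candidate(filename, python_tag):
    """True if the wheel targets the given CPython tag on manylinux x86_64."""
    tags = parse_wheel_name(filename)
    platform = tags["platform"]
    return (tags["python"] == python_tag
            and platform.startswith("manylinux")
            and platform.endswith("x86_64"))


def extract_wheel(wheel_path, project_dir):
    # Unpack the wheel's contents into the project directory.
    with zipfile.ZipFile(wheel_path) as wheel:
        wheel.extractall(project_dir)
```

For example, is_lambda_candidate("numpy-1.15.2-cp36-cp36m-manylinux1_x86_64.whl", "cp36") is true, while a win_amd64 wheel with the same version is rejected.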
Answered by korniichuk
AWS Lambda uses the Amazon Linux operating system. The idea is to download pandas and NumPy builds that are compatible with Amazon Linux. What pip downloads on a Windows or Mac machine is specific to that platform. You need to download the compatible version for Linux so that your Lambda function can load it. These files are called wheel files.
Create a new local directory with a lambda_function.py file. Install pandas to the local directory with pip:
$ pip install -t . pandas
Navigate to https://pypi.org/project/pandas/#files. Search for and download the newest *manylinux1_x86_64.whl package. In my case, I'm using Python 3.6 for my Lambda function, so I downloaded the cp36 wheels for pandas and NumPy.
Download the whl files to the directory with lambda_function.py. Remove the pandas, numpy, and *.dist-info directories. Unzip the whl files.
$ rm -r pandas numpy *.dist-info
$ unzip numpy-1.16.1-cp36-cp36m-manylinux1_x86_64.whl
$ unzip pandas-0.24.1-cp36-cp36m-manylinux1_x86_64.whl
Remove the whl files, *.dist-info, and __pycache__. Prepare the zip.zip archive:
$ rm -r *.whl *.dist-info __pycache__
$ zip -r zip.zip .
Upload the zip.zip file to your Lambda function.
Source: https://medium.com/@korniichuk/lambda-with-pandas-fd81aa2ff25e
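The cleanup and zip steps can also be done in one pass from Python. This is a sketch under the assumption that everything to deploy sits in one directory; the function name build_package and the EXCLUDE_PATTERNS tuple are hypothetical, chosen to mirror the rm and zip commands above without deleting anything on disk.

```python
import fnmatch
import os
import zipfile

# Names left out of the archive, mirroring the rm step above.
EXCLUDE_PATTERNS = ("*.whl", "*.dist-info", "__pycache__")


def build_package(src_dir, zip_path, exclude=EXCLUDE_PATTERNS):
    """Zip src_dir into zip_path, skipping names that match exclude."""
    def skipped(name):
        return any(fnmatch.fnmatch(name, pattern) for pattern in exclude)

    zip_abs = os.path.abspath(zip_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, dirs, files in os.walk(src_dir):
            # Prune excluded directories in place so os.walk skips them.
            dirs[:] = [d for d in dirs if not skipped(d)]
            for name in files:
                full = os.path.join(root, name)
                # Skip excluded files and the archive being written.
                if skipped(name) or os.path.abspath(full) == zip_abs:
                    continue
                # Store paths relative to src_dir so modules sit at the zip root.
                archive.write(full, os.path.relpath(full, src_dir))
    return zip_path
```

Calling build_package(".", "zip.zip") from the project directory should give an archive with lambda_function.py and the unpacked packages at its root.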
Answered by Pierre-Antoine
To get additional libraries in Lambda, we need to compile them on Amazon Linux (this is important if the underlying library is based on C or C++, as NumPy is) and package them in a ZIP file together with the Python script you want to run in Lambda.
To get the Amazon Linux compiled version of the libraries, you can either find a version that someone has already compiled, like the one by @pbegle, or compile it yourself. To compile it yourself, there are two options:
- compile the libraries on an EC2 instance: https://streetdatascience.com/2016/11/24/using-numpy-and-pandas-on-aws-lambda/
- compile the libraries in a Docker version of the Lambda environment: https://serverlesscode.com/post/scikitlearn-with-amazon-linux-container/
Following the last option with Docker, it is possible to make it work using the instructions in the blog post above and by adding:
pip install --use-wheel pandas
to the script that compiles the libraries:
https://github.com/ryansb/sklearn-build-lambda/blob/master/build.sh#L21
Answered by Dishant Kapadiya
This is a near duplicate of "Cannot find MySQL in NodeJS using AWS Lambda".
You need to package your libraries with your Lambda function. Because Lambda runs on a public cloud, you cannot configure its environment.
In your case, since you are using pandas, you need to package pandas in your zip. Find the path to pandas (for example: /Users/dummyUser/anaconda/lib/python3.6/site-packages) and copy the library to the place where you have your Lambda function code. Inside your code, refer to pandas from your local copy. When uploading, zip the whole set (code + libraries) and upload as usual. It should work.
Answered by JeyJ
I tried some of the solutions here, but most of them didn't work. I liked the idea @Ranadeep Guha suggested of creating a container and downloading the packages there, so that's what I did.
I worked in the directory where my Lambda function was located and created the following files:
Dockerfile:
FROM python:3.8-slim
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt -t /app
requirements.txt (these were mine):
pandas
numpy
xmltodict
Now, in Git Bash, I run the following command, which generates a Docker image with all the dependencies installed: docker build -t image_name .
Sending build context to Docker daemon 5.632kB
Step 1/4 : FROM python:3.8-slim
---> 56930ef6f6a2
Step 2/4 : WORKDIR /app
---> Using cache
---> ea0bf539bcad
Step 3/4 : COPY requirements.txt ./
---> cb4c005f53cc
Step 4/4 : RUN pip install --no-cache-dir -r requirements.txt -t /app
---> Running in a0d179a372b4
Collecting pandas
Downloading pandas-1.0.3-cp38-cp38-manylinux1_x86_64.whl (10.0 MB)
Collecting numpy
Downloading numpy-1.18.3-cp38-cp38-manylinux1_x86_64.whl (20.6 MB)
Collecting xmltodict
Downloading xmltodict-0.12.0-py2.py3-none-any.whl (9.2 kB)
Collecting pytz>=2017.2
Downloading pytz-2020.1-py2.py3-none-any.whl (510 kB)
Collecting python-dateutil>=2.6.1
Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
Collecting six>=1.5
Downloading six-1.14.0-py2.py3-none-any.whl (10 kB)
Installing collected packages: numpy, pytz, six, python-dateutil, pandas, xmltodict
Successfully installed numpy-1.18.3 pandas-1.0.3 python-dateutil-2.8.1 pytz-2020.1 six-1.14.0 xmltodict-0.12.0
Now just create a Docker container from that image and confirm that everything is installed:
winpty docker run --name container_name -it --entrypoint bash image_name
Type ls and you will see all the installed packages.
Now let's copy all the installed packages to your local PC. docker cp does not expand wildcards, so use /app/. to copy the directory's contents. You can replace the final dot with any location on your PC:
docker cp container_id:/app/. .
Answered by Pavel Anni
I've been struggling with a similar error while trying to use the python3.6 runtime. When I switched to 2.7, it worked fine for me. I used an Amazon AMI to create my zip file, but it has only Python 3.5, not 3.6. I guess the version mismatch was the reason, but that's just a guess; I haven't tried the process on a Python 3.6 installation yet.
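A guess like this can be checked by looking at the compiled files inside the package. On Linux, CPython 3 extension modules usually carry a suffix such as cpython-36m-x86_64-linux-gnu.so, so a 35m suffix in a package meant for the python3.6 runtime would point to a version mismatch. The sketch below is only an illustration, and the function name extension_suffixes is hypothetical.

```python
import os


def extension_suffixes(package_dir):
    """Collect the suffixes of compiled extension files under package_dir."""
    suffixes = set()
    for _root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith((".so", ".pyd")):
                # "multiarray.cpython-36m-x86_64-linux-gnu.so" gives
                # "cpython-36m-x86_64-linux-gnu.so"
                suffixes.add(name.split(".", 1)[1])
    return suffixes
```

Running it on the unzipped deployment directory and comparing the result with the Lambda runtime's Python version gives a quick sanity check before uploading.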
Answered by JD D
This is similar to Ranadeep's answer, but you don't need to use Lambda layers if you don't want to.
As others have stated, this is not working because pandas/numpy require compiled binaries, and the operating system of your build machine (Linux, Mac, Windows) does not match the operating system of Lambda (Amazon Linux).
To solve this, you can use Docker to download/build your dependencies and package them on Amazon Linux. A Docker image (lambci/lambda) is available for this purpose. See below for how I built my package for the Python 3.6 runtime (there are other images for the other runtimes):
Put all of your dependencies into a requirements.txt file, for example:
openpyxl
boto3
pandas
Create a script (e.g. named build.sh) that will build your package. Here is what mine looked like:
#!/bin/bash
# remove old build artifacts (-f: no error if the zip does not exist yet)
rm -rf build
rm -f lambda_package.zip
# make build dir and copy my lambda handler file into it
mkdir build
cp lambda_daily_util_gen.py build/
# use the requirements file to download/build dependencies into the build folder
cd build
pip install -r ../requirements.txt --target .
# create a Lambda package with my files and all dependencies
zip -r9 ../lambda_package.zip .
Ensure you have the Amazon Linux Lambda build image pulled:
$ docker pull lambci/lambda:build-python3.6
Run your build script inside the Docker container:
Windows:
$ docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.6 /var/task/build.sh
Mac/Linux:
$ docker run --rm -v "${PWD}":/var/task lambci/lambda:build-python3.6 bash -c "chmod +x build.sh && ./build.sh"
You should now see a file named lambda_package.zip, built on Amazon Linux, that you can upload to AWS.
Hope that helps.
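The docker invocation can also be wrapped in a small Python helper so the same command works from any shell. This is a sketch under the assumptions that Docker is on the PATH and that the build script sits in the project directory; docker_build_command and run_build are hypothetical names, and the image tag and /var/task mount point are taken from the commands above.

```python
import os
import subprocess


def docker_build_command(project_dir, image="lambci/lambda:build-python3.6",
                         script="build.sh"):
    """Return the docker argv that runs the build script inside the image."""
    mount = os.path.abspath(project_dir) + ":/var/task"
    # Running the script through bash avoids needing the executable bit.
    return ["docker", "run", "--rm", "-v", mount, image,
            "bash", "/var/task/" + script]


def run_build(project_dir):
    # check=True raises CalledProcessError if the container build fails.
    subprocess.run(docker_build_command(project_dir), check=True)
```

Because the command is passed as a list, the whole script invocation reaches the container rather than being interpreted by a local shell.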
Answered by Vincent Claes
With the Serverless Framework, you can easily package and deploy your dependencies correctly.
You only need to:
- install serverless
npm install -g serverless
- create a serverless.yml in the root of your project with the following:
service: numpy-test
# define the environment of your lambda
provider:
  name: aws
  runtime: python3.6
# specify the function you want to deploy
functions:
  numpy:
    # path to your lambda_handler function
    handler: path/to/function.lambda_handler
# add a plugin that allows serverless to package python libraries
# specified in the requirements.txt or Pipfile
plugins:
  - serverless-python-requirements
# this section makes sure your libraries get built correctly
# for an aws lambda environment
custom:
  pythonRequirements:
    dockerizePip: non-linux
- adjust the path/to/function.lambda_handler
- make sure docker is running and execute
serverless deploy
Once the deployment is finished, go to the AWS console, look for the function numpy-test-dev-numpy, and test your function.
This article explains the necessary steps in detail.