Original question: http://stackoverflow.com/questions/36054976/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Pandas & AWS Lambda
Asked by Moe
Does anyone have a fully compiled version of pandas that is compatible with AWS Lambda?
After searching around for a few hours, I cannot seem to find what I'm looking for and the documentation on this subject is non-existent.
I need access to the package in a Lambda function; however, I have been unable to get the package to compile properly for use there.
In lieu of a precompiled version, can anyone provide reproducible steps to create the binaries?
Unfortunately, I have not been able to reproduce any of the guides on the subject, as they mostly combine pandas with scipy, which I don't need and which adds an extra burden.
Accepted answer by Moe
After some tinkering and lots of googling, I was able to make everything work and set up a repo that can simply be cloned in the future.
Key takeaways:
- All static packages have to be compiled on an EC2 Amazon Linux instance.
- The Python code needs to load the libraries in the lib/ folder before executing.
GitHub repo: https://github.com/moesy/AWS-Lambda-ML-Microservice-Skeleton
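To illustrate the second takeaway, here is a minimal sketch of making a bundled lib/ folder importable before pandas is imported. It assumes lib/ sits next to the handler file and holds ordinary Python packages; the names add_vendored_libs and handler are made up for this example and are not taken from the repo above.

```python
import os
import sys


def add_vendored_libs(base_dir, folder="lib"):
    """Put <base_dir>/<folder> at the front of sys.path and return that path."""
    lib_dir = os.path.join(base_dir, folder)
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir


def handler(event, context):
    # Hypothetical Lambda entry point: extend the import path first,
    # and only then import the bundled package.
    add_vendored_libs(os.path.dirname(os.path.abspath(__file__)))
    import pandas as pd  # resolved from lib/ if it was packaged there
    return {"pandas_version": pd.__version__}
```

The import is deliberately placed inside the handler so that the path change is guaranteed to happen first. Compiled shared libraries that pandas depends on may need extra handling that this sketch does not cover.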
Answer by blueskin
I believe you should be able to use a recent pandas version (or, most likely, the one on your machine). You can create a Lambda package with pandas yourself like this:
First, find where the pandas package is installed on your machine, i.e. open a Python terminal and type
import pandas
pandas.__file__
That should print something like
'/usr/local/lib/python3.4/site-packages/pandas/__init__.py'
Now copy the pandas folder from that location (in this case '/usr/local/lib/python3.4/site-packages/pandas') and place it in your repository. Package your Lambda code with pandas like this:
zip -r9 my_lambda.zip pandas/
zip -9 my_lambda.zip my_lambda_function.py
You can also deploy your code to S3 and make your Lambda use the code from S3.
aws s3 cp my_lambda.zip s3://dev-code//projectx/lambda_packages/
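For anyone who would rather script the packaging step, the two zip commands in this answer can be approximated with Python's standard zipfile module. This is only a sketch; build_package is a made-up helper name, and the file names are the ones used in the answer.

```python
import os
import zipfile


def build_package(zip_path, package_dir, handler_file):
    """Write package_dir (recursively) and handler_file into zip_path."""
    parent = os.path.dirname(os.path.abspath(package_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full = os.path.abspath(os.path.join(root, name))
                # Keep the package folder name in the archive,
                # e.g. pandas/__init__.py
                zf.write(full, os.path.relpath(full, parent))
        # The handler file goes at the archive root.
        zf.write(handler_file, os.path.basename(handler_file))
    return zip_path
```

Usage would look like build_package("my_lambda.zip", "pandas", "my_lambda_function.py"), run from the repository folder.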
Answer by b3rt0
I know the question was asked a couple of years ago, and Lambda was at a different stage back then.
I faced similar issues lately and I thought it would be a good idea to add the newest solution here for future users facing the same problem.
It turns out that Amazon introduced the concept of layers at re:Invent 2018. It is a great feature. This post on Medium describes it much better than I could here: Creating New AWS Lambda Layer For Python Pandas Library
Answer by shadi
The repo mthenw/awesome-layers lists several publicly available AWS Lambda layers.
In particular, keithrozario/Klayers has pandas+numpy and is up to date as of today with pandas 0.25.
Its ARN is arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-pandas:1
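One thing worth noticing about that ARN is that it carries a region (us-east-1) and a layer version (1), so it may need adjusting if your function lives in another region. A small sketch of pulling those pieces apart; parse_layer_arn and LayerArn are names invented for this example, not part of any AWS SDK.

```python
from typing import NamedTuple


class LayerArn(NamedTuple):
    region: str
    account_id: str
    name: str
    version: int


def parse_layer_arn(arn):
    """Split a layer version ARN of the form
    arn:<partition>:lambda:<region>:<account>:layer:<name>:<version>."""
    parts = arn.split(":")
    if len(parts) != 8 or parts[0] != "arn" or parts[2] != "lambda" or parts[5] != "layer":
        raise ValueError("not a Lambda layer version ARN: " + arn)
    return LayerArn(parts[3], parts[4], parts[6], int(parts[7]))
```

Calling it on the ARN above gives the region, the publishing account, the layer name and the version as separate fields, which makes it easy to check that the region matches your function before attaching the layer.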
Answer by user3661992
@ashtonium's answer actually works and is most likely the easiest; however, a few additional steps are required. Also, Pandas requires Pytz (mentioned in the link provided by @b3rt0), so that package is needed as well.
1. Download the whl files from PyPI (the Pandas file ends with ...manylinux1_x86_64.whl; there is only one relevant Pytz file)
2. Unzip the whl files using a terminal command, e.g. unzip filename.whl (Linux/macOS)
3. Create a new folder structure python/lib/python3.7/site-packages/ (swap 3.7 for the version of your choice)
4. Move the folders from step 2 to the site-packages folder from step 3
5. Zip the root folder of the new structure, i.e. python
6. Create a new layer in the AWS management console and upload the zip file there
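The unzip, move and re-zip steps above can also be done in one pass with Python's zipfile module, since a .whl file is itself a zip archive. This is a sketch under that assumption; build_layer_zip is a hypothetical helper, and the wheel files still have to be downloaded from PyPI by hand first.

```python
import zipfile


def build_layer_zip(wheel_paths, layer_zip, python_version="3.7"):
    """Copy the contents of each wheel into layer_zip under
    python/lib/python<version>/site-packages/."""
    prefix = "python/lib/python{}/site-packages/".format(python_version)
    with zipfile.ZipFile(layer_zip, "w") as out:
        for wheel in wheel_paths:
            with zipfile.ZipFile(wheel) as whl:
                for info in whl.infolist():
                    if info.is_dir():
                        continue
                    target = zipfile.ZipInfo(prefix + info.filename,
                                             date_time=info.date_time)
                    target.compress_type = zipfile.ZIP_DEFLATED
                    # Permissive mode so the files stay readable once unpacked.
                    target.external_attr = 0o755 << 16
                    out.writestr(target, whl.read(info.filename))
    return layer_zip
```

The resulting zip is what you would upload when creating the layer in the console. Pass the Pandas and Pytz wheels together so both end up in the same site-packages folder.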
This is a very common question, I hope my solution helps.
Answer by ashtonium
Another option is to download the pre-compiled wheel files as discussed in this post: https://aws.amazon.com/premiumsupport/knowledge-center/lambda-python-package-compatible/
Essentially, you need to go to the project page on https://pypi.org and download the files named like the following:
- For Python 2.7: module-name-version-cp27-cp27mu-manylinux1_x86_64.whl
- For Python 3.6: module-name-version-cp36-cp36m-manylinux1_x86_64.whl
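If a project page lists many files, the naming pattern above can be checked programmatically. Wheel file names end in three dash-separated tags (Python tag, ABI tag, platform tag), so a rough filter might look like the sketch below. The helper names are invented for this example, and the file names in the usage note are only illustrative.

```python
def wheel_tags(filename):
    """Return (python_tag, abi_tag, platform_tag) from a wheel file name."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + filename)
    return tuple(parts[-3:])


def matches_runtime(filename, python_tag, arch="x86_64"):
    """True if the wheel targets the given CPython tag on manylinux/<arch>."""
    py, _abi, platform = wheel_tags(filename)
    return (py == python_tag
            and platform.startswith("manylinux")
            and platform.endswith(arch))
```

For example, matches_runtime("pandas-0.25.0-cp36-cp36m-manylinux1_x86_64.whl", "cp36") accepts the file, while a win_amd64 wheel for the same version would be rejected.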
Then unzip the .whl files to your project directory and re-zip the contents together with your lambda code.
NOTE: The main Python function file(s) must be in the root folder of the resulting deployment package .zip file. Other Python modules and dependencies can be in sub-folders. Something like:
my_lambda_deployment_package.zip
├───lambda_function.py
├───numpy
│ ├───[subfolders...]
├───pandas
│ ├───[subfolders...]
└───[additional package folders...]
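A quick way to confirm the layout in the note above before uploading is to inspect the archive. A minimal sketch, assuming the handler file is called lambda_function.py as in the tree; handler_at_root is a made-up name.

```python
import zipfile


def handler_at_root(zip_path, handler_file="lambda_function.py"):
    """Return True if handler_file sits at the top level of the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Archive member names are full paths, so a root-level file
        # appears under its bare name with no folder prefix.
        return handler_file in zf.namelist()
```

If this returns False, the usual cause is that the parent folder was zipped rather than its contents.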
Answer by Bemullen
The easiest way to get pandas working in a Lambda function is to utilize Lambda Layers and AWS Data Wrangler. A Lambda Layer is a zip archive that contains libraries or dependencies. According to the AWS documentation, using layers keeps your deployment package small, making development easier.
AWS Data Wrangler is an open source package that extends the power of pandas to AWS services.
Follow the instructions (under AWS Lambda Layer) here.
Answer by ymaghzaz
I managed to deploy pandas code to AWS Lambda using the python3.6 runtime. These are the steps I followed:
- Add the required libraries to requirements.txt
- Build the project in a Docker container (using the AWS SAM CLI: sam build --use-container)
- Run the code (sam local invoke --event test.json)
This template repo may help: https://github.com/ysfmag/aws-lambda-py-pandas-template