Reading data from S3 using Lambda

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/33782984/
Asked by LearningSlowly
I have a range of json files stored in an S3 bucket on AWS.
I wish to use AWS lambda python service to parse this json and send the parsed results to an AWS RDS MySQL database.
I have a stable python script for doing the parsing and writing to the database. I need the lambda script to iterate through the json files (when they are added).
Each json file contains a list, simply consisting of results = [content]
In pseudo-code, what I want is:
- Connect to the S3 bucket (jsondata)
- Read the contents of the JSON file (results)
- Execute my script for this data (results)
I can list the buckets I have by:
import boto3

s3 = boto3.resource('s3')
for bucket in s3.buckets.all():
    print(bucket.name)
Giving:
jsondata
But I cannot access this bucket to read its results. There doesn't appear to be a read or load function.
I wish for something like
for bucket in s3.buckets.all():
    print(bucket.contents)
EDIT
I am misunderstanding something. Rather than reading the file in S3, lambda must download it itself.
From here it seems that you must give lambda a download path, from which it can access the files itself:
import uuid
import boto3

s3_client = boto3.client('s3')

# ...function to be executed goes here...

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = record['s3']['object']['key']
        # /tmp is the only writable path in Lambda; a UUID prefix avoids collisions
        download_path = '/tmp/{}{}'.format(uuid.uuid4(), key)
        s3_client.download_file(bucket, key, download_path)
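Once the file is in /tmp, the question's stable parsing script can be run on download_path. A minimal sketch of the read-back step (the helper name load_results is illustrative, not from the original post):

```python
import json

def load_results(path):
    # Each downloaded file holds a JSON list: results = [content]
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)
```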
Accepted answer by Dysosmus
You can use bucket.objects.all() to get a list of all the objects in the bucket (you also have alternative methods like filter, page_size and limit, depending on your need).
These methods return an iterator with S3.ObjectSummary objects in it; from there you can use the method object.get to retrieve the file.
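A minimal sketch putting these methods together. The boto3 calls require AWS credentials, so they are shown as commented usage and only the pure JSON-parsing helper is executable:

```python
import json

def parse_results(raw_bytes):
    # Decode an S3 object body; each file holds a JSON list (results = [content])
    return json.loads(raw_bytes.decode('utf-8'))

# Hypothetical usage with boto3 (bucket name jsondata assumed from the question):
# import boto3
# s3 = boto3.resource('s3')
# for obj_summary in s3.Bucket('jsondata').objects.all():
#     results = parse_results(obj_summary.get()['Body'].read())
```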
Answered by James Hogbin
s3 = boto3.client('s3')
response = s3.get_object(Bucket=bucket, Key=key)
emailcontent = response['Body'].read().decode('utf-8')
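To complete the question's pipeline (parsed JSON into RDS MySQL), here is a hedged sketch. The table schema and the pymysql usage are assumptions, not from the answers, so only the row-shaping helper is executable:

```python
def rows_from_results(results):
    # Shape each parsed entry into an (idx, payload) tuple for executemany;
    # this (idx, payload) schema is hypothetical, not from the question
    return [(i, str(item)) for i, item in enumerate(results)]

# Hypothetical write step (requires pymysql and a reachable RDS instance):
# import pymysql
# conn = pymysql.connect(host=DB_HOST, user=DB_USER, password=DB_PASS, db=DB_NAME)
# with conn.cursor() as cur:
#     cur.executemany("INSERT INTO results (idx, payload) VALUES (%s, %s)",
#                     rows_from_results(results))
# conn.commit()
```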