How to save an S3 object to a file using boto3 in Python
Disclaimer: this page is an English-Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29378763/
How to save S3 object to a file using boto3
Asked by Vor
I'm trying to do a "hello world" with the new boto3 client for AWS.
The use-case I have is fairly simple: get object from S3 and save it to the file.
In boto 2.x I would do it like this:
import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')
In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:
import boto3

key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # 'wb': the body is bytes
    chunk = key['Body'].read(1024 * 8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024 * 8)
or
import boto3

key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # 'wb': the body is bytes
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)
And it works fine. I was wondering: is there any "native" boto3 function that will do the same task?
Accepted answer by Daniel
There is a customization that went into boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client and can be used like this:
import boto3

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())
These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
Note that s3_client.download_file won't create the destination directory. You can create it first with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
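One way to handle that is a small wrapper that creates the parent directory before delegating to the client; the helper name download_to is my own, not part of boto3:

```python
import pathlib


def download_to(s3_client, bucket, key, dest):
    """Download bucket/key to dest, creating missing parent directories."""
    path = pathlib.Path(dest)
    # No-op when the directory already exists
    path.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket, key, str(path))
```

It accepts any client exposing download_file, so the usual boto3.client('s3') works directly.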
Answered by quodlibetor
boto3 now has a nicer interface than the client:
import boto3

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)
This by itself isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure), but considering that resources are generally more ergonomic (for example, the S3 bucket and object resources are nicer than the client methods), this does allow you to stay at the resource layer without having to drop down.
Resources generally can be created in the same way as clients; they take all or most of the same arguments and just forward them to their internal clients.
Answered by cgseller
For those of you who would like to simulate the set_contents_from_string-like boto2 methods, you can try:
import boto3
from cStringIO import StringIO
s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)
# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())
For Python 3:

In Python 3 both StringIO and cStringIO are gone. Use the StringIO import like:
from io import StringIO
To support both versions:
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
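On Python 3 you can also skip the file-like shim entirely, since put_object's Body accepts raw bytes; a small sketch, where the helper name put_string is my own:

```python
def put_string(s3_client, bucket, key, text, encoding='utf-8'):
    """Upload a text string as an S3 object; Body takes bytes directly."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode(encoding))
```

For example: put_string(boto3.client('s3'), 'hello-world.by.vor', 'data/hello.txt', 'My string to save to S3 object').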
Answered by Lord Sumner
# Preface: file is JSON with contents: {"name": "Android", "status": "ERROR"}
import io
import json

import boto3

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the object's bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))
print(new_dict['status'])
# Should print "ERROR"
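The same pattern can be wrapped into a tiny helper that works with any object exposing download_fileobj; the name read_json is my own, not part of boto3:

```python
import io
import json


def read_json(s3_object):
    """Download an S3 object into memory and parse it as JSON."""
    buf = io.BytesIO()
    s3_object.download_fileobj(buf)
    return json.loads(buf.getvalue().decode('utf-8'))
```

Called with a boto3 Object, e.g. read_json(s3.Object('my-bucket', 'key-to-file.json')).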
Answered by Tushar Niras
Note: I'm assuming you have configured authentication separately. The code below downloads a single object from the S3 bucket.
import boto3

# Initiate the S3 resource
s3 = boto3.resource('s3')

# Download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
Answered by Martin Thoma
When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:
import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)
from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', "
            "but was {}".format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)