How to save an S3 object to a file using boto3 in Python
Disclaimer: this page is an English-Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA terms, link to the original, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29378763/
How to save S3 object to a file using boto3
Asked by Vor
I'm trying to do a "hello world" with the new boto3 client for AWS.
The use-case I have is fairly simple: get object from S3 and save it to the file.
In boto 2.x I would do it like this:
import boto
key = boto.connect_s3().get_bucket('foo').get_key('foo')
key.get_contents_to_filename('/tmp/foo')
In boto 3, I can't find a clean way to do the same thing, so I'm manually iterating over the "Streaming" object:
import boto3

key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # 'wb': the body is bytes
    chunk = key['Body'].read(1024 * 8)
    while chunk:
        f.write(chunk)
        chunk = key['Body'].read(1024 * 8)
or
import boto3

key = boto3.resource('s3').Object('fooo', 'docker/my-image.tar.gz').get()
with open('/tmp/my-image.tar.gz', 'wb') as f:  # 'wb': the body is bytes
    for chunk in iter(lambda: key['Body'].read(4096), b''):
        f.write(chunk)
And it works fine. I was wondering: is there any "native" boto3 function that will do the same task?
Accepted answer by Daniel
There is a customization that went into boto3 recently which helps with this (among other things). It is currently exposed on the low-level S3 client and can be used like this:
import boto3

s3_client = boto3.client('s3')
open('hello.txt', 'w').write('Hello, world!')

# Upload the file to S3
s3_client.upload_file('hello.txt', 'MyBucket', 'hello-remote.txt')

# Download the file from S3
s3_client.download_file('MyBucket', 'hello-remote.txt', 'hello2.txt')
print(open('hello2.txt').read())
These functions will automatically handle reading/writing files as well as doing multipart uploads in parallel for large files.
Note that s3_client.download_file won't create the destination directory. You can create it first with pathlib.Path('/path/to/file.txt').parent.mkdir(parents=True, exist_ok=True).
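One way to handle that is a small wrapper that creates the parent directory before delegating to the client; the helper name download_to is my own, not part of boto3:

```python
import pathlib


def download_to(s3_client, bucket, key, dest):
    """Download bucket/key to dest, creating missing parent directories."""
    path = pathlib.Path(dest)
    # No-op when the directory already exists
    path.parent.mkdir(parents=True, exist_ok=True)
    s3_client.download_file(bucket, key, str(path))
```

It accepts any client exposing download_file, so the usual boto3.client('s3') works directly.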
Answered by quodlibetor
boto3 now has a nicer interface than the client:
import boto3

resource = boto3.resource('s3')
my_bucket = resource.Bucket('MyBucket')
my_bucket.download_file(key, local_filename)
This by itself isn't tremendously better than the client in the accepted answer (although the docs say that it does a better job retrying uploads and downloads on failure), but considering that resources are generally more ergonomic (for example, the S3 bucket and object resources are nicer than the client methods), this does allow you to stay at the resource layer without having to drop down.
Resources generally can be created in the same way as clients; they take all or most of the same arguments and just forward them to their internal clients.
Answered by cgseller
For those of you who would like to simulate the set_contents_from_string-like boto2 methods, you can try:
import boto3
from cStringIO import StringIO
s3c = boto3.client('s3')
contents = 'My string to save to S3 object'
target_bucket = 'hello-world.by.vor'
target_file = 'data/hello.txt'
fake_handle = StringIO(contents)
# notice if you do fake_handle.read() it reads like a file handle
s3c.put_object(Bucket=target_bucket, Key=target_file, Body=fake_handle.read())
For Python 3:

In Python 3 both StringIO and cStringIO are gone. Use the StringIO import like:
from io import StringIO
To support both versions:
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
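On Python 3 you can also skip the file-like shim entirely, since put_object's Body accepts raw bytes; a small sketch, where the helper name put_string is my own:

```python
def put_string(s3_client, bucket, key, text, encoding='utf-8'):
    """Upload a text string as an S3 object; Body takes bytes directly."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode(encoding))
```

For example: put_string(boto3.client('s3'), 'hello-world.by.vor', 'data/hello.txt', 'My string to save to S3 object').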
Answered by Lord Sumner
# Preface: file is JSON with contents: {"name": "Android", "status": "ERROR"}
import io
import json

import boto3

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket', 'key-to-file.json')
data = io.BytesIO()
obj.download_fileobj(data)

# data now holds the object's bytes; convert them to a dict:
new_dict = json.loads(data.getvalue().decode("utf-8"))
print(new_dict['status'])
# Should print "ERROR"
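The same pattern can be wrapped into a tiny helper that works with any object exposing download_fileobj; the name read_json is my own, not part of boto3:

```python
import io
import json


def read_json(s3_object):
    """Download an S3 object into memory and parse it as JSON."""
    buf = io.BytesIO()
    s3_object.download_fileobj(buf)
    return json.loads(buf.getvalue().decode('utf-8'))
```

Called with a boto3 Object, e.g. read_json(s3.Object('my-bucket', 'key-to-file.json')).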
Answered by Tushar Niras
Note: I'm assuming you have configured authentication separately. The code below downloads a single object from the S3 bucket.
import boto3

# Initiate the S3 resource
s3 = boto3.resource('s3')

# Download the object to a local file
s3.Bucket('mybucket').download_file('hello.txt', '/tmp/hello.txt')
Answered by Martin Thoma
When you want to read a file with a different configuration than the default one, feel free to use either mpu.aws.s3_download(s3path, destination) directly or the copy-pasted code:
import os

import boto3


def s3_download(source, destination,
                exists_strategy='raise',
                profile_name=None):
    """
    Copy a file from an S3 source to a local destination.

    Parameters
    ----------
    source : str
        Path starting with s3://, e.g. 's3://bucket-name/key/foo.bar'
    destination : str
    exists_strategy : {'raise', 'replace', 'abort'}
        What is done when the destination already exists?
    profile_name : str, optional
        AWS profile

    Raises
    ------
    botocore.exceptions.NoCredentialsError
        Botocore is not able to find your credentials. Either specify
        profile_name or add the environment variables AWS_ACCESS_KEY_ID,
        AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN.
        See https://boto3.readthedocs.io/en/latest/guide/configuration.html
    """
    exists_strategies = ['raise', 'replace', 'abort']
    if exists_strategy not in exists_strategies:
        raise ValueError('exists_strategy \'{}\' is not in {}'
                         .format(exists_strategy, exists_strategies))
    session = boto3.Session(profile_name=profile_name)
    s3 = session.resource('s3')
    bucket_name, key = _s3_path_split(source)
    if os.path.isfile(destination):
        if exists_strategy == 'raise':
            raise RuntimeError('File \'{}\' already exists.'
                               .format(destination))
        elif exists_strategy == 'abort':
            return
    s3.Bucket(bucket_name).download_file(key, destination)
from collections import namedtuple

S3Path = namedtuple("S3Path", ["bucket_name", "key"])


def _s3_path_split(s3_path):
    """
    Split an S3 path into bucket and key.

    Parameters
    ----------
    s3_path : str

    Returns
    -------
    splitted : (str, str)
        (bucket, key)

    Examples
    --------
    >>> _s3_path_split('s3://my-bucket/foo/bar.jpg')
    S3Path(bucket_name='my-bucket', key='foo/bar.jpg')
    """
    if not s3_path.startswith("s3://"):
        raise ValueError(
            "s3_path is expected to start with 's3://', "
            "but was {}".format(s3_path)
        )
    bucket_key = s3_path[len("s3://"):]
    bucket_name, key = bucket_key.split("/", 1)
    return S3Path(bucket_name, key)