Python: listing the contents of a bucket with boto3

Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/30249069/

Time: 2020-08-19 08:08:01  Source: igfitidea

Listing contents of a bucket with boto3

Tags: python, amazon-s3, boto, boto3

Asked by Amelio Vazquez-Reina

How can I see what's inside a bucket in S3 with boto3 (i.e. do an "ls")?

Doing the following:

import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('some/path/')

returns:

s3.Bucket(name='some/path/')

How do I see its contents?

Accepted answer by garnaat

One way to see the contents would be:

for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object)

Answer by cgseller

This is similar to an 'ls' but it does not take into account the prefix folder convention and will list the objects in the bucket. It's left up to the reader to filter out prefixes which are part of the Key name.
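
One way to do that filtering on the server side is to pass a Delimiter, so that S3 groups keys under common prefixes. The sketch below assumes a boto3 S3 client is passed in as client; the helper name ls and its return shape are illustrative and not part of boto3, and only the first page of results (up to 1000 entries) is read.

```python
def ls(client, bucket_name, prefix=""):
    """Return (folders, files) directly under `prefix`, like a one-level 'ls'.

    `client` is assumed to be a boto3 S3 client, e.g. boto3.client('s3').
    Only the first page of results is inspected.
    """
    response = client.list_objects_v2(
        Bucket=bucket_name, Prefix=prefix, Delimiter="/"
    )
    # Keys that contain another "/" after the prefix are rolled up by S3 into
    # CommonPrefixes; both entries are absent when there is nothing to list.
    folders = [p["Prefix"] for p in response.get("CommonPrefixes", [])]
    files = [obj["Key"] for obj in response.get("Contents", [])]
    return folders, files
```

A call would look like folders, files = ls(boto3.client('s3'), 'bucket_name', 'some/path/').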

In Python 2 (with the legacy boto library):

from boto.s3.connection import S3Connection

conn = S3Connection() # assumes boto.cfg setup
bucket = conn.get_bucket('bucket_name')
for obj in bucket.get_all_keys():
    print(obj.key)

In Python 3 (with boto3):

from boto3 import client

conn = client('s3')  # again assumes boto.cfg setup, assume AWS S3
# 'Contents' is missing when the bucket is empty; one call returns at most 1000 keys
for key in conn.list_objects(Bucket='bucket_name').get('Contents', []):
    print(key['Key'])

Answer by Tushar Niras

I'm assuming you have configured authentication separately.

import boto3
s3 = boto3.resource('s3')

my_bucket = s3.Bucket('bucket_name')

for file in my_bucket.objects.all():
    print(file.key)

Answer by Erwin Alberto

If you want to pass the ACCESS and SECRET keys (which you should not do, because it is not secure):

from boto3.session import Session

ACCESS_KEY='your_access_key'
SECRET_KEY='your_secret_key'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
s3 = session.resource('s3')
your_bucket = s3.Bucket('your_bucket')

for s3_file in your_bucket.objects.all():
    print(s3_file.key)
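
Rather than hard-coding keys in source, the same Session can be built from values kept outside the code. boto3 also picks up AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from the environment on its own when no keys are passed, so the helper below (a hypothetical name, session_kwargs_from_env) is only a sketch of making that lookup explicit.

```python
import os


def session_kwargs_from_env(environ=None):
    """Collect Session keyword arguments from standard AWS environment variables."""
    environ = os.environ if environ is None else environ
    mapping = {
        "aws_access_key_id": "AWS_ACCESS_KEY_ID",
        "aws_secret_access_key": "AWS_SECRET_ACCESS_KEY",
        "aws_session_token": "AWS_SESSION_TOKEN",
    }
    # Only include values that are actually set, so that boto3's default
    # credential lookup still applies when nothing is provided.
    return {arg: environ[var] for arg, var in mapping.items() if environ.get(var)}
```

It would be used as session = Session(**session_kwargs_from_env()), with the rest of the code above unchanged.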

Answer by Daniel Vieira

A more parsimonious way: rather than iterating with a for loop, you can print the collection that contains all the files in your S3 bucket:

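from boto3.session import Session

# aws_access_key_id and aws_secret_access_key are assumed to be defined elsewhere
session = Session(aws_access_key_id=aws_access_key_id,
                  aws_secret_access_key=aws_secret_access_key)
s3 = session.resource('s3')
bucket = s3.Bucket('bucket_name')

files_in_s3 = bucket.objects.all()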
#you can print this iterable with print(list(files_in_s3))
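
Note that list(files_in_s3) fetches every object summary in the bucket before anything is printed. For a quick look at a large bucket, a sketch like the following returns only the first few entries; preview is an illustrative helper that works on any iterable, including the collection returned by bucket.objects.all().

```python
from itertools import islice


def preview(iterable, count=10):
    """Return at most `count` items from `iterable` without consuming the rest."""
    return list(islice(iterable, count))
```

For example, print(preview(files_in_s3, 5)) would show at most five object summaries.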

Answer by Milean

I just did it like this, including the authentication method:

import boto3

s3_client = boto3.client(
    's3',
    aws_access_key_id='access_key',
    aws_secret_access_key='access_key_secret',
    config=boto3.session.Config(signature_version='s3v4'),
    region_name='region'
)


def key_exists(bucket_name, key):
    # Wrapped in a function so that the return statements are valid.
    # Prefix matching: True if at least one key starts with `key`.
    response = s3_client.list_objects(Bucket=bucket_name, Prefix=key)
    return 'Contents' in response
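
Because Prefix matches every key that starts with the given string, the check above also succeeds for a key such as 'report.csv.bak' when looking for 'report.csv'. A stricter variant, sketched here under the assumption that response is the dictionary returned by list_objects, compares the returned keys exactly; has_exact_key is an illustrative name.

```python
def has_exact_key(response, key):
    """Return True only if `key` itself appears in a list_objects response."""
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

Since S3 returns keys in sorted order, an exact match, when it exists, comes before any longer key that shares the same prefix.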

Answer by Gothburz

ObjectSummary:

There are two identifiers that are attached to the ObjectSummary:

  • bucket_name
  • key

boto3 S3: ObjectSummary

More on Object Keys from AWS S3 Documentation:

Object Keys:

When you create an object, you specify the key name, which uniquely identifies the object in the bucket. For example, in the Amazon S3 console (see AWS Management Console), when you highlight a bucket, a list of objects in your bucket appears. These names are the object keys. The name for a key is a sequence of Unicode characters whose UTF-8 encoding is at most 1024 bytes long.

The Amazon S3 data model is a flat structure: you create a bucket, and the bucket stores objects. There is no hierarchy of subbuckets or subfolders; however, you can infer logical hierarchy using key name prefixes and delimiters as the Amazon S3 console does. The Amazon S3 console supports a concept of folders. Suppose that your bucket (admin-created) has four objects with the following object keys:

Development/Projects1.xls

Finance/statement1.pdf

Private/taxdocument.pdf

s3-dg.pdf

Reference:

AWS S3: Object Keys
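
To make the prefix-and-delimiter idea concrete, the following sketch groups the four example keys above by their first path segment, roughly the way a console-style folder view would. It is plain string handling on a list of keys, not an S3 API call, and top_level_view is an illustrative name.

```python
def top_level_view(keys, delimiter="/"):
    """Split keys into top-level 'folders' and root-level objects."""
    folders, objects = set(), []
    for key in keys:
        head, sep, _rest = key.partition(delimiter)
        if sep:
            # Anything before the first delimiter acts as a folder name.
            folders.add(head + delimiter)
        else:
            objects.append(key)
    return sorted(folders), objects


keys = [
    "Development/Projects1.xls",
    "Finance/statement1.pdf",
    "Private/taxdocument.pdf",
    "s3-dg.pdf",
]
print(top_level_view(keys))
# (['Development/', 'Finance/', 'Private/'], ['s3-dg.pdf'])
```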

Here is some example code that demonstrates how to get the bucket name and the object key.

Example:

import boto3


def enumerate_s3():
    """Print every bucket, then the bucket name and key of each object in it."""
    s3 = boto3.resource('s3')
    for bucket in s3.buckets.all():
        print("Name: {}".format(bucket.name))
        print("Creation Date: {}".format(bucket.creation_date))
        # Each item is an ObjectSummary identified by bucket_name and key
        for obj in bucket.objects.all():
            print("Object: {}".format(obj))
            print("Object bucket_name: {}".format(obj.bucket_name))
            print("Object key: {}".format(obj.key))


if __name__ == '__main__':
    enumerate_s3()

Answer by Hephaestus

To handle large key listings (i.e. when a listing returns more than 1000 items), I used the following code to accumulate key values (i.e. filenames) across multiple list calls (thanks to Amelio above for the first lines). The code is for Python 3:

from boto3 import client


def list_keys(bucket_name="my_bucket", prefix="my_key/sub_key/lots_o_files"):
    # Wrapped in a function (the name is illustrative) so that `return` is valid
    s3_conn = client('s3')  # type: BaseClient  ## again assumes boto.cfg setup, assume AWS S3
    s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/")

    if 'Contents' not in s3_result:
        return []

    file_list = [obj['Key'] for obj in s3_result['Contents']]
    print(f"List count = {len(file_list)}")

    # Each call returns at most 1000 keys; keep requesting while the result is truncated
    while s3_result['IsTruncated']:
        continuation_key = s3_result['NextContinuationToken']
        s3_result = s3_conn.list_objects_v2(Bucket=bucket_name, Prefix=prefix, Delimiter="/",
                                            ContinuationToken=continuation_key)
        # A page may hold only common prefixes, in which case 'Contents' is absent
        for obj in s3_result.get('Contents', []):
            file_list.append(obj['Key'])
        print(f"List count = {len(file_list)}")
    return file_list

Answer by Sean Summers

My s3 keys utility function is essentially an optimized version of @Hephaestus's answer:

import boto3


s3_paginator = boto3.client('s3').get_paginator('list_objects_v2')


def keys(bucket_name, prefix='/', delimiter='/', start_after=''):
    prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
    start_after = (start_after or prefix) if prefix.endswith(delimiter) else start_after
    for page in s3_paginator.paginate(Bucket=bucket_name, Prefix=prefix, StartAfter=start_after):
        for content in page.get('Contents', ()):
            yield content['Key']

In my tests (boto3 1.9.84), it's significantly faster than the equivalent (but simpler) code:

import boto3


def keys(bucket_name, prefix='/', delimiter='/'):
    prefix = prefix[1:] if prefix.startswith(delimiter) else prefix
    bucket = boto3.resource('s3').Bucket(bucket_name)
    return (_.key for _ in bucket.objects.filter(Prefix=prefix))

As S3 guarantees UTF-8 binary sorted results, a start_after optimization has been added to the first function.
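
To see what the start_after default does, recall that S3 lists keys in UTF-8 byte order and that StartAfter skips everything up to and including the given key. The sketch below imitates that behaviour locally on a plain list (no S3 call is made, and simulate_listing is an illustrative helper): when the prefix ends with the delimiter, starting after the prefix itself drops the "folder" placeholder key and leaves only the keys beneath it, which appears to be the intent of the first function.

```python
def simulate_listing(keys, prefix="", start_after=""):
    """Imitate Prefix/StartAfter filtering on a local list of keys."""
    ordered = sorted(keys, key=lambda k: k.encode("utf-8"))
    return [
        k for k in ordered
        if k.startswith(prefix) and k.encode("utf-8") > start_after.encode("utf-8")
    ]


keys = ["logs/", "logs/2020-01.txt", "logs/2020-02.txt", "readme.md"]
print(simulate_listing(keys, prefix="logs/"))
# ['logs/', 'logs/2020-01.txt', 'logs/2020-02.txt']
print(simulate_listing(keys, prefix="logs/", start_after="logs/"))
# ['logs/2020-01.txt', 'logs/2020-02.txt']
```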

Answer by Imran Selim

# To print all filenames in a bucket
import boto3

s3 = boto3.client('s3')


def get_s3_keys(bucket):
    """Get a list of keys in an S3 bucket (first page only, up to 1000 keys)."""
    resp = s3.list_objects_v2(Bucket=bucket)
    # Collect every key instead of keeping only the last one;
    # 'Contents' is absent when the bucket is empty
    return [obj['Key'] for obj in resp.get('Contents', [])]


filenames = get_s3_keys('your_bucket_name')

print(filenames)

# To print all filenames in a certain directory in a bucket
import boto3

s3 = boto3.client('s3')


def get_s3_keys(bucket, prefix):
    """Get a list of keys under a prefix in an S3 bucket (first page only, up to 1000 keys)."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # Collect every key instead of keeping only the last one;
    # 'Contents' is absent when nothing matches the prefix
    return [obj['Key'] for obj in resp.get('Contents', [])]


filenames = get_s3_keys('your_bucket_name', 'folder_name/sub_folder_name/')

print(filenames)