Python: Retrieving subfolder names in an S3 bucket from boto3

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license and attribute the original authors (not me): StackOverflow. Original question: http://stackoverflow.com/questions/35803027/


Retrieving subfolder names in an S3 bucket from boto3

python, amazon-web-services, amazon-s3, boto3

Asked by mar tin

Using boto3, I can access my AWS S3 bucket:


s3 = boto3.resource('s3')
bucket = s3.Bucket('my-bucket-name')

Now, the bucket contains the folder first-level, which itself contains several sub-folders named with a timestamp, for instance 1456753904534. I need to know the names of these sub-folders for another job I'm doing, and I wonder whether I could have boto3 retrieve those for me.


So I tried:


objs = bucket.meta.client.list_objects(Bucket='my-bucket-name')

which gives a dictionary, whose key 'Contents' gives me all the third-level files instead of the second-level timestamp directories. In fact I get a list containing things like


{u'ETag': '"etag"', u'Key': 'first-level/1456753904534/part-00014', u'LastModified': datetime.datetime(2016, 2, 29, 13, 52, 24, tzinfo=tzutc()),
u'Owner': {u'DisplayName': 'owner', u'ID': 'id'},
u'Size': size, u'StorageClass': 'storageclass'}


you can see that the specific files, in this case part-00014, are retrieved, while I'd like to get the name of the directory alone. In principle I could strip the directory name out of all the paths, but it's ugly and expensive to retrieve everything at the third level just to get the second level!


I also tried something reported here:


for o in bucket.objects.filter(Delimiter='/'):
    print(o.key)

but I do not get the folders at the desired level.


Is there a way to solve this?


Accepted answer by mootmoot

S3 is an object store; it doesn't have a real directory structure. The "/" is purely cosmetic. One reason people want a directory structure is so they can maintain/prune/add a tree in the application. For S3, you treat such a structure as a sort of index or search tag.


To manipulate objects in S3, you need boto3.client or boto3.resource, e.g. to list all objects:


import boto3
s3 = boto3.client("s3")
all_objects = s3.list_objects(Bucket='bucket-name')

http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Client.list_objects


In fact, if the S3 object names are stored using the '/' separator, the more recent version of list_objects (list_objects_v2) allows you to limit the response to keys that begin with the specified prefix.


To limit the listing to items under certain sub-folders:


import boto3
s3 = boto3.client("s3")
response = s3.list_objects_v2(
        Bucket=BUCKET,
        Prefix='DIR1/DIR2',
        MaxKeys=100)

Documentation
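To get back only the sub-folder names themselves (as asked in the question), the usual pattern is to combine Prefix with Delimiter='/' and read the CommonPrefixes entries of the response. A minimal sketch, assuming the bucket layout from the question (the bucket name and prefix are placeholders):

import boto3

s3 = boto3.client("s3")
response = s3.list_objects_v2(
        Bucket='my-bucket-name',
        Prefix='first-level/',   # note the trailing '/'
        Delimiter='/')
# each CommonPrefixes entry is one "sub-folder", e.g. 'first-level/1456753904534/'
for cp in response.get('CommonPrefixes', []):
    print(cp['Prefix'])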


Another option is to use the Python os.path functions to extract the folder prefix. The problem is that this will require listing objects from undesired directories.


import os
s3_key = 'first-level/1456753904534/part-00014'
filename = os.path.basename(s3_key)
foldername = os.path.dirname(s3_key)

# if you are using an unconventional delimiter like '#'
s3_key = 'first-level#1456753904534#part-00014'
filename = s3_key.split("#")[-1]

A reminder about boto3: boto3.resource is a nice high-level API. There are pros and cons to using boto3.client vs boto3.resource. If you develop an internal shared library, using boto3.resource will give you a black-box layer over the resources used.
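For illustration (not part of the original answer), if you already hold a resource object you can reach the corresponding low-level client through its meta attribute, which is handy when you need client-only features:

import boto3

bucket = boto3.resource('s3').Bucket('bucket-name')  # placeholder bucket name
client = bucket.meta.client  # same low-level client as boto3.client('s3')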


Answer by Dipankar

The piece of code below returns ONLY the 'subfolders' in a 'folder' of an S3 bucket.


import boto3
bucket = 'my-bucket'
# Make sure you provide / in the end
prefix = 'prefix-name-with-slash/'

client = boto3.client('s3')
result = client.list_objects(Bucket=bucket, Prefix=prefix, Delimiter='/')
for o in result.get('CommonPrefixes'):
    print('sub folder : ', o.get('Prefix'))

For more details, you can refer to https://github.com/boto/boto3/issues/134
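Note that list_objects returns at most 1000 entries per call. If a 'folder' might contain more than 1000 sub-folders, a paginated variant (a sketch, reusing the same placeholder bucket and prefix names) could look like this:

import boto3

client = boto3.client('s3')
paginator = client.get_paginator('list_objects_v2')
subfolders = []
for page in paginator.paginate(Bucket='my-bucket', Prefix='prefix-name-with-slash/', Delimiter='/'):
    for cp in page.get('CommonPrefixes', []):
        subfolders.append(cp['Prefix'])
print(subfolders)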


Answer by azhar22k

It took me a lot of time to figure out, but finally here is a simple way to list the contents of a subfolder in an S3 bucket using boto3. Hope it helps.


import boto3

prefix = "folderone/foldertwo/"
s3 = boto3.resource('s3')
bucket = s3.Bucket(name="bucket_name_here")
FilesNotFound = True
for obj in bucket.objects.filter(Prefix=prefix):
    print('{0}:{1}'.format(bucket.name, obj.key))
    FilesNotFound = False
if FilesNotFound:
    print("ALERT", "No file in {0}/{1}".format(bucket.name, prefix))

Answer by Pierre D

Short answer:


  • Use Delimiter='/'. This avoids doing a recursive listing of your bucket. Some answers here wrongly suggest doing a full listing and using some string manipulation to retrieve the directory names. This could be horribly inefficient. Remember that S3 has virtually no limit on the number of objects a bucket can contain. So, imagine that, between bar/ and foo/, you have a trillion objects: you would wait a very long time to get ['bar/', 'foo/'].

  • Use Paginators. For the same reason (S3 is an engineer's approximation of infinity), you must list through pages and avoid storing all the listing in memory. Instead, consider your "lister" as an iterator, and handle the stream it produces.

  • Use boto3.client, not boto3.resource. The resource version doesn't seem to handle the Delimiter option well. If you have a resource, say bucket = boto3.resource('s3').Bucket(name), you can get the corresponding client with bucket.meta.client.


Long answer:


The following is an iterator that I use for simple buckets (no version handling).


import os
import boto3
from collections import namedtuple
from operator import attrgetter


S3Obj = namedtuple('S3Obj', ['key', 'mtime', 'size', 'ETag'])


def s3list(bucket, path, start=None, end=None, recursive=True, list_dirs=True,
           list_objs=True, limit=None):
    """
    Iterator that lists a bucket's objects under path, (optionally) starting with
    start and ending before end.

    If recursive is False, then list only the "depth=0" items (dirs and objects).

    If recursive is True, then list recursively all objects (no dirs).

    Args:
        bucket:
            a boto3.resource('s3').Bucket().
        path:
            a directory in the bucket.
        start:
            optional: start key, inclusive (may be a relative path under path, or
            absolute in the bucket)
        end:
            optional: stop key, exclusive (may be a relative path under path, or
            absolute in the bucket)
        recursive:
            optional, default True. If True, lists only objects. If False, lists
            only depth 0 "directories" and objects.
        list_dirs:
            optional, default True. Has no effect in recursive listing. On
            non-recursive listing, if False, then directories are omitted.
        list_objs:
            optional, default True. If False, then objects are omitted.
        limit:
            optional. If specified, then lists at most this many items.

    Returns:
        an iterator of S3Obj.

    Examples:
        # set up
        >>> s3 = boto3.resource('s3')
        ... bucket = s3.Bucket(name)

        # iterate through all S3 objects under some dir
        >>> for p in s3list(bucket, 'some/dir'):
        ...     print(p)

        # iterate through up to 20 S3 objects under some dir, starting with foo_0010
        >>> for p in s3list(bucket, 'some/dir', limit=20, start='foo_0010'):
        ...     print(p)

        # non-recursive listing under some dir:
        >>> for p in s3list(bucket, 'some/dir', recursive=False):
        ...     print(p)

        # non-recursive listing under some dir, listing only dirs:
        >>> for p in s3list(bucket, 'some/dir', recursive=False, list_objs=False):
        ...     print(p)
"""
    kwargs = dict()
    if start is not None:
        if not start.startswith(path):
            start = os.path.join(path, start)
        # note: need to use a string just smaller than start, because
        # the list_object API specifies that start is excluded (the first
        # result is *after* start).
        kwargs.update(Marker=__prev_str(start))
    if end is not None:
        if not end.startswith(path):
            end = os.path.join(path, end)
    if not recursive:
        kwargs.update(Delimiter='/')
        if not path.endswith('/'):
            path += '/'
    kwargs.update(Prefix=path)
    if limit is not None:
        kwargs.update(PaginationConfig={'MaxItems': limit})

    paginator = bucket.meta.client.get_paginator('list_objects')
    for resp in paginator.paginate(Bucket=bucket.name, **kwargs):
        q = []
        if 'CommonPrefixes' in resp and list_dirs:
            q = [S3Obj(f['Prefix'], None, None, None) for f in resp['CommonPrefixes']]
        if 'Contents' in resp and list_objs:
            q += [S3Obj(f['Key'], f['LastModified'], f['Size'], f['ETag']) for f in resp['Contents']]
        # note: even with sorted lists, it is faster to sort(a+b)
        # than heapq.merge(a, b) at least up to 10K elements in each list
        q = sorted(q, key=attrgetter('key'))
        if limit is not None:
            q = q[:limit]
            limit -= len(q)
        for p in q:
            if end is not None and p.key >= end:
                return
            yield p


def __prev_str(s):
    if len(s) == 0:
        return s
    s, c = s[:-1], ord(s[-1])
    if c > 0:
        s += chr(c - 1)
    s += ''.join(['\u7FFF' for _ in range(10)])
    return s

Test:


The following is helpful to test the behavior of the paginator and list_objects. It creates a number of dirs and files. Since the pages are up to 1000 entries, we use a multiple of that for dirs and files. dirs contains only directories (each having one object). mixed contains a mix of dirs and objects, with a ratio of 2 objects for each dir (plus one object under dir, of course; S3 stores only objects).


import concurrent.futures
import os

def genkeys(top='tmp/test', n=2000):
    for k in range(n):
        if k % 100 == 0:
            print(k)
        for name in [
            os.path.join(top, 'dirs', f'{k:04d}_dir', 'foo'),
            os.path.join(top, 'mixed', f'{k:04d}_dir', 'foo'),
            os.path.join(top, 'mixed', f'{k:04d}_foo_a'),
            os.path.join(top, 'mixed', f'{k:04d}_foo_b'),
        ]:
            yield name


with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
    executor.map(lambda name: bucket.put_object(Key=name, Body='hi\n'.encode()), genkeys())

The resulting structure is:


./dirs/0000_dir/foo
./dirs/0001_dir/foo
./dirs/0002_dir/foo
...
./dirs/1999_dir/foo
./mixed/0000_dir/foo
./mixed/0000_foo_a
./mixed/0000_foo_b
./mixed/0001_dir/foo
./mixed/0001_foo_a
./mixed/0001_foo_b
./mixed/0002_dir/foo
./mixed/0002_foo_a
./mixed/0002_foo_b
...
./mixed/1999_dir/foo
./mixed/1999_foo_a
./mixed/1999_foo_b

With a little bit of doctoring of the code given above for s3list to inspect the responses from the paginator, you can observe some fun facts:


  • The Marker is really exclusive. Giving Marker=topdir + 'mixed/0500_foo_a' will make the listing start after that key (as per the Amazon S3 API), i.e., with .../mixed/0500_foo_b. That's the reason for __prev_str().

  • Using Delimiter, when listing mixed/, each response from the paginator contains 666 keys and 334 common prefixes. It's pretty good at not building enormous responses.

  • By contrast, when listing dirs/, each response from the paginator contains 1000 common prefixes (and no keys).

  • Passing a limit in the form of PaginationConfig={'MaxItems': limit} limits only the number of keys, not the common prefixes. We deal with that by further truncating the stream of our iterator.


Answer by CpILL

The big realisation with S3 is that there are no folders/directories, just keys. The apparent folder structure is just prepended to the filename to become the 'Key', so to list the contents of myBucket's some/path/to/the/file/ you can try:


import boto3

s3 = boto3.client('s3')
for obj in s3.list_objects_v2(Bucket="myBucket", Prefix="some/path/to/the/file/")['Contents']:
    print(obj['Key'])

which would give you something like:


some/path/to/the/file/yo.jpg
some/path/to/the/file/meAndYou.gif
...

Answer by Sophie Muspratt

I had the same issue but managed to resolve it using boto3.client and list_objects_v2 with the Bucket and StartAfter parameters.


import boto3

s3client = boto3.client('s3')
bucket = 'my-bucket-name'
startAfter = 'firstlevelFolder/secondLevelFolder'

theobjects = s3client.list_objects_v2(Bucket=bucket, StartAfter=startAfter)
for obj in theobjects['Contents']:
    print(obj['Key'])

The output result for the code above would display the following:


firstlevelFolder/secondLevelFolder/item1
firstlevelFolder/secondLevelFolder/item2

Boto3 list_objects_v2 Documentation


In order to strip out only the directory name for secondLevelFolder I just used the Python method split():


import boto3

s3client = boto3.client('s3')
bucket = 'my-bucket-name'
startAfter = 'firstlevelFolder/secondLevelFolder'

theobjects = s3client.list_objects_v2(Bucket=bucket, StartAfter=startAfter)
for obj in theobjects['Contents']:
    directoryName = obj['Key'].split('/')
    print(directoryName[1])

The output result for the code above would display the following:


secondLevelFolder
secondLevelFolder

Python split() Documentation


If you'd like to get the directory name AND the contents item name, then replace the print line with the following:


print "{}/{}".format(fileName[1], fileName[2])

And the following will be output:


secondLevelFolder/item1
secondLevelFolder/item2
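Since the folder name is printed once per object it contains, a small variation (not in the original answer) is to collect the names into a set to deduplicate them, continuing from the theobjects response above:

uniqueFolders = set(obj['Key'].split('/')[1] for obj in theobjects['Contents'])
print(uniqueFolders)  # {'secondLevelFolder'}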

Hope this helps


Answer by cem

The following works for me... S3 objects:


s3://bucket/
    form1/
       section11/
          file111
          file112
       section12/
          file121
    form2/
       section21/
          file211
          file112
       section22/
          file221
          file222
          ...
      ...
   ...

Using:


from boto3.session import Session

session = Session()
s3client = session.client('s3')
resp = s3client.list_objects(Bucket=bucket, Prefix='', Delimiter="/")
forms = [x['Prefix'] for x in resp['CommonPrefixes']]

we get:


form1/
form2/
...

With:


resp = s3client.list_objects(Bucket=bucket, Prefix='form1/', Delimiter="/")
sections = [x['Prefix'] for x in resp['CommonPrefixes']] 

we get:


form1/section11/
form1/section12/

Answer by Paul Zielinski

The AWS cli does this (presumably without fetching and iterating through all keys in the bucket) when you run aws s3 ls s3://my-bucket/, so I figured there must be a way using boto3.


https://github.com/aws/aws-cli/blob/0fedc4c1b6a7aee13e2ed10c3ada778c702c22c3/awscli/customizations/s3/subcommands.py#L499


It looks like they indeed use Prefix and Delimiter - I was able to write a function that would get me all directories at the root level of a bucket by modifying that code a bit:


import boto3

def list_folders_in_bucket(bucket):
    paginator = boto3.client('s3').get_paginator('list_objects')
    folders = []
    iterator = paginator.paginate(Bucket=bucket, Prefix='', Delimiter='/', PaginationConfig={'PageSize': None})
    for response_data in iterator:
        prefixes = response_data.get('CommonPrefixes', [])
        for prefix in prefixes:
            prefix_name = prefix['Prefix']
            if prefix_name.endswith('/'):
                folders.append(prefix_name.rstrip('/'))
    return folders
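A quick usage sketch (the bucket name is a placeholder):

folders = list_folders_in_bucket('my-bucket-name')
print(folders)  # e.g. ['first-level', 'other-top-level-dir']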

Answer by Acumenus

Using boto3.resource


This builds upon the answer by itz-azhar to apply an optional limit. It is obviously substantially simpler to use than the boto3.client version.


import logging
from typing import List, Optional

import boto3
from boto3_type_annotations.s3 import ObjectSummary  # pip install boto3_type_annotations

log = logging.getLogger(__name__)
_S3_RESOURCE = boto3.resource("s3")

def s3_list(bucket_name: str, prefix: str, *, limit: Optional[int] = None) -> List[ObjectSummary]:
    """Return a list of S3 object summaries."""
    # Ref: https://stackoverflow.com/a/57718002/
    return list(_S3_RESOURCE.Bucket(bucket_name).objects.limit(count=limit).filter(Prefix=prefix))


if __name__ == "__main__":
    s3_list("noaa-gefs-pds", "gefs.20190828/12/pgrb2a", limit=10_000)

Using boto3.client


This uses list_objects_v2 and builds upon the answer by CpILL to allow retrieving more than 1000 objects.


import logging
from typing import cast, List

import boto3

log = logging.getLogger(__name__)
_S3_CLIENT = boto3.client("s3")

def s3_list(bucket_name: str, prefix: str, *, limit: int = cast(int, float("inf"))) -> List[dict]:
    """Return a list of S3 object summaries."""
    # Ref: https://stackoverflow.com/a/57718002/
    contents: List[dict] = []
    continuation_token = None
    if limit <= 0:
        return contents
    while True:
        max_keys = min(1000, limit - len(contents))
        request_kwargs = {"Bucket": bucket_name, "Prefix": prefix, "MaxKeys": max_keys}
        if continuation_token:
            log.info(  # type: ignore
                "Listing %s objects in s3://%s/%s using continuation token ending with %s with %s objects listed thus far.",
                max_keys, bucket_name, prefix, continuation_token[-6:], len(contents))  # pylint: disable=unsubscriptable-object
            response = _S3_CLIENT.list_objects_v2(**request_kwargs, ContinuationToken=continuation_token)
        else:
            log.info("Listing %s objects in s3://%s/%s with %s objects listed thus far.", max_keys, bucket_name, prefix, len(contents))
            response = _S3_CLIENT.list_objects_v2(**request_kwargs)
        assert response["ResponseMetadata"]["HTTPStatusCode"] == 200
        contents.extend(response["Contents"])
        is_truncated = response["IsTruncated"]
        if (not is_truncated) or (len(contents) >= limit):
            break
        continuation_token = response["NextContinuationToken"]
    assert len(contents) <= limit
    log.info("Returning %s objects from s3://%s/%s.", len(contents), bucket_name, prefix)
    return contents


if __name__ == "__main__":
    s3_list("noaa-gefs-pds", "gefs.20190828/12/pgrb2a", limit=10_000)

Answer by Nathan Benton

I know that boto3 is the topic being discussed here, but I find that it is usually quicker and more intuitive to simply use awscli for something like this - awscli retains more capabilities than boto3, for what it's worth.


For example, if I have objects saved in "subfolders" associated with a given bucket, I can list them all out with something such as this:


1) 'mydata' = bucket name

2) 'f1/f2/f3' = "path" leading to "files" or objects

3) 'foo2.csv, barfar.segy, gar.tar' = all objects "inside" f3


So, we can think of the "absolute path" leading to these objects as: 'mydata/f1/f2/f3/foo2.csv'...


Using awscli commands, we can easily list all objects inside a given "subfolder" via:


aws s3 ls s3://mydata/f1/f2/f3/ --recursive

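If you only want the immediate "sub-folders" at a given level rather than every object below it, dropping --recursive should do it; aws s3 ls then lists the common prefixes at that level as PRE entries, e.g. aws s3 ls s3://mydata/f1/f2/.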