List directory contents of an S3 bucket using Python and Boto3?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/32635785/
Asked by Allen Gooch
I am trying to list all directories within an S3 bucket using Python and Boto3.
I am using the following code:
s3 = session.resource('s3')  # I already have a boto3 Session object
bucket_names = [
    'this/bucket/',
    'that/bucket/'
]
for name in bucket_names:
    bucket = s3.Bucket(name)
    for obj in bucket.objects.all():  # this raises an exception
        # handle obj
When I run this I get the following exception stack trace:
  File "botolist.py", line 67, in <module>
    for obj in bucket.objects.all():
  File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 82, in __iter__
    for page in self.pages():
  File "/Library/Python/2.7/site-packages/boto3/resources/collection.py", line 165, in pages
    for page in pages:
  File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 83, in __iter__
    response = self._make_request(current_kwargs)
  File "/Library/Python/2.7/site-packages/botocore/paginate.py", line 155, in _make_request
    return self._method(**current_kwargs)
  File "/Library/Python/2.7/site-packages/botocore/client.py", line 270, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Library/Python/2.7/site-packages/botocore/client.py", line 335, in _make_api_call
    raise ClientError(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (NoSuchKey) when calling the ListObjects operation: The specified key does not exist.
What is the correct way to list directories inside a bucket?
Accepted answer by Henry Henrinson
All these other responses suck. Using client.list_objects() limits you to 1k results max. The rest of the answers are either wrong or too complex.
Dealing with the continuation token yourself is a terrible idea. Just use the paginator, which deals with that logic for you.
The solution you want is:
[e['Key']
 for p in client.get_paginator("list_objects_v2").paginate(Bucket='my_bucket')
 for e in p['Contents']]
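One caveat with the one-liner: when a page has no matching objects (an empty bucket, or a prefix that matches nothing), the response carries no 'Contents' key and the comprehension raises a KeyError. A more defensive sketch of the same paginator approach, assuming the placeholder bucket name 'my_bucket':

import boto3

client = boto3.client('s3')

# Same paginator-based listing written as an explicit loop;
# pages with no objects have no 'Contents' key, so use .get().
keys = []
for page in client.get_paginator('list_objects_v2').paginate(Bucket='my_bucket'):
    for obj in page.get('Contents', []):
        keys.append(obj['Key'])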
Answered by Vor
Alternatively, you may want to use boto3.client.
Example
>>> import boto3
>>> client = boto3.client('s3')
>>> client.list_objects(Bucket='MyBucket')
list_objects also supports other arguments that might be required to iterate through the result: Bucket, Delimiter, EncodingType, Marker, MaxKeys, Prefix.
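For instance, a minimal sketch (bucket and prefix names are placeholders) that narrows the listing with Prefix, Delimiter, and MaxKeys:

import boto3

client = boto3.client('s3')

# Restrict the listing to one "folder" and return at most 100 keys per call.
response = client.list_objects(
    Bucket='MyBucket',
    Prefix='some/folder/',
    Delimiter='/',
    MaxKeys=100)

for obj in response.get('Contents', []):
    print(obj['Key'])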
Answered by Anne M.
If you have the session, create a client and get the CommonPrefixes of the client's list_objects:
client = session.client('s3',
                        # region_name='eu-west-1'
                        )
result = client.list_objects(Bucket='MyBucket', Delimiter='/')
for obj in result.get('CommonPrefixes', []):
    # handle obj.get('Prefix')
There could be a lot of folders, and you might want to start in a subfolder, though. Something like this could handle that:
def folders(client, bucket, prefix=''):
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/'):
        for common_prefix in result.get('CommonPrefixes', []):
            yield common_prefix.get('Prefix')

gen_folders = folders(client, 'MyBucket')
list(gen_folders)

gen_subfolders = folders(client, 'MyBucket', prefix='MySubFolder/')
list(gen_subfolders)
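If you need every nested level rather than just the immediate children, the same generator can be applied recursively. A sketch (Python 3, reusing the folders() helper above):

def all_folders(client, bucket, prefix=''):
    # Depth-first walk over every CommonPrefixes level.
    for folder in folders(client, bucket, prefix=prefix):
        yield folder
        yield from all_folders(client, bucket, prefix=folder)

print(list(all_folders(client, 'MyBucket')))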
Answered by Old_Mortality
I would have thought that you cannot have a slash in a bucket name. You say you want to list all directories within a bucket, but your code attempts to list all contents (not necessarily directories) within a number of buckets. These buckets probably do not exist (because they have illegal names). So when you run
bucket = s3.Bucket(name)
bucket probably does not refer to an existing bucket, and the subsequent listing will fail.
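In other words, the slash-containing strings from the question probably belong in the key prefix rather than in the bucket name. A sketch of what the original loop might have intended, using hypothetical bucket/prefix pairs and the session from the question:

s3 = session.resource('s3')  # existing boto3 Session object, as in the question

# Hypothetical: real bucket names carry no slashes; the slashes go into the key prefix.
targets = [
    ('this-bucket', 'some/prefix/'),
    ('that-bucket', 'another/prefix/'),
]
for bucket_name, prefix in targets:
    bucket = s3.Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        print(obj.key)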
Answered by Behrooz
The best way to get the list of ALL objects with a specific prefix in an S3 bucket is to use list_objects_v2 along with ContinuationToken to overcome the 1000-object pagination limit.
import boto3

s3 = boto3.client('s3')
s3_bucket = 'your-bucket'
s3_prefix = 'your/prefix'

partial_list = s3.list_objects_v2(
    Bucket=s3_bucket,
    Prefix=s3_prefix)

obj_list = partial_list['Contents']

while partial_list['IsTruncated']:
    next_token = partial_list['NextContinuationToken']
    partial_list = s3.list_objects_v2(
        Bucket=s3_bucket,
        Prefix=s3_prefix,
        ContinuationToken=next_token)
    obj_list.extend(partial_list['Contents'])
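Note that this assumes the prefix matches at least one object; when nothing matches, the first response has no 'Contents' key and the obj_list assignment raises a KeyError. Once the loop finishes, obj_list holds the raw object dictionaries, so pulling out just the key names is a one-liner (a usage sketch continuing from the code above):

# Extract the key names from the accumulated listing.
keys = [obj['Key'] for obj in obj_list]
print(len(keys), 'objects found under', s3_prefix)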
Answered by Toby
If you have fewer than 1,000 objects in your folder you can use the following code:
import boto3

s3 = boto3.client('s3')
object_listing = s3.list_objects_v2(Bucket='bucket_name',
                                    Prefix='folder/sub-folder/')
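The response is a plain dictionary; the objects live under the 'Contents' key, which is absent when nothing matches the prefix. A short usage sketch, continuing from the placeholder names above:

# Iterate the (at most 1,000) returned objects.
for obj in object_listing.get('Contents', []):
    print(obj['Key'], obj['Size'])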