Python boto, list contents of a specific dir in a bucket

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) and link to the original: http://stackoverflow.com/questions/27292145/

Date: 2020-08-19 01:37:15  Source: igfitidea

Python boto, list contents of specific dir in bucket

python, amazon-s3, boto

Asked by Martin Taleski

I have S3 access only to a specific directory in an S3 bucket.


For example, with the s3cmd command, if I try to list the whole bucket:


    $ s3cmd ls s3://my-bucket-url

I get an error: Access to bucket 'my-bucket-url' was denied


But if I try to access a specific directory in the bucket, I can see the contents:


    $ s3cmd ls s3://my-bucket-url/dir-in-bucket

Now I want to connect to the S3 bucket with Python boto. Similarly, with:


    bucket = conn.get_bucket('my-bucket-url')

I get an error: boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden


But if I try:


    bucket = conn.get_bucket('my-bucket-url/dir-in-bucket')

The script stalls for about 10 seconds, then prints out an error. Below is the full traceback. Any idea how to proceed with this?


Traceback (most recent call last):
  File "test_s3.py", line 7, in <module>
    bucket = conn.get_bucket('my-bucket-url/dir-name')
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 471, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 490, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 633, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1046, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 922, in _mexe
    request.body, request.headers)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known

Accepted answer by garnaat

By default, when you do a get_bucket call in boto, it tries to validate that you actually have access to that bucket by performing a HEAD request on the bucket URL. In this case, you don't want boto to do that, since you don't have access to the bucket itself. So, do this:


bucket = conn.get_bucket('my-bucket-url', validate=False)

and then you should be able to do something like this to list objects:


for key in bucket.list(prefix='dir-in-bucket'): 
    <do something>

If you still get a 403 error, try adding a slash at the end of the prefix:


for key in bucket.list(prefix='dir-in-bucket/'): 
    <do something>
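The trailing-slash advice above can be captured in plain Python, with no AWS access needed. The helper below is a hypothetical illustration (ensure_dir_prefix is not a boto function): it normalizes a prefix so that exactly one slash terminates it.

```python
def ensure_dir_prefix(prefix):
    """Return the prefix with exactly one trailing slash, matching the
    'add a slash at the end' advice above (hypothetical helper)."""
    return prefix.rstrip('/') + '/' if prefix else prefix

print(ensure_dir_prefix('dir-in-bucket'))   # dir-in-bucket/
print(ensure_dir_prefix('dir-in-bucket/'))  # dir-in-bucket/
```

With something like this in place, `bucket.list(prefix=ensure_dir_prefix('dir-in-bucket'))` would always send the slash-terminated form.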

Answer by reetesh11

If you want to list all the objects in a folder of your bucket, you can specify the folder while listing.


import boto
conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(AWS_BUCKET_NAME)
for file in bucket.list("FOLDER_NAME/", "/"):
    <do something with required file>

Answer by M.Vanderlee

For boto3:


import boto3

s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')

for object_summary in my_bucket.objects.filter(Prefix="dir_name/"):
    print(object_summary.key)

Answer by gogasca

Boto3 client:


import boto3

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                            aws_secret_access_key=SECRET_KEY)

def ListFiles(client):
    """List files in specific S3 URL"""
    response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
    for content in response.get('Contents', []):
        yield content.get('Key')

file_list = ListFiles(client)
for file in file_list:
    print('File found: %s' % file)

Using a session:


from boto3.session import Session

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)

client = session.client('s3')

def ListFilesV1(client, bucket, prefix=''):
    """List files in specific S3 URL"""
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        for content in result.get('Contents', []):
            yield content.get('Key')

file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
for file in file_list:
    print('File found: %s' % file)
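The generator-plus-paginator pattern above can be exercised without AWS access by substituting a minimal stand-in for the client. FakeClient and FakePaginator below are hypothetical stubs (not boto3 classes) that mimic just enough of the paginator protocol to drive the same listing logic:

```python
class FakePaginator:
    """Stub mimicking the page iterator from client.get_paginator()."""
    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        # boto3's real paginator yields one response dict per page.
        return iter(self._pages)

class FakeClient:
    """Stub standing in for a boto3 S3 client."""
    def __init__(self, pages):
        self._pages = pages

    def get_paginator(self, name):
        return FakePaginator(self._pages)

def list_files(client, bucket, prefix=''):
    """Same pagination logic as ListFilesV1 above."""
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        for content in result.get('Contents', []):
            yield content.get('Key')

pages = [{'Contents': [{'Key': 'subfolder/a.txt'}]},
         {'Contents': [{'Key': 'subfolder/b.txt'}]}]
print(list(list_files(FakeClient(pages), 'mybucket', 'subfolder/')))
# ['subfolder/a.txt', 'subfolder/b.txt']
```

The `result.get('Contents', [])` fallback also handles empty pages, which the real API returns when a prefix matches nothing.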

Answer by Nandeesh

The following code will list all the files in a specific directory of the S3 bucket:


import boto3

s3 = boto3.client('s3')

def get_all_s3_keys(s3_path):
    """
    Get a list of all keys in an S3 bucket.

    :param s3_path: Path of S3 dir.
    """
    keys = []

    if not s3_path.startswith('s3://'):
        s3_path = 's3://' + s3_path

    bucket = s3_path.split('//')[1].split('/')[0]
    prefix = '/'.join(s3_path.split('//')[1].split('/')[1:])

    kwargs = {'Bucket': bucket, 'Prefix': prefix}
    while True:
        resp = s3.list_objects_v2(**kwargs)
        for obj in resp.get('Contents', []):
            keys.append(obj['Key'])

        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break

    return keys
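The bucket/prefix parsing buried in get_all_s3_keys can be factored into a small standalone helper, which makes it easy to test without S3. split_s3_path is a hypothetical name for illustration:

```python
def split_s3_path(s3_path):
    """Split 's3://bucket/some/prefix' into (bucket, prefix).
    Hypothetical helper mirroring the parsing in get_all_s3_keys;
    also accepts paths without the 's3://' scheme."""
    if s3_path.startswith('s3://'):
        s3_path = s3_path[len('s3://'):]
    bucket, _, prefix = s3_path.partition('/')
    return bucket, prefix

print(split_s3_path('s3://my-bucket/dir-in-bucket/sub'))
# ('my-bucket', 'dir-in-bucket/sub')
```

Using str.partition avoids the chained split('//')[1].split('/') indexing above, and behaves sensibly when the path has no prefix at all.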