Disclaimer: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17375127/
How can I get a list of only folders in Amazon S3 using Python boto?
Asked by user1958218
I am using boto and Python with Amazon S3.
If I use
[key.name for key in list(self.bucket.list())]
then I get all the keys of all the files:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
What is the best way to
1. either get all the folders from S3, or
2. strip the file name from the end of each key in that list and get the unique folder keys?
I am thinking of doing it like this:
set([re.sub("/[^/]*$", "/", path) for path in mylist])
Answer by j0nes
Basically there is no such thing as a folder in S3. Internally everything is stored as a key, and if the key name has a slash character in it, the clients may decide to show it as a folder.
With that in mind, you should first get all the keys and then use a regex to filter the paths that include a slash in them. The solution you have right now is already a good start.
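As a rough sketch of that filtering (my illustration, not part of the original answer; it assumes an already-connected boto 2.x bucket object and reuses the question's own regex):

import re

# Assumes `bucket` is an already-connected boto 2.x Bucket object,
# as in the question's own snippet.
key_names = [key.name for key in bucket.list()]

# Strip the file name after the last slash and keep the unique prefixes.
folders = set(re.sub(r"/[^/]*$", "/", name) for name in key_names)

for folder in sorted(folders):
    print(folder)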
Answer by sethwm
This is going to be an incomplete answer since I don't know python or boto, but I want to comment on the underlying concept in the question.
One of the other posters was right: there is no concept of a directory in S3. There are only flat key/value pairs. Many applications pretend certain delimiters indicate directory entries. For example "/" or "\". Some apps go as far as putting a dummy file in place so that if the "directory" empties out, you can still see it in list results.
You don't always have to pull your entire bucket down and do the filtering locally. S3 has a concept of a delimited list where you specify what you would deem your path delimiter ("/", "\", "|", "foobar", etc.) and S3 will return virtual results to you, similar to what you want.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html (Look at the delimiter header.)
This API will get you one level of directories. So if you had in your example:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
And you passed in a LIST with prefix "" and delimiter "/", you'd get results:
mybucket/files/
If you passed in a LIST with prefix "mybucket/files/" and delimiter "/", you'd get results:
mybucket/files/pdf/
And if you passed in a LIST with prefix "mybucket/files/pdf/" and delimiter "/", you'd get results:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/2011/
You'd be on your own at that point if you wanted to eliminate the pdf files themselves from the result set.
Now, how you do this in python/boto I have no idea. Hopefully there's a way to pass these parameters through.
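For what it's worth, a minimal boto 2 sketch of the delimited LIST described above (my addition; the bucket name, credentials, and prefix are placeholders) could look like this:

import boto

conn = boto.connect_s3('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
bucket = conn.get_bucket('mybucket')

# With a delimiter, S3 returns the common prefixes ("folders") as Prefix
# entries alongside the ordinary Key entries for that level.
for entry in bucket.list(prefix='files/pdf/', delimiter='/'):
    print(entry.name)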
Answer by bambata
The boto interface allows you to list the contents of a bucket and give a prefix for the entries. That way you can get the entries for what would be a directory in a normal filesystem:
import boto

AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket('your-bucket-name')
bucket_entries = bucket.list(prefix='/path/to/your/directory')

for entry in bucket_entries:
    print entry
Answer by j1m
Building on sethwm's answer:
To get the top level directories:
list(bucket.list("", "/"))
To get the subdirectories of files/:
list(bucket.list("files/", "/"))
and so on.
Answer by Wawrzek
As pointed out in one of the comments, the approach suggested by j1m returns a Prefix object. If you are after a name/path, you can use its name attribute. For example:
import boto
import boto.s3
conn = boto.s3.connect_to_region('us-west-2')
bucket = conn.get_bucket(your_bucket)
folders = bucket.list("","/")
for folder in folders:
    print folder.name
Answer by Nathan Hazzard
The issue here, as has been said by others, is that a folder doesn't necessarily have a key, so you have to search through the strings for the / character and figure out your folders through that. Here's one way to generate a recursive dictionary imitating a folder structure.
If you want all the files and their URLs in the folders:
assets = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    identifier = assets
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    if not key.name.endswith('/'):
        identifier[path[-1]] = key.generate_url(expires_in=0, query_auth=False)

return assets
If you just want the empty folders:
folders = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    identifier = folders
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    if key.name.endswith('/'):
        identifier[path[-1]] = {}

return folders
This can then be recursively read out later.
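For example, a small helper (my sketch, not part of the original answer) can walk the nested dictionary built above and print it as an indented tree:

def print_tree(node, indent=0):
    # Recursively print the nested folder/file dictionary built above.
    for name, child in sorted(node.items()):
        print('  ' * indent + name)
        if isinstance(child, dict):
            print_tree(child, indent + 1)

print_tree(assets)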
Answer by Erica Jh Lee
I see you have successfully made the boto connection. If you only have one directory that you are interested in (like you provided in the example), I think what you can do is use the prefix and delimiter that are already provided via AWS (Link).
Boto uses this feature in its bucket object, and you can retrieve hierarchical directory information using prefix and delimiter. The bucket.list() will return a boto.s3.bucketlistresultset.BucketListResultSet object.
I tried this a couple of ways, and if you do choose to use a delimiter= argument in bucket.list(), the returned object is an iterator for boto.s3.prefix.Prefix rather than boto.s3.key.Key. In other words, if you try to retrieve the subdirectories you should put delimiter='/', and as a result you will get an iterator of Prefix objects.
Both returned objects (either Prefix or Key objects) have a .name attribute, so if you want the directory/file information as a string, you can do so by printing it like below:
from boto.s3.connection import S3Connection

key_id = '...'
secret_key = '...'

# Create connection
conn = S3Connection(key_id, secret_key)

# Get list of all buckets
allbuckets = conn.get_all_buckets()
for bucket_name in allbuckets:
    print(bucket_name)

# Connect to a specific bucket
bucket = conn.get_bucket('bucket_name')

# Get subdirectory info
for key in bucket.list(prefix='sub_directory/', delimiter='/'):
    print(key.name)
Answer by joeButler
Complete example with boto3 using the S3 client
import boto3

def list_bucket_keys(bucket_name):
    s3_client = boto3.client("s3")
    """ :type : pyboto3.s3 """
    result = s3_client.list_objects(Bucket=bucket_name, Prefix="Trails/", Delimiter="/")
    return result['CommonPrefixes']

if __name__ == '__main__':
    print list_bucket_keys("my-s3-bucket-name")
Answer by Eduardo Sztokbant
I found the following to work using boto3:
def list_folders(s3_client, bucket_name):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix='', Delimiter='/')
    for content in response.get('CommonPrefixes', []):
        yield content.get('Prefix')

s3_client = session.client('s3')

folder_list = list_folders(s3_client, bucket_name)
for folder in folder_list:
    print('Folder found: %s' % folder)
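One caveat worth noting (my addition, not from the original answer): list_objects_v2 returns at most 1000 entries per call, so for buckets with many prefixes a paginator-based variant along these lines is safer:

import boto3

def list_folders_paginated(s3_client, bucket_name, prefix=''):
    # Iterate over every result page so more than 1000 common prefixes
    # under the given prefix are still returned.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter='/'):
        for common_prefix in page.get('CommonPrefixes', []):
            yield common_prefix['Prefix']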