Disclaimer: this page is a translation of a popular StackOverflow Q&A, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original source, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/17375127/
How can I get a list of only folders in Amazon S3 using Python boto?
Asked by user1958218
I am using boto and Python with Amazon S3.
If I use
[key.name for key in list(self.bucket.list())]
then I get all the keys of all the files:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
What is the best way to
1. either get all the folders from S3, or
2. strip the file name from the end of each key in that list and get the unique folder keys?
I am thinking of doing it like this:
set([re.sub("/[^/]*$", "/", path) for path in mylist])
Answer by j0nes
Basically there is no such thing as a folder in S3. Internally everything is stored as a key, and if the key name has a slash character in it, the clients may decide to show it as a folder.
With that in mind, you should first get all the keys and then use a regex to filter the paths that include a slash in them. The solution you have right now is already a good start.
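As a rough sketch of that filtering (my illustration, not part of the original answer; it assumes an already-connected boto 2.x bucket object and reuses the question's own regex):

import re

# Assumes `bucket` is an already-connected boto 2.x Bucket object,
# as in the question's own snippet.
key_names = [key.name for key in bucket.list()]

# Strip the file name after the last slash and keep the unique prefixes.
folders = set(re.sub(r"/[^/]*$", "/", name) for name in key_names)

for folder in sorted(folders):
    print(folder)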
Answer by sethwm
This is going to be an incomplete answer since I don't know python or boto, but I want to comment on the underlying concept in the question.
One of the other posters was right: there is no concept of a directory in S3. There are only flat key/value pairs. Many applications pretend certain delimiters indicate directory entries. For example "/" or "\". Some apps go as far as putting a dummy file in place so that if the "directory" empties out, you can still see it in list results.
You don't always have to pull your entire bucket down and do the filtering locally. S3 has a concept of a delimited list where you specify what you would deem your path delimiter ("/", "\", "|", "foobar", etc.) and S3 will return virtual results to you, similar to what you want.
http://docs.aws.amazon.com/AmazonS3/latest/API/RESTBucketGET.html (Look at the delimiter header.)
This API will get you one level of directories. So if you had in your example:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/new/abc.pdf
mybucket/files/pdf/2011/
And you passed in a LIST with prefix "" and delimiter "/", you'd get results:
mybucket/files/
If you passed in a LIST with prefix "mybucket/files/" and delimiter "/", you'd get results:
mybucket/files/pdf/
And if you passed in a LIST with prefix "mybucket/files/pdf/" and delimiter "/", you'd get results:
mybucket/files/pdf/abc.pdf
mybucket/files/pdf/abc2.pdf
mybucket/files/pdf/abc3.pdf
mybucket/files/pdf/abc4.pdf
mybucket/files/pdf/new/
mybucket/files/pdf/2011/
You'd be on your own at that point if you wanted to eliminate the pdf files themselves from the result set.
Now, how you do this in python/boto I have no idea. Hopefully there's a way to pass these parameters through.
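For what it's worth, a minimal boto 2 sketch of the delimited LIST described above (my addition; the bucket name, credentials, and prefix are placeholders) could look like this:

import boto

conn = boto.connect_s3('AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY')
bucket = conn.get_bucket('mybucket')

# With a delimiter, S3 returns the common prefixes ("folders") as Prefix
# entries alongside the ordinary Key entries for that level.
for entry in bucket.list(prefix='files/pdf/', delimiter='/'):
    print(entry.name)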
Answer by bambata
The boto interface allows you to list the contents of a bucket and give a prefix for the entries. That way you can get the entries for what would be a directory in a normal filesystem:
import boto

AWS_ACCESS_KEY_ID = '...'
AWS_SECRET_ACCESS_KEY = '...'

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket('your-bucket-name')
bucket_entries = bucket.list(prefix='/path/to/your/directory')

for entry in bucket_entries:
    print entry
Answer by j1m
Building on sethwm's answer:
To get the top level directories:
list(bucket.list("", "/"))
To get the subdirectories of files/:
list(bucket.list("files/", "/"))
and so on.
Answer by Wawrzek
As pointed out in one of the comments, the approach suggested by j1m returns a Prefix object. If you are after a name/path, you can use its name attribute. For example:
import boto
import boto.s3
conn = boto.s3.connect_to_region('us-west-2')
bucket = conn.get_bucket(your_bucket)
folders = bucket.list("","/")
for folder in folders:
    print folder.name
Answer by Nathan Hazzard
The issue here, as has been said by others, is that a folder doesn't necessarily have a key, so you have to search through the strings for the / character and figure out your folders through that. Here's one way to generate a recursive dictionary imitating a folder structure.
If you want all the files and their URLs in the folders:
assets = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    identifier = assets
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    if not key.name.endswith('/'):
        identifier[path[-1]] = key.generate_url(expires_in=0, query_auth=False)

return assets
If you just want the empty folders:
folders = {}
for key in self.bucket.list(str(self.org) + '/'):
    path = key.name.split('/')

    identifier = folders
    for uri in path[1:-1]:
        try:
            identifier[uri]
        except:
            identifier[uri] = {}
        identifier = identifier[uri]
    if key.name.endswith('/'):
        identifier[path[-1]] = {}

return folders
This can then be recursively read out later.
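For example, a small helper (my sketch, not part of the original answer) can walk the nested dictionary built above and print it as an indented tree:

def print_tree(node, indent=0):
    # Recursively print the nested folder/file dictionary built above.
    for name, child in sorted(node.items()):
        print('  ' * indent + name)
        if isinstance(child, dict):
            print_tree(child, indent + 1)

print_tree(assets)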
Answer by Erica Jh Lee
I see you have successfully made the boto connection. If you only have one directory that you are interested in (like you provided in the example), I think what you can do is use the prefix and delimiter that are already provided via AWS (Link).
Boto uses this feature in its bucket object, and you can retrieve hierarchical directory information using prefix and delimiter. The bucket.list() will return a boto.s3.bucketlistresultset.BucketListResultSet object.
I tried this a couple of ways, and if you do choose to use a delimiter= argument in bucket.list(), the returned object is an iterator for boto.s3.prefix.Prefix rather than boto.s3.key.Key. In other words, if you try to retrieve the subdirectories you should put delimiter='/', and as a result you will get an iterator of Prefix objects.
Both returned objects (either Prefix or Key objects) have a .name attribute, so if you want the directory/file information as a string, you can do so by printing it like below:
from boto.s3.connection import S3Connection

key_id = '...'
secret_key = '...'

# Create connection
conn = S3Connection(key_id, secret_key)

# Get list of all buckets
allbuckets = conn.get_all_buckets()
for bucket_name in allbuckets:
    print(bucket_name)

# Connect to a specific bucket
bucket = conn.get_bucket('bucket_name')

# Get subdirectory info
for key in bucket.list(prefix='sub_directory/', delimiter='/'):
    print(key.name)
Answer by joeButler
Complete example with boto3 using the S3 client
import boto3

def list_bucket_keys(bucket_name):
    s3_client = boto3.client("s3")
    """ :type : pyboto3.s3 """
    result = s3_client.list_objects(Bucket=bucket_name, Prefix="Trails/", Delimiter="/")
    return result['CommonPrefixes']

if __name__ == '__main__':
    print list_bucket_keys("my-s3-bucket-name")
Answer by Eduardo Sztokbant
I found the following to work using boto3:
def list_folders(s3_client, bucket_name):
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix='', Delimiter='/')
    for content in response.get('CommonPrefixes', []):
        yield content.get('Prefix')

s3_client = session.client('s3')

folder_list = list_folders(s3_client, bucket_name)
for folder in folder_list:
    print('Folder found: %s' % folder)
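One caveat worth noting (my addition, not from the original answer): list_objects_v2 returns at most 1000 entries per call, so for buckets with many prefixes a paginator-based variant along these lines is safer:

import boto3

def list_folders_paginated(s3_client, bucket_name, prefix=''):
    # Iterate over every result page so more than 1000 common prefixes
    # under the given prefix are still returned.
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix, Delimiter='/'):
        for common_prefix in page.get('CommonPrefixes', []):
            yield common_prefix['Prefix']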