Python s3 urls - get bucket name and path

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/42641315/


s3 urls - get bucket name and path

python, boto3

Asked by Lijju Mathew

I have a variable which holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json

I want to get the bucket_name in one variable and the rest, i.e. /folder1/folder2/file1.json, in another variable. I tried regular expressions and could get the bucket_name as shown below; not sure if there is a better way.

import re

m = re.search(r'(?<=s3:\/\/)[^\/]+', 's3://bucket_name/folder1/folder2/file1.json')
print(m.group(0))

How do I get the rest, i.e. folder1/folder2/file1.json?

I checked whether boto3 has a feature to extract the bucket_name and key from the URL, but couldn't find one.

Answered by kichik

Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

>>> from urlparse import urlparse
>>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False)
>>> o
ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
>>> o.netloc
'bucket_name'
>>> o.path
'/folder1/folder2/file1.json'

You may have to remove the beginning slash from the key as the next answer suggests.


o.path.lstrip('/')

With Python 3, urlparse moved to urllib.parse, so use:

from urllib.parse import urlparse
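
Putting the Python 3 pieces together, here is a minimal sketch (the helper name `parse_s3_url` is my own, not from the answer):

```python
from urllib.parse import urlparse

def parse_s3_url(url):
    # allow_fragments=False keeps any '#' as part of the key
    # instead of treating it as a URL fragment.
    parsed = urlparse(url, allow_fragments=False)
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = parse_s3_url('s3://bucket_name/folder1/folder2/file1.json')
# bucket == 'bucket_name', key == 'folder1/folder2/file1.json'
```

Note that this simple version ignores any query string; the S3Url class below handles that case as well.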


Here's a class that takes care of all the details.


try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse


class S3Url(object):
    """
    >>> s = S3Url("s3://bucket/hello/world")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world'
    >>> s.url
    's3://bucket/hello/world'

    >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world?qwe1=3#ddd'
    >>> s.url
    's3://bucket/hello/world?qwe1=3#ddd'

    >>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
    >>> s.key
    'hello/world#foo?bar=2'
    >>> s.url
    's3://bucket/hello/world#foo?bar=2'
    """

    def __init__(self, url):
        self._parsed = urlparse(url, allow_fragments=False)

    @property
    def bucket(self):
        return self._parsed.netloc

    @property
    def key(self):
        if self._parsed.query:
            return self._parsed.path.lstrip('/') + '?' + self._parsed.query
        else:
            return self._parsed.path.lstrip('/')

    @property
    def url(self):
        return self._parsed.geturl()

Answered by Mikhail Sirotenko

For those who, like me, were trying to use urlparse to extract the key and bucket in order to create an object with boto3, there's one important detail: remove the slash from the beginning of the key.

import boto3
from urlparse import urlparse  # Python 2; on Python 3 use `from urllib.parse import urlparse`

o = urlparse('s3://bucket_name/folder1/folder2/file1.json')
bucket = o.netloc
key = o.path
client = boto3.client('s3')
client.put_object(Body='test', Bucket=bucket, Key=key.lstrip('/'))

It took me a while to realize that, because boto3 doesn't throw any exception.

Answered by mikeviescas

A solution that works without urllib or re (it also handles the leading slash):

def split_s3_path(s3_path):
    path_parts = s3_path.replace("s3://", "").split("/")
    bucket = path_parts.pop(0)
    key = "/".join(path_parts)
    return bucket, key

To run:


bucket, key = split_s3_path("s3://my-bucket/some_folder/another_folder/my_file.txt")

Returns:


bucket: my-bucket
key: some_folder/another_folder/my_file.txt
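
A couple of edge cases are worth knowing about; this is a self-contained copy of the helper with my own illustrative checks:

```python
def split_s3_path(s3_path):
    path_parts = s3_path.replace("s3://", "").split("/")
    bucket = path_parts.pop(0)
    key = "/".join(path_parts)
    return bucket, key

# Bucket-only URL: the key comes back as an empty string rather than raising.
assert split_s3_path("s3://my-bucket") == ("my-bucket", "")

# A trailing slash survives in the key (the empty final segment is re-joined).
assert split_s3_path("s3://my-bucket/folder/") == ("my-bucket", "folder/")
```

One caveat of the replace-based approach: it would also strip a literal "s3://" appearing later in the key, which the urlparse-based answers avoid.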

Answered by Alec Hewitt

If you want to do it with regular expressions, you can do the following:


>>> import re
>>> uri = 's3://my-bucket/my-folder/my-object.png'
>>> match = re.match(r's3:\/\/(.+?)\/(.+)', uri)
>>> match.group(1)
'my-bucket'
>>> match.group(2)
'my-folder/my-object.png'

This has the advantage that you can check for the s3 scheme rather than allowing anything there.
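
To make that scheme check explicit, here is one possible sketch (the helper name `parse_s3_uri` and the ValueError behavior are my own additions, not part of the original answer):

```python
import re

def parse_s3_uri(uri):
    # The pattern is anchored at the start by re.match, so anything
    # that isn't an s3:// URI fails to match.
    match = re.match(r's3:\/\/(.+?)\/(.+)', uri)
    if match is None:
        raise ValueError('not an s3:// URI: %r' % uri)
    return match.group(1), match.group(2)

bucket, key = parse_s3_uri('s3://my-bucket/my-folder/my-object.png')
# bucket == 'my-bucket', key == 'my-folder/my-object.png'
```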

Answered by David

Here it is as a one-liner using regex:


import re

s3_path = "s3://bucket/path/to/key"

bucket, key = re.match(r"s3:\/\/(.+?)\/(.+)", s3_path).groups()

Answered by Lior Mizrahi

This is a nice project:


s3path is a pathlib extension for the AWS S3 service:

>>> from s3path import S3Path
>>> path = S3Path.from_uri('s3://bucket_name/folder1/folder2/file1.json')
>>> print(path.bucket)
'/bucket_name'
>>> print(path.key)
'folder1/folder2/file1.json'
>>> print(list(path.key.parents))
[S3Path('folder1/folder2'), S3Path('folder1'), S3Path('.')]