Python s3 urls - get bucket name and path

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/42641315/


s3 urls - get bucket name and path

python, boto3

Asked by Lijju Mathew

I have a variable which holds an AWS S3 URL:

s3://bucket_name/folder1/folder2/file1.json

I want to get the bucket_name in one variable and the rest, i.e. /folder1/folder2/file1.json, in another variable. I tried regular expressions and could get the bucket_name as shown below; not sure if there is a better way.

import re

m = re.search(r'(?<=s3:\/\/)[^\/]+', 's3://bucket_name/folder1/folder2/file1.json')
print(m.group(0))

How do I get the rest, i.e. folder1/folder2/file1.json?

I checked whether boto3 has a feature to extract the bucket_name and key from the URL, but couldn't find one.

Answered by kichik

Since it's just a normal URL, you can use urlparse to get all the parts of the URL.

>>> from urlparse import urlparse
>>> o = urlparse('s3://bucket_name/folder1/folder2/file1.json', allow_fragments=False)
>>> o
ParseResult(scheme='s3', netloc='bucket_name', path='/folder1/folder2/file1.json', params='', query='', fragment='')
>>> o.netloc
'bucket_name'
>>> o.path
'/folder1/folder2/file1.json'

You may have to remove the beginning slash from the key as the next answer suggests.


o.path.lstrip('/')

With Python 3, urlparse moved to urllib.parse, so use:

from urllib.parse import urlparse
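
Putting the Python 3 pieces together, here is a minimal sketch (the helper name `parse_s3_url` is my own, not from the answer):

```python
from urllib.parse import urlparse

def parse_s3_url(url):
    # allow_fragments=False keeps any '#' as part of the key
    # instead of treating it as a URL fragment.
    parsed = urlparse(url, allow_fragments=False)
    return parsed.netloc, parsed.path.lstrip('/')

bucket, key = parse_s3_url('s3://bucket_name/folder1/folder2/file1.json')
# bucket == 'bucket_name', key == 'folder1/folder2/file1.json'
```

Note that this simple version ignores any query string; the S3Url class below handles that case as well.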


Here's a class that takes care of all the details.


try:
    from urlparse import urlparse
except ImportError:
    from urllib.parse import urlparse


class S3Url(object):
    """
    >>> s = S3Url("s3://bucket/hello/world")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world'
    >>> s.url
    's3://bucket/hello/world'

    >>> s = S3Url("s3://bucket/hello/world?qwe1=3#ddd")
    >>> s.bucket
    'bucket'
    >>> s.key
    'hello/world?qwe1=3#ddd'
    >>> s.url
    's3://bucket/hello/world?qwe1=3#ddd'

    >>> s = S3Url("s3://bucket/hello/world#foo?bar=2")
    >>> s.key
    'hello/world#foo?bar=2'
    >>> s.url
    's3://bucket/hello/world#foo?bar=2'
    """

    def __init__(self, url):
        self._parsed = urlparse(url, allow_fragments=False)

    @property
    def bucket(self):
        return self._parsed.netloc

    @property
    def key(self):
        if self._parsed.query:
            return self._parsed.path.lstrip('/') + '?' + self._parsed.query
        else:
            return self._parsed.path.lstrip('/')

    @property
    def url(self):
        return self._parsed.geturl()

Answered by Mikhail Sirotenko

For those who, like me, were trying to use urlparse to extract the key and bucket in order to create an object with boto3, there's one important detail: remove the slash from the beginning of the key.

import boto3
from urlparse import urlparse  # Python 2; on Python 3 use `from urllib.parse import urlparse`

o = urlparse('s3://bucket_name/folder1/folder2/file1.json')
bucket = o.netloc
key = o.path
client = boto3.client('s3')
client.put_object(Body='test', Bucket=bucket, Key=key.lstrip('/'))

It took me a while to realize that, because boto3 doesn't throw any exception.

Answered by mikeviescas

A solution that works without urllib or re (it also handles the leading slash):

def split_s3_path(s3_path):
    path_parts = s3_path.replace("s3://", "").split("/")
    bucket = path_parts.pop(0)
    key = "/".join(path_parts)
    return bucket, key

To run:


bucket, key = split_s3_path("s3://my-bucket/some_folder/another_folder/my_file.txt")

Returns:


bucket: my-bucket
key: some_folder/another_folder/my_file.txt
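
A couple of edge cases are worth knowing about; this is a self-contained copy of the helper with my own illustrative checks:

```python
def split_s3_path(s3_path):
    path_parts = s3_path.replace("s3://", "").split("/")
    bucket = path_parts.pop(0)
    key = "/".join(path_parts)
    return bucket, key

# Bucket-only URL: the key comes back as an empty string rather than raising.
assert split_s3_path("s3://my-bucket") == ("my-bucket", "")

# A trailing slash survives in the key (the empty final segment is re-joined).
assert split_s3_path("s3://my-bucket/folder/") == ("my-bucket", "folder/")
```

One caveat of the replace-based approach: it would also strip a literal "s3://" appearing later in the key, which the urlparse-based answers avoid.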

Answered by Alec Hewitt

If you want to do it with regular expressions, you can do the following:


>>> import re
>>> uri = 's3://my-bucket/my-folder/my-object.png'
>>> match = re.match(r's3:\/\/(.+?)\/(.+)', uri)
>>> match.group(1)
'my-bucket'
>>> match.group(2)
'my-folder/my-object.png'

This has the advantage that you can check for the s3 scheme rather than allowing anything there.
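
To make that scheme check explicit, here is one possible sketch (the helper name `parse_s3_uri` and the ValueError behavior are my own additions, not part of the original answer):

```python
import re

def parse_s3_uri(uri):
    # The pattern is anchored at the start by re.match, so anything
    # that isn't an s3:// URI fails to match.
    match = re.match(r's3:\/\/(.+?)\/(.+)', uri)
    if match is None:
        raise ValueError('not an s3:// URI: %r' % uri)
    return match.group(1), match.group(2)

bucket, key = parse_s3_uri('s3://my-bucket/my-folder/my-object.png')
# bucket == 'my-bucket', key == 'my-folder/my-object.png'
```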

Answered by David

Here it is as a one-liner using regex:


import re

s3_path = "s3://bucket/path/to/key"

bucket, key = re.match(r"s3:\/\/(.+?)\/(.+)", s3_path).groups()

Answered by Lior Mizrahi

This is a nice project:


s3path is a pathlib extension for the AWS S3 service:

>>> from s3path import S3Path
>>> path = S3Path.from_uri('s3://bucket_name/folder1/folder2/file1.json')
>>> print(path.bucket)
'/bucket_name'
>>> print(path.key)
'folder1/folder2/file1.json'
>>> print(list(path.key.parents))
[S3Path('folder1/folder2'), S3Path('folder1'), S3Path('.')]