Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question on Stack Overflow: http://stackoverflow.com/questions/44043036/

Time: 2020-08-19 23:39:46  Source: igfitidea

How to read image file from S3 bucket directly into memory?

Tags: python, matplotlib, amazon-s3, boto3

Asked by Dims

I have the following code:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
object.download_file('B01.jp2')
img = mpimg.imread('B01.jp2')
imgplot = plt.imshow(img)
plt.show()

and it works. The problem is that it first downloads the file into the current directory. Is it possible to read the file and decode it as an image directly in RAM?

Answered by Greg Merritt

I would suggest using the io module to read the file directly into memory, without having to use a temporary file at all.

For example:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import io

s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')

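file_stream = io.BytesIO()  # binary buffer; download_fileobj writes bytes
object.download_fileobj(file_stream)
file_stream.seek(0)  # rewind so the reader starts at the first byte
img = mpimg.imread(file_stream, format='jp2')  # a buffer has no file extension to infer the format from
# whatever you need to do

Use io.BytesIO rather than io.StringIO here: download_fileobj writes bytes, and image data is binary.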
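The difference between the two buffer types can be checked without touching S3. The sketch below is only a stand-in: fake_download_fileobj is a hypothetical helper that mimics how download_fileobj writes raw bytes in chunks, and PAYLOAD is made-up data rather than a real image.

```python
import io

# Made-up binary payload standing in for image data (assumption, not a real JP2).
PAYLOAD = b"\x00\x01\x02 pretend image bytes \xfe\xff"


def fake_download_fileobj(fileobj, payload=PAYLOAD, chunk_size=8):
    """Hypothetical stand-in for download_fileobj: writes bytes in chunks."""
    for start in range(0, len(payload), chunk_size):
        fileobj.write(payload[start:start + chunk_size])


# A text buffer rejects bytes, so a binary download into it fails.
try:
    fake_download_fileobj(io.StringIO())
    string_io_accepts_bytes = True
except TypeError:
    string_io_accepts_bytes = False

# A bytes buffer accepts the data, but the position is left at the end.
buffer = io.BytesIO()
fake_download_fileobj(buffer)
read_without_rewind = buffer.read()  # nothing left to read from this position
buffer.seek(0)                       # rewind before handing the buffer to a decoder
read_after_rewind = buffer.read()

print(string_io_accepts_bytes, read_without_rewind, read_after_rewind == PAYLOAD)
```

The same rewind step applies before passing a freshly written buffer to any image reader.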

Answered by Hyeungshik Jung

Greg Merritt's answer is the better method.

I'd like to suggest using Python's NamedTemporaryFile from the tempfile module. It creates a temporary file that is deleted as soon as the file is closed (thanks to @NoamG).

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import tempfile

s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
tmp = tempfile.NamedTemporaryFile()

with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
    img=mpimg.imread(tmp.name)
    # ...Do jobs using img
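A variation on the same idea, as a minimal sketch: write into the NamedTemporaryFile object itself and flush it before anything reads the file by name. The payload here is made-up bytes standing in for the S3 download, and the '.jp2' suffix is an assumption chosen to match the example key. Reopening the file by name while it is still open works on Unix but not on Windows.

```python
import os
import tempfile

# Made-up bytes standing in for the downloaded object (assumption).
payload = b"pretend image bytes" * 10

with tempfile.NamedTemporaryFile(suffix=".jp2") as tmp:
    tmp.write(payload)  # with boto3 this would be object.download_fileobj(tmp)
    tmp.flush()         # push buffered bytes to disk before reading by name
    existed_while_open = os.path.exists(tmp.name)
    size_on_disk = os.path.getsize(tmp.name)
    tmp_path = tmp.name  # a path-based reader such as mpimg.imread(tmp.name) would go here

# Leaving the with block closes the file, which also deletes it.
exists_after_close = os.path.exists(tmp_path)

print(existed_while_open, size_on_disk == len(payload), exists_after_close)
```

Using the temporary file as the context manager avoids opening the same path a second time and makes the cleanup point explicit.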

Answered by Adrian Tofting

Streaming the image is possible by specifying the file format in imread().

import boto3
from io import BytesIO
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

resource = boto3.resource('s3', region_name='us-east-2')
bucket = resource.Bucket('sentinel-s2-l1c')

image_object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
image = mpimg.imread(BytesIO(image_object.get()['Body'].read()), 'jp2')

plt.figure(0)
plt.imshow(image)
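An in-memory buffer carries no file name, so the reader has nothing to infer the format from, which is why the format is passed explicitly above. One possible way to avoid hard-coding it is to derive it from the S3 key. The helper below is hypothetical (format_from_key is not part of boto3 or matplotlib), and the fallback to 'png' is an assumption.

```python
from pathlib import PurePosixPath


def format_from_key(key, default="png"):
    """Hypothetical helper: derive an image format name from an S3 key's suffix."""
    suffix = PurePosixPath(key).suffix.lower().lstrip(".")
    return suffix or default


image_format = format_from_key("tiles/10/S/DG/2015/12/7/0/B01.jp2")
print(image_format)
```

The result could then be passed as the second argument to imread in place of the literal 'jp2'.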

Answered by beahacker

This is a further development of Greg Merritt's answer that resolves the errors reported in its comment section: it uses BytesIO instead of StringIO, and PIL Image instead of matplotlib.image.

The following functions work with Python 3 and boto3. The write_image_to_s3 function is included as a bonus.

import boto3
from PIL import Image
from io import BytesIO
import numpy as np

def read_image_from_s3(bucket, key, region_name='ap-southeast-1'):
    """Load image file from s3.

    Parameters
    ----------
    bucket: string
        Bucket name
    key : string
        Path in s3

    Returns
    -------
    np array
        Image array
    """
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    response = object.get()
    file_stream = response['Body']
    im = Image.open(file_stream)
    return np.array(im)

def write_image_to_s3(img_array, bucket, key, region_name='ap-southeast-1'):
    """Write an image array into S3 bucket

    Parameters
    ----------
    img_array : np array
        Image array to write
    bucket: string
        Bucket name
    key : string
        Path in s3
    region_name : string
        AWS region name

    Returns
    -------
    None
    """
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    file_stream = BytesIO()
    im = Image.fromarray(img_array)
    im.save(file_stream, format='jpeg')
    object.put(Body=file_stream.getvalue())
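The upload above uses getvalue() rather than read() because a buffer that has just been written to has its position at the end. A small sketch of the difference, using made-up bytes in place of the encoded image:

```python
from io import BytesIO

file_stream = BytesIO()
file_stream.write(b"pretend encoded image")  # made-up data standing in for im.save(...)

from_read = file_stream.read()          # reads from the current position, which is the end
from_getvalue = file_stream.getvalue()  # returns the whole buffer regardless of position

print(from_read, from_getvalue)
```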

Answered by Evgeniy

object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
img_data = object.get().get('Body').read()

Answered by Kai

The temporary file solution by Hyeungshik Jung looks good, but I noticed that the file somehow seems to be downloaded lazily. As a result, img.shape returns an empty tuple () even after object.download_fileobj(f) has been called. I resolved this by calling f.seek(0, 2) on the file object; after that, all subsequent operations work properly, e.g. img.shape returns the proper dimensions (704, 1024). The likely cause is write buffering: the data has not yet been flushed to disk when imread opens the file by name.

...
tmp = tempfile.NamedTemporaryFile()

with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
    f.seek(0, 2)  # seek to the end of the file; in practice this also flushes buffered writes
    img = mpimg.imread(tmp.name)
    print(img.shape)
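The buffering explanation can be checked locally. The sketch below assumes CPython's default buffered file objects and uses a small made-up payload. It compares the on-disk size before and after the seek, and after an explicit flush(), which is the more direct way to ask for the same effect.

```python
import os
import tempfile

payload = b"x" * 100  # small made-up payload that fits inside the write buffer

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "demo.bin")
    with open(path, "wb") as f:
        f.write(payload)
        size_before = os.path.getsize(path)       # buffered data may not be on disk yet
        f.seek(0, 2)                              # seeking writes out pending data on CPython
        size_after_seek = os.path.getsize(path)
        f.flush()                                 # the explicit way to request the same thing
        size_after_flush = os.path.getsize(path)

print(size_before, size_after_seek, size_after_flush)
```

If the first number is smaller than the other two, a reader opening the path at that point would have seen an incomplete file.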