Python: how to read an image file from an S3 bucket directly into memory?
Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same CC BY-SA license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/44043036/
How to read image file from S3 bucket directly into memory?
Asked by Dims
I have the following code:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
object.download_file('B01.jp2')
img=mpimg.imread('B01.jp2')
imgplot = plt.imshow(img)
plt.show()
This works, but the problem is that it first downloads the file into the current directory. Is it possible to read the file and decode it as an image directly in RAM?
Answered by Greg Merritt
I would suggest using the io module to read the file directly into memory, without having to use a temporary file at all.
For example:
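import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import io

s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')

# Image data is binary, so the in-memory buffer is io.BytesIO (io.StringIO only accepts text)
file_stream = io.BytesIO()
object.download_fileobj(file_stream)
file_stream.seek(0)  # rewind to the start of the buffer before reading it
# A stream has no file extension, so the image format is passed explicitly
img = mpimg.imread(file_stream, format='jp2')
# whatever you need to do
io.BytesIO is the right buffer whenever the data is binary, as image data is; io.StringIO is only suitable for text.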
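To illustrate why the choice of buffer matters, here is a minimal sketch using only the standard library. The payload is made-up bytes standing in for a downloaded image (an assumption for illustration, not real S3 data). It shows that io.StringIO rejects binary data, that io.BytesIO accepts it, and that the buffer has to be rewound before it is read back.

```python
import io

# Made-up bytes standing in for image data downloaded from S3 (assumption).
payload = bytes(range(16))

# io.StringIO only accepts str, so writing bytes raises TypeError.
text_buffer = io.StringIO()
try:
    text_buffer.write(payload)
    string_io_ok = True
except TypeError:
    string_io_ok = False

# io.BytesIO accepts bytes, which is what a binary download produces.
byte_buffer = io.BytesIO()
byte_buffer.write(payload)

# After writing, the position is at the end, so read() returns nothing...
at_end = byte_buffer.read()

# ...until the buffer is rewound to the start.
byte_buffer.seek(0)
rewound = byte_buffer.read()

print(string_io_ok, at_end, rewound == payload)
```

The same rewind step applies to any buffer that is filled by a download call and then handed to an image decoder.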
Answered by Hyeungshik Jung
Greg Merritt's answer is a better method.
I'd like to suggest using Python's NamedTemporaryFile from the tempfile module. It creates temporary files that are deleted when the file is closed (thanks to @NoamG).
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import tempfile
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
tmp = tempfile.NamedTemporaryFile()
with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
# Read the image after the with block, once the file has been flushed and closed
img = mpimg.imread(tmp.name)
# ...Do jobs using img
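The lifecycle of a NamedTemporaryFile can be sketched with the standard library alone. This is an illustration under stated assumptions: the payload is made up, the plain write() call stands in for the S3 download, and reopening the file by name while it is still open is assumed to work as it does on POSIX systems (it may fail on Windows).

```python
import os
import tempfile

# Made-up bytes standing in for the downloaded object (assumption).
payload = b"example image bytes"

with tempfile.NamedTemporaryFile(suffix=".jp2") as tmp:
    tmp.write(payload)  # stand-in for the download step
    tmp.flush()         # push buffered data to disk before reopening by name

    # Reopen by name, as a decoder that takes a path would do.
    with open(tmp.name, "rb") as reader:
        data = reader.read()

    existed_while_open = os.path.exists(tmp.name)
    tmp_path = tmp.name

# Leaving the with block closes the file, which also deletes it.
exists_after_close = os.path.exists(tmp_path)

print(data == payload, existed_while_open, exists_after_close)
```

Using the temporary file as a context manager keeps the cleanup automatic, so nothing is left behind in the working directory.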
Answered by Adrian Tofting
Streaming the image is possible by specifying the file format in imread().
import boto3
from io import BytesIO
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
resource = boto3.resource('s3', region_name='us-east-2')
bucket = resource.Bucket('sentinel-s2-l1c')
image_object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
image = mpimg.imread(BytesIO(image_object.get()['Body'].read()), 'jp2')
plt.figure(0)
plt.imshow(image)
Answered by beahacker
Building on Greg Merritt's answer, and to address the errors reported in its comment section, this version uses BytesIO instead of StringIO and PIL Image instead of matplotlib.image.
The following functions work with python3 and boto3. The write_image_to_s3 function is a bonus.
from PIL import Image
from io import BytesIO
import numpy as np
import boto3

def read_image_from_s3(bucket, key, region_name='ap-southeast-1'):
    """Load image file from s3.

    Parameters
    ----------
    bucket: string
        Bucket name
    key : string
        Path in s3
    region_name : string
        AWS region of the bucket

    Returns
    -------
    np array
        Image array
    """
    # Use the region_name argument instead of a hard-coded region
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    response = object.get()
    file_stream = response['Body']
    im = Image.open(file_stream)
    return np.array(im)

def write_image_to_s3(img_array, bucket, key, region_name='ap-southeast-1'):
    """Write an image array into S3 bucket

    Parameters
    ----------
    img_array: np array
        Image array
    bucket: string
        Bucket name
    key : string
        Path in s3
    region_name : string
        AWS region of the bucket

    Returns
    -------
    None
    """
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    # Encode the array as JPEG into an in-memory buffer, then upload the bytes
    file_stream = BytesIO()
    im = Image.fromarray(img_array)
    im.save(file_stream, format='jpeg')
    object.put(Body=file_stream.getvalue())
Answered by Evgeniy
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
img_data = object.get().get('Body').read()
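The snippet above leaves img_data as a bytes object. A decoder that expects a file-like object can be given io.BytesIO(img_data) instead. The sketch below illustrates this without touching S3: FakeBody is a hypothetical stand-in for the value stored under 'Body' (it is not part of boto3) and only mimics a stream that can be read once.

```python
import io


class FakeBody:
    """Hypothetical stand-in for the 'Body' stream; not part of boto3."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self):
        # Returns the remaining bytes; a second call returns b"".
        return self._stream.read()


body = FakeBody(b"example image bytes")  # made-up payload (assumption)

img_data = body.read()     # all bytes of the object
second_read = body.read()  # the stream is already consumed

# Wrap the bytes so they can be read, rewound, and read again.
file_like = io.BytesIO(img_data)
first = file_like.read()
file_like.seek(0)
again = file_like.read()

print(len(img_data), second_read, first == again)
```

Keeping the bytes in img_data and creating a fresh BytesIO per consumer avoids surprises when more than one reader needs the same data.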
Answered by Kai
The temporary file solution by Hyeungshik Jung looks good, but I noticed that the file somehow seems to be downloaded in a lazy fashion. As a result, img.shape can come back as an empty tuple () even after object.download_fileobj(f) has been called. I resolved this by calling f.seek(0, 2) on the file object; after that, all subsequent operations work properly, e.g. img.shape returns the proper dimensions (704, 1024). A likely cause is write buffering: while f is still open, some of the downloaded bytes may not yet be on disk when imread opens the file by name, and seeking on a buffered writer flushes it first.
...
tmp = tempfile.NamedTemporaryFile()
with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
    f.seek(0, 2)  # seek to the end of the file before reading it by name
    img = mpimg.imread(tmp.name)
    print(img.shape)
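The effect of the seek call can be reproduced without S3. The following sketch assumes CPython with default buffering and uses a small made-up payload and a hypothetical file name. It checks the on-disk size of a file while it is still open for writing, before and after f.seek(0, 2).

```python
import os
import tempfile

# Small made-up payload, well below the default write buffer size (assumption).
payload = b"x" * 64

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "B01.bin")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(payload)
        # The data is still held in the write buffer, not on disk.
        size_before = os.path.getsize(path)
        # Seeking on a buffered writer flushes pending data first.
        f.seek(0, 2)
        size_after_seek = os.path.getsize(path)
    # Closing the file also flushes, so the full size is visible here too.
    size_after_close = os.path.getsize(path)

print(size_before, size_after_seek, size_after_close)
```

Calling f.flush(), or reading the file only after the with block has closed it, would make the same data visible and states the intent more directly than a seek.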