Load S3 Data into an AWS SageMaker Notebook with Python
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/48264656/
Load S3 Data into AWS SageMaker Notebook
Asked by A555h55
I've just started to experiment with AWS SageMaker and would like to load data from an S3 bucket into a pandas dataframe in my SageMaker python jupyter notebook for analysis.
I could use boto to grab the data from S3, but I'm wondering whether there is a more elegant method as part of the SageMaker framework to do this in my python code?
Thanks in advance for any advice.
Accepted answer by Jonatan
Answer by Chhoser
import boto3
import pandas as pd
from sagemaker import get_execution_role

# Execution role attached to the notebook instance (must grant S3 access)
role = get_execution_role()

bucket='my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)

# pandas can read directly from the s3:// URL
pd.read_csv(data_location)
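The snippet above imports boto3 but ultimately lets pandas fetch the file. For comparison, here is a minimal sketch that downloads the object explicitly through the boto3 client instead, reusing the same placeholder bucket and key:

import boto3
import pandas as pd

s3 = boto3.client('s3')
# get_object returns a dict whose 'Body' is a streaming file-like object
obj = s3.get_object(Bucket='my-bucket', Key='train.csv')
df = pd.read_csv(obj['Body'])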
Answer by ivankeller
In the simplest case you don't need boto3, because you just read resources. Then it's even simpler:
import pandas as pd
bucket='my-bucket'
data_key = 'train.csv'
data_location = 's3://{}/{}'.format(bucket, data_key)
pd.read_csv(data_location)
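Note: pandas resolves s3:// paths through the optional s3fs dependency under the hood, so if the read above fails with an import error, installing s3fs in the notebook environment should fix it.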
But as Prateek stated, make sure to configure your SageMaker notebook instance to have access to S3. This is done at the configuration step under Permissions > IAM role.
Answer by Ben
You could also access your bucket as a file system using s3fs:
import s3fs
from PIL import Image  # Image.open below comes from Pillow; display() is built into Jupyter

fs = s3fs.S3FileSystem()

# List the first 5 files in your accessible bucket
fs.ls('s3://bucket-name/data/')[:5]

# Open a file directly
with fs.open('s3://bucket-name/data/image.png') as f:
    display(Image.open(f))
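The same file-like handles work for tabular data as well; as a minimal sketch (the bucket path and file name are placeholders matching the ones above), you can hand the open file straight to pandas:

import pandas as pd
import s3fs

fs = s3fs.S3FileSystem()
# fs.open yields a file-like object that pandas can read directly
with fs.open('s3://bucket-name/data/train.csv') as f:
    df = pd.read_csv(f)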
Answer by Prateek Dubey
Do make sure the Amazon SageMaker execution role has a policy attached that grants access to S3. This can be done in IAM.
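If you prefer to attach the permission programmatically rather than through the console, a minimal sketch with boto3 might look like the following (the role name is hypothetical; AmazonS3ReadOnlyAccess is AWS's managed read-only S3 policy):

import boto3

iam = boto3.client('iam')
# Attach the managed read-only S3 policy to the notebook's execution role
iam.attach_role_policy(
    RoleName='AmazonSageMaker-ExecutionRole-example',  # hypothetical role name
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess'
)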
Answer by ivankeller
You can also use AWS Data Wrangler https://github.com/awslabs/aws-data-wrangler:
import awswrangler as wr
df = wr.pandas.read_csv(path="s3://...")
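Note that wr.pandas reflects the library's pre-1.0 API; in more recent awswrangler releases the equivalent call lives, to the best of my knowledge, in the s3 module:

import awswrangler as wr
df = wr.s3.read_csv(path="s3://...")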