Reading a JSON file from S3 using Python boto3

Notice: this content comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/40995251/

Time: 2020-08-20 00:19:27  Source: igfitidea

python json amazon-web-services amazon-s3 boto3

Asked by Nanju

I stored the following JSON in the S3 bucket 'test':

{
  'Details' : "Something" 
}

I am using the following code to read this JSON and print the value of the key 'Details':

s3 = boto3.resource('s3',
                    aws_access_key_id=<access_key>,
                    aws_secret_access_key=<secret_key>
                    )
content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(repr(file_content))
print(json_content['Details'])

I am getting the error 'string indices must be integers'. I don't want to download the file from S3 and then read it.

Answered by bastelflp

As mentioned in the comments above, repr has to be removed and the JSON file has to use double quotes for attributes. Using this file on aws/s3:

{
  "Details" : "Something"
}

and the following Python code, it works:

import boto3
import json

s3 = boto3.resource('s3')

content_object = s3.Object('test', 'sample_json.txt')
file_content = content_object.get()['Body'].read().decode('utf-8')
json_content = json.loads(file_content)
print(json_content['Details'])
# >> Something
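
To see why repr leads to the reported error, here is a minimal sketch that needs no S3 access. It assumes, purely for illustration, that the file content used single quotes only; the variable names are hypothetical. In that case repr wraps the text in double quotes, json.loads decodes the whole thing as one JSON string, and indexing that string with 'Details' raises the TypeError from the question.

```python
import json

# Assumed file content for illustration: single quotes only, so not valid JSON.
file_content = "{'Details': 'Something'}"

# repr() wraps the text in double quotes, which json.loads reads as one JSON string.
wrapped = json.loads(repr(file_content))
print(type(wrapped).__name__)

try:
    wrapped['Details']  # indexing a str with a str key
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Valid JSON (double quotes) parsed without repr() yields a dict.
parsed = json.loads('{"Details": "Something"}')
print(parsed['Details'])
```

With other file contents repr may instead produce text that json.loads rejects outright, so the exact failure can differ; removing repr is the fix in either case.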

Answered by Hafizur Rahman

The following worked for me.

# read_s3.py
import json
import boto3

BUCKET = 'MY_S3_BUCKET_NAME'
FILE_TO_READ = 'FOLDER_PATH/my_file.json'
client = boto3.client('s3',
                       aws_access_key_id='MY_AWS_KEY_ID',
                       aws_secret_access_key='MY_AWS_SECRET_ACCESS_KEY'
                     )
result = client.get_object(Bucket=BUCKET, Key=FILE_TO_READ)
text = result["Body"].read().decode()  # bytes -> str (UTF-8 by default)
data = json.loads(text)  # parse the JSON text into a dict
print(data['Details']) # Use your desired JSON Key for your value

It is not a good idea to hard-code the AWS ID and secret keys directly. As a best practice, consider either of the following:

(1) Read your AWS credentials from a JSON file stored in your local storage:

import json
# Load the key pair from a local file; the with block closes the file afterwards
with open('aws_cred.json') as f:
    credentials = json.load(f)
client = boto3.client('s3',
                       aws_access_key_id=credentials['MY_AWS_KEY_ID'],
                       aws_secret_access_key=credentials['MY_AWS_SECRET_ACCESS_KEY']
                     )

(2) Read from your environment variables (my preferred option for deployment):

import os
client = boto3.client('s3',
                       aws_access_key_id=os.environ['MY_AWS_KEY_ID'],
                       aws_secret_access_key=os.environ['MY_AWS_SECRET_ACCESS_KEY']
                     )

Let's prepare a shell script (set_env.sh) that sets the environment variables and then runs our Python script (read_s3.py), as follows:

# set_env.sh
export MY_AWS_KEY_ID='YOUR_AWS_ACCESS_KEY_ID'
export MY_AWS_SECRET_ACCESS_KEY='YOUR_AWS_SECRET_ACCESS_KEY'
# execute the python file containing your code as stated above that reads from s3
python read_s3.py # will execute the python script to read from s3

Now execute the shell script in a terminal as follows:

sh set_env.sh
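
If one of the variables is not set, os.environ[...] fails with a bare KeyError. As a rough sketch, a small helper can collect the client arguments and report which variables are missing. The function name s3_client_kwargs is hypothetical and not part of boto3; the variable names are the ones used in set_env.sh above.

```python
import os

def s3_client_kwargs(env=os.environ):
    """Build keyword arguments for boto3.client('s3', ...) from the environment.

    Hypothetical helper for illustration; not part of boto3.
    """
    names = {
        'aws_access_key_id': 'MY_AWS_KEY_ID',
        'aws_secret_access_key': 'MY_AWS_SECRET_ACCESS_KEY',
    }
    # Report every missing variable at once instead of failing on the first.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return {arg: env[var] for arg, var in names.items()}
```

It would be used as client = boto3.client('s3', **s3_client_kwargs()). Note also that boto3 reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY variables on its own when no keys are passed, so exporting those names instead may make the explicit arguments unnecessary.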

Answered by alukach

Wanted to add that the botocore.response.StreamingBody works well with json.load:

import json
import boto3

s3 = boto3.resource('s3')

obj = s3.Object(bucket, key)
data = json.load(obj.get()['Body']) 
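
This works because json.load only needs an object with a read() method, and on Python 3.6+ it accepts the bytes that read() returns. A minimal sketch without S3 access, using io.BytesIO as an assumed stand-in for the response body:

```python
import io
import json

# Stand-in for obj.get()['Body']: any object whose read() returns the raw bytes.
body = io.BytesIO(b'{"Details": "Something"}')

# json.load calls body.read() and parses the result; no explicit decode needed.
data = json.load(body)
print(data['Details'])
```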

Answered by Cerberussian

I was stuck for a bit because the decoding didn't work for me (my S3 objects are gzipped).

Found this discussion which helped me: Python gzip: is there a way to decompress from a string?

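import boto3
import zlib

S3_RESOURCE = boto3.resource('s3')

# `event` is the S3 notification event passed to the Lambda handler
key = event["Records"][0]["s3"]["object"]["key"]
bucket_name = event["Records"][0]["s3"]["bucket"]["name"]

s3_object = S3_RESOURCE.Object(bucket_name, key).get()['Body'].read()

# 16 + zlib.MAX_WBITS tells zlib to expect a gzip header and trailer
jsonData = zlib.decompress(s3_object, 16+zlib.MAX_WBITS)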

If you print jsonData, you'll see your desired JSON file! If you are running the test in AWS itself, be sure to check the CloudWatch logs, because Lambda won't output the full JSON file if it's too long.
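
The decompression step can be tried locally without S3. The sketch below builds gzipped bytes in memory as an assumed stand-in for the object body, then decompresses and parses them; the sample payload is made up for illustration.

```python
import gzip
import json
import zlib

# Stand-in for the gzipped bytes returned by ...get()['Body'].read()
raw = gzip.compress(json.dumps({"Details": "Something"}).encode('utf-8'))

# 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer.
decompressed = zlib.decompress(raw, 16 + zlib.MAX_WBITS)
data = json.loads(decompressed)
print(data['Details'])

# gzip.decompress from the standard library gives the same bytes.
alt = json.loads(gzip.decompress(raw))
```

Note that the decompressed result is bytes, so it still has to go through json.loads before individual keys can be read.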