Read file from aws s3 bucket using node fs

Disclaimer: this page is a translation of a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me), citing the original: http://stackoverflow.com/questions/27299139/

Tags: node.js, amazon-web-services, amazon-s3, fs

Asked by Joel

I am attempting to read a file that is in an AWS S3 bucket using:

fs.readFile(file, function (err, contents) {
  var myLines = contents.Body.toString().split('\n')
})

I've been able to download and upload a file using the node aws-sdk, but I am at a loss as to how to simply read it and parse the contents.

Here is an example of how I am reading the file from s3:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myKey.csv'}
var s3file = s3.getObject(params)

Answered by dug

You have a couple of options. You can include a callback as a second argument, which will be invoked with any error message and the object. This example is straight from the AWS documentation:

s3.getObject(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else     console.log(data);           // successful response
});
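
Applied to the original question, the same callback can decode and split the body; a minimal sketch, assuming the object is a UTF-8 text file:

s3.getObject(params, function(err, data) {
  if (err) return console.log(err, err.stack); // an error occurred
  // data.Body is a Buffer; decode it before splitting into lines
  var myLines = data.Body.toString('utf-8').split('\n');
  console.log(myLines.length + ' lines read');
});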

Alternatively, you can convert the output to a stream. There's also an example in the AWS documentation:

var s3 = new AWS.S3({apiVersion: '2006-03-01'});
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');
s3.getObject(params).createReadStream().pipe(file);
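
One caveat worth noting: with the streaming approach, request errors (for example a missing key) are emitted on the read stream itself, so it is safer to attach an 'error' handler; a small sketch of that assumption:

s3.getObject(params).createReadStream()
  .on('error', function(err) {
    // without this handler a NoSuchKey error would be thrown unhandled
    console.error('S3 read stream error:', err);
  })
  .pipe(file);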

Answered by Lai Xue

This will do it:

new AWS.S3().getObject({ Bucket: this.awsBucketName, Key: keyName }, function(err, data)
{
    if (!err)
        console.log(data.Body.toString());
});

Answered by Jason

Since you seem to want to process an S3 text file line by line, here is a Node version that uses the standard readline module and AWS's createReadStream():

const readline = require('readline');

const rl = readline.createInterface({
    input: s3.getObject(params).createReadStream()
});

rl.on('line', function(line) {
    console.log(line);
})
.on('close', function() {
    // all lines have been read
});
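
If the caller needs to know when the whole file has been processed, the close event can be wrapped in a Promise; a minimal sketch building on the snippet above (processLines is a hypothetical helper name):

function processLines(params) {
    return new Promise(function(resolve, reject) {
        const input = s3.getObject(params).createReadStream();
        input.on('error', reject); // surface S3 errors such as a missing key

        readline.createInterface({ input: input })
            .on('line', function(line) {
                console.log(line); // handle each line here
            })
            .on('close', resolve); // resolves once the stream is exhausted
    });
}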

Answered by Gustavo Straube

I haven't figured out why yet, but the createReadStream/pipe approach didn't work for me. I was trying to download a large CSV file (300MB+) and got duplicated lines. It seemed to be a random issue; the final file size varied on each download attempt.

I ended up using another way, based on AWS JS SDK examples:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');

s3.getObject(params).
    on('httpData', function(chunk) { file.write(chunk); }).
    on('httpDone', function() { file.end(); }).
    send();

This way, it worked like a charm.

Answered by devendra

Here is the example I used to retrieve and parse JSON data from S3.

var params = { Bucket: BUCKET_NAME, Key: KEY_NAME };
new AWS.S3().getObject(params, function(err, json_data) {
    if (!err) {
        var json = JSON.parse(new Buffer(json_data.Body).toString("utf8"));

        // PROCESS JSON DATA
        ......
    }
});

Answered by loretoparisi

I had exactly the same issue when downloading very large files from S3.

The example solution from AWS docs just does not work:

var file = fs.createWriteStream(options.filePath);
file.on('close', function() {
    if (self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath);
    return callback(null, done);
});
s3.getObject({ Key: documentKey }).createReadStream().on('error', function(err) {
    if (self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, err);
    return callback(err);
}).pipe(file);

While this solution will work:

var file = fs.createWriteStream(options.filePath);
s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
.on('error', function(err) {
    if (self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, err);
    return callback(err);
})
.on('httpData', function(chunk) { file.write(chunk); })
.on('httpDone', function() {
    file.end();
    if (self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath);
    return callback(null, done);
})
.send();

The createReadStream attempt just does not fire the end, close, or error callback for some reason. See here about this.

I'm also using that solution for writing archives down to gzip, since the first one (the AWS example) does not work in this case either:

var gunzip = zlib.createGunzip();
var file = fs.createWriteStream(options.filePath);

s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
.on('error', function (error) {
    if (self.logger) self.logger.error("%@", error);
    return callback(error);
})
.on('httpData', function (chunk) {
    file.write(chunk);
})
.on('httpDone', function () {
    file.end();

    if (self.logger) self.logger.info("downloadArchive downloaded %s", options.filePath);

    fs.createReadStream(options.filePath)
    .on('error', (error) => {
        return callback(error);
    })
    .on('end', () => {
        if (self.logger) self.logger.info("downloadArchive unarchived %s", options.fileDest);
        return callback(null, options.fileDest);
    })
    .pipe(gunzip)
    .pipe(fs.createWriteStream(options.fileDest));
})
.send();

Answered by kgangadhar

If you want to save memory and obtain each row as a JSON object, you can use fast-csv to create a read stream and read each row as a JSON object, as follows:

const csv = require('fast-csv');
const AWS = require('aws-sdk');

const credentials = new AWS.Credentials("ACCESSKEY", "SECRETEKEY", "SESSIONTOKEN");
AWS.config.update({
    credentials: credentials, // credentials required for local execution
    region: 'your_region'
});
const dynamoS3Bucket = new AWS.S3();
const stream = dynamoS3Bucket.getObject({ Bucket: 'your_bucket', Key: 'example.csv' }).createReadStream();

var parser = csv.fromStream(stream, { headers: true }).on("data", function (data) {
    parser.pause();  //can pause reading using this at a particular row
    parser.resume(); // to continue reading
    console.log(data);
}).on("end", function () {
    console.log('process finished');
});
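
Note that csv.fromStream is the fast-csv v2 API; in later major versions the entry point was renamed. A sketch of the same idea, assuming fast-csv 3 or later (parseStream):

const parser = csv.parseStream(stream, { headers: true })
    .on('data', (row) => {
        parser.pause();   // can pause reading at a particular row
        console.log(row);
        parser.resume();  // continue reading
    })
    .on('end', (rowCount) => console.log(`process finished, ${rowCount} rows`));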

Answered by Costin

I prefer Buffer.from(data.Body).toString('utf8'). It supports encoding parameters. With other AWS services (e.g. Kinesis Streams), someone may want to replace the 'utf8' encoding with 'base64'.

new AWS.S3().getObject(
  { Bucket: this.awsBucketName, Key: keyName }, 
  function(err, data) {
    if (!err) {
      const body = Buffer.from(data.Body).toString('utf8');
      console.log(body);
    }
  }
);

Answered by ryandb

If you are looking to avoid the callbacks, you can take advantage of the SDK's .promise() function like this:

const s3 = new AWS.S3();
const params = { Bucket: 'myBucket', Key: 'myKey.csv' };
const response = await s3.getObject(params).promise(); // await the promise
const fileContent = response.Body.toString('utf-8'); // can also use 'base64' here if desired
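
Note that await is only valid inside an async function (or a module with top-level await support); a minimal wrapper sketch, with readS3File as a hypothetical helper name:

async function readS3File(bucket, key) {
  const s3 = new AWS.S3();
  const response = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  return response.Body.toString('utf-8'); // decode the Buffer
}

readS3File('myBucket', 'myKey.csv')
  .then((contents) => console.log(contents))
  .catch((err) => console.error(err));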

I'm sure the other ways mentioned here have their advantages, but this works great for me. Sourced from this thread (see the last response from AWS): https://forums.aws.amazon.com/thread.jspa?threadID=116788

Answered by Ripon Banik

With the new version of the SDK, the accepted answer does not work: it does not wait for the object to be downloaded. The following code snippet will help with the new version:

// dependencies
const AWS = require('aws-sdk');

// get reference to S3 client
const s3 = new AWS.S3();

exports.handler = async (event, context, callback) => {
    var bucket = "TestBucket";
    var key = "TestKey";

    try {
        const params = {
            Bucket: bucket, // must reference the variables declared above
            Key: key
        };

        var theObject = await s3.getObject(params).promise();
    } catch (error) {
        console.log(error);
        return;
    }
};
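
The snippet above stops after the download; to actually use the object, the Body buffer can be decoded as in the earlier answers. A short continuation sketch, assuming a UTF-8 text object:

// inside the try block, after the getObject promise resolves:
const body = theObject.Body.toString('utf-8');
const lines = body.split('\n');
console.log(`read ${lines.length} lines from s3://${bucket}/${key}`);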