Node.js and Amazon S3: How to iterate through all files in a bucket?

Question

Asked by nab

Is there any Amazon S3 client library for Node.js that allows listing of all files in S3 bucket?

The best-known libraries, aws2js and knox, don't seem to have this functionality.

Answer 1

Accepted answer by nab

In fact, aws2js supports listing the objects in a bucket at a low level via the s3.get() method call. To do so, pass the prefix parameter, which is documented on the Amazon S3 REST API page:

var s3 = require('aws2js').load('s3', awsAccessKeyId, awsSecretAccessKey);    
s3.setBucket(bucketName);

var folder = encodeURI('some/path/to/S3/folder');
var url = '?prefix=' + folder;

s3.get(url, 'xml', function (error, data) {
    console.log(error);
    console.log(data);
});

The data variable in the above snippet contains a list of the objects in the bucketName bucket whose keys begin with the given prefix. Note that S3 returns at most 1,000 keys per request.

Answer 2

Answered by Meekohi

Using the official aws-sdk:

var allKeys = [];
function listAllKeys(marker, cb)
{
  s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
    if(err) return cb(err);
    allKeys = allKeys.concat(data.Contents); // concat keeps allKeys a flat array of objects

    if(data.IsTruncated)
      listAllKeys(data.NextMarker, cb);
    else
      cb();
  });
}

see s3.listObjects

Edit 2017: Same basic idea, but listObjectsV2( ... ) is now recommended and uses a ContinuationToken (see s3.listObjectsV2):

var allKeys = [];
function listAllKeys(token, cb)
{
  var opts = { Bucket: s3bucket };
  if(token) opts.ContinuationToken = token;

  s3.listObjectsV2(opts, function(err, data){
    if(err) return cb(err);
    allKeys = allKeys.concat(data.Contents);

    if(data.IsTruncated)
      listAllKeys(data.NextContinuationToken, cb);
    else
      cb();
  });
}

Answer 3

Answered by Ken Lin

Here's Node code I wrote to assemble the S3 objects from truncated lists.

var params = {
    Bucket: '<yourbucket>',   // replace with your bucket name
    Prefix: '<yourprefix>',   // replace with your key prefix
};

var s3DataContents = [];    // Single array of all combined S3 data.Contents

function s3Print() {
    if (program.al) {
        // --al: Print all objects
        console.log(JSON.stringify(s3DataContents, null, "    "));
    } else {
        // --b: Print key only, otherwise also print index 
        var i;
        for (i = 0; i < s3DataContents.length; i++) {
            var head = !program.b ? (i+1) + ': ' : '';
            console.log(head + s3DataContents[i].Key);
        }
    }
}

function s3ListObjects(params, cb) {
    s3.listObjects(params, function(err, data) {
        if (err) {
            console.log("listS3Objects Error:", err);
        } else {
            var contents = data.Contents;
            s3DataContents = s3DataContents.concat(contents);
            if (data.IsTruncated) {
                // Set Marker to last returned key
                params.Marker = contents[contents.length-1].Key;
                s3ListObjects(params, cb);
            } else {
                cb();
            }
        }
    });
}

s3ListObjects(params, s3Print);

Pay attention to the listObjects documentation of NextMarker, which is NOT always present in the returned data object, so I don't use it at all in the above code:

NextMarker — (String) When response is truncated (the IsTruncated element value in the response is true), you can use the key name in this field as marker in the subsequent request to get next set of objects. Amazon S3 lists objects in alphabetical order. Note: This element is returned only if you have delimiter request parameter specified. If response does not include the NextMarker and it is truncated, you can use the value of the last Key in the response as the marker in the subsequent request to get the next set of object keys.

The entire program has now been pushed to https://github.com/kenklin/s3list.
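
The fallback that the documentation describes can be tried out without touching AWS. What follows is a minimal sketch and not part of the s3list program: makeFakeS3 is a hypothetical in-memory stand-in that imitates listObjects by returning sorted keys in small pages and, as happens when no Delimiter is given, never sets NextMarker. listWithMarkerFallback is likewise an illustrative name.

```javascript
// Hypothetical in-memory stand-in for an S3 client (an assumption for this
// sketch). It returns sorted keys in pages of `pageSize` and never sets
// NextMarker, as happens when no Delimiter is specified.
function makeFakeS3(keys, pageSize) {
  var sorted = keys.slice().sort();
  return {
    listObjects: function (params, cb) {
      // Like S3, list only the keys that sort after the Marker.
      var rest = sorted.filter(function (k) {
        return !params.Marker || k > params.Marker;
      });
      var page = rest.slice(0, pageSize);
      setImmediate(function () {
        cb(null, {
          Contents: page.map(function (k) { return { Key: k }; }),
          IsTruncated: rest.length > page.length
        });
      });
    }
  };
}

// Collects every object, using the last returned key as the next Marker
// whenever NextMarker is absent from a truncated response.
function listWithMarkerFallback(client, params, cb, acc) {
  acc = acc || [];
  client.listObjects(params, function (err, data) {
    if (err) return cb(err);
    acc = acc.concat(data.Contents);
    if (!data.IsTruncated || data.Contents.length === 0) return cb(null, acc);
    var next = data.NextMarker || data.Contents[data.Contents.length - 1].Key;
    // Copy params rather than mutating the caller's object.
    var nextParams = Object.assign({}, params, { Marker: next });
    listWithMarkerFallback(client, nextParams, cb, acc);
  });
}

var fakeS3 = makeFakeS3(['a/1', 'a/2', 'b/1', 'b/2', 'c/1'], 2);
listWithMarkerFallback(fakeS3, { Bucket: 'example-bucket' }, function (err, objects) {
  if (err) throw err;
  console.log(objects.map(function (o) { return o.Key; }));
});
```

With a real client, the s3 object from the aws-sdk would be passed in place of fakeS3; the pagination logic itself does not change.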

Answer 4

Answered by hurrymaplelad

I published knox-copy when I couldn't find a good existing solution. It wraps all the pagination details of the REST API into a familiar Node stream:

var knoxCopy = require('knox-copy');

var client = knoxCopy.createClient({
  key: '<api-key-here>',
  secret: '<secret-here>',
  bucket: 'mrbucket'
});

client.streamKeys({
  // omit the prefix to list the whole bucket
  prefix: 'buckets/of/fun' 
}).on('data', function(key) {
  console.log(key);
});

If you're listing fewer than 1000 files, a single page will work:

client.listPageOfKeys({
  prefix: 'smaller/bucket/o/fun'
}, function(err, page) {
  console.log(page.Contents); // <- Here's your list of files
});

Answer 5

Answered by Thijs Lowette

Meekohi provided a very good answer, but the (new) documentation states that NextMarker can be undefined. When this is the case, you should use the last key as the marker.

So his code sample can be changed into:

var allKeys = [];
function listAllKeys(marker, cb) {
  s3.listObjects({Bucket: s3bucket, Marker: marker}, function(err, data){
    if(err) return cb(err);
    allKeys = allKeys.concat(data.Contents); // concat keeps allKeys a flat array of objects
    if(data.IsTruncated)
      listAllKeys(data.NextMarker || data.Contents[data.Contents.length-1].Key, cb);
    else
      cb();
  });
}

Couldn't comment on the original answer since I don't have the required reputation. Apologies for the bad mark-up btw.

Answer 6

Answered by logidelic

This is an old question and I guess the AWS JS SDK has changed a lot since it was asked. Here's yet another way to do it these days:

s3.listObjects({Bucket:'mybucket', Prefix:'some-pfx'}).
on('success', function handlePage(r) {
    //... handle page of contents r.data.Contents

    if(r.hasNextPage()) {
        // There's another page; handle it
        r.nextPage().on('success', handlePage).send();
    } else {
        // Finished!
    }
}).
on('error', function(r) {
    // Error!
}).
send();

Answer 7

Answered by nkitku

Using an async generator:

const { S3 } = require('aws-sdk');
const s3 = new S3();

async function* listAllKeys(opts) {
    opts = {...opts};
    do {
        const data = await s3.listObjectsV2(opts).promise();
        opts.ContinuationToken = data.NextContinuationToken;
        yield data;
    } while (opts.ContinuationToken)
}

const opts = {
    Bucket: 'bucket-xyz',
    /* required */
    // ContinuationToken: 'STRING_VALUE',
    // Delimiter: 'STRING_VALUE',
    // EncodingType: url,
    // FetchOwner: true || false,
    // MaxKeys: 'NUMBER_VALUE',
    // Prefix: 'STRING_VALUE',
    // RequestPayer: requester,
    // StartAfter: 'STRING_VALUE'
};

async function main() {
    // using a for await...of loop
    for await (const data of listAllKeys(opts)) {
        console.log(data.Contents)
    }

    // or lazy-load
    const keys = listAllKeys(opts);
    console.log(await keys.next());
    // {value: {…}, done: false}
    console.log(await keys.next());
    // {value: {…}, done: false}
    console.log(await keys.next());
    // {value: undefined, done: true}
}
main();

// Making Observable

const lister = opts => o => {
    let needMore = true;
    (async () => {
        const keys = listAllKeys(opts);
        for await (const data of keys) {
            o.next(data);
            if (!needMore) break;
        }
        o.complete();
    })().catch(e => o.error(e)); // forward listing failures to the observer
    return () => (needMore = false);
}

// Using Rxjs

const { Observable } = require('rxjs');
const { flatMap } = require('rxjs/operators')

function listAll() {
    return Observable.create(lister(opts))
        .pipe(flatMap(v => v.Contents))
        .subscribe(console.log);
}

listAll();


// Using Nodejs EventEmitter

const EventEmitter = require('events');

const _eve = new EventEmitter();
_eve.on('next', console.log);

const stop = lister(opts)({
    next: v => _eve.emit('next', v),
    error: e => _eve.emit('error', e),
    complete: v => _eve.emit('complete', v)
});
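
If only the key names are needed, a page generator like the one above can be wrapped in a second generator that yields one key at a time. This is a minimal sketch: eachKey, collectKeys, and fakePages are illustrative names that are not part of the answer or of the AWS SDK, and fakePages merely stands in for listAllKeys(opts) so that the example runs without AWS credentials.

```javascript
// Yields each object's Key from an async iterable of listObjectsV2-style
// pages (objects that carry a Contents array).
async function* eachKey(pages) {
  for await (const page of pages) {
    for (const obj of page.Contents || []) {
      yield obj.Key;
    }
  }
}

// Hypothetical stand-in for listAllKeys(opts): two hard-coded pages.
async function* fakePages() {
  yield { Contents: [{ Key: 'a.txt' }, { Key: 'b.txt' }] };
  yield { Contents: [{ Key: 'c.txt' }] };
}

// Drains the key generator into a plain array.
async function collectKeys(pages) {
  const keys = [];
  for await (const key of eachKey(pages)) keys.push(key);
  return keys;
}

collectKeys(fakePages()).then(console.log);
// With the real generator this would be: collectKeys(listAllKeys(opts))
```

Keeping the flattening step separate from the paging step means the same eachKey helper works with any source of pages, whether it comes from the SDK or from a test fixture.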

Answer 8

Answered by Carlos Rodriguez

I ended up building a wrapper function around listObjectsV2. It works the same way and takes the same parameters, but it calls itself recursively until IsTruncated is false and then returns all the keys found as an array in the second parameter of the callback function:

const AWS = require('aws-sdk')
const s3 = new AWS.S3()

function listAllKeys(params, cb)
{
   var keys = []
   if(params.data){
      keys = keys.concat(params.data)
   }
   delete params['data']

   s3.listObjectsV2(params, function(err, data){
     if(err){
       cb(err)
     } else if (data.IsTruncated) {
       params['ContinuationToken'] = data.NextContinuationToken
       params['data'] = keys.concat(data.Contents) // carry forward every key collected so far
       listAllKeys(params, cb)
     } else {
       keys = keys.concat(data.Contents)
       cb(null,keys)
     }
   })
}

Answer 9

Answered by John Tng

I am using this version with async/await.
This function will return the content in an array.
I'm also using the NextContinuationToken instead of the Marker.

async function getFilesRecursivelySub(param) {

    // Call the function to get list of items from S3.
    let result = await s3.listObjectsV2(param).promise();

    if(!result.IsTruncated) {
        // Recursive terminating condition.
        return result.Contents;
    } else {
        // Recurse it if results are truncated.
        param.ContinuationToken = result.NextContinuationToken;
        return result.Contents.concat(await getFilesRecursivelySub(param));
    }
}

async function getFilesRecursively() {

    let param = {
        Bucket: 'YOUR_BUCKET_NAME'
        // Can add more parameters here.
    };

    return await getFilesRecursivelySub(param);
}

Answer 10

Answered by Prasanth Jaya

If you want to get the list of keys only within a specific folder inside an S3 bucket, then this will be useful.

Basically, the listObjects function starts listing from the Marker we set and returns at most MaxKeys: 1000 keys per call. So it goes through the folders one after another and gives you the first 1000 keys it finds, which may come from different folders in the bucket.

Consider that I have many folders inside my bucket with prefixes of the form prod/some date/, e.g. prod/2017/05/12/, prod/2017/05/13/, etc.

I want to fetch the list of objects (file names) only within the prod/2017/05/12/ folder, so I specify prod/2017/05/12/ as my start and prod/2017/05/13/ [your next folder name] as my end, and in the code I break out of the loop when I encounter the end.

Each entry in data.Contents will look like this:

{      Key: 'prod/2017/05/13/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
       LastModified: 2017-05-13T00:59:02.000Z,
       ETag: '"630b2sdfsdfs49ef392bcc16c833004f94ae850"',
       Size: 134236366,
       StorageClass: 'STANDARD',
       Owner: { } 
 }

Code:

var list = [];

function listAllKeys(s3bucket, start, end) {
  s3.listObjects({
    Bucket: s3bucket,
    Marker: start,
    MaxKeys: 1000,
  }, function(err, data) {
      if (data.Contents) {
        for (var i = 0; i < data.Contents.length; i++) {
         var key = data.Contents[i].Key;    //See above code for the structure of data.Contents
          if (key.substring(0, end.length) != end) {   // compare only the leading end.length characters
             list.push(key);
          } else {
             break;   // break the loop if end arrived
          }
       }
        console.log(list);
        console.log('Total - ', list.length);      
     }
   });
 }

listAllKeys('BucketName', 'prod/2017/05/12/', 'prod/2017/05/13/');

Output:

[ 'prod/2017/05/12/05/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
  'prod/2017/05/12/05/a36528b9-e071-4b83-a7e6-9b32d6bce6d8.jpg',
  'prod/2017/05/12/05/bc4d6d4b-4455-48b3-a548-7a714c489060.jpg',
  'prod/2017/05/12/05/f4b8d599-80d0-46fa-a996-e73b8fd0cd6d.jpg',
  ... 689 more items ]
Total - 692
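
An alternative to comparing each key against an end value is to let S3 do the filtering through the Prefix request parameter, which both listObjects and listObjectsV2 accept. The sketch below is a hedged illustration rather than this answer's method: listKeysWithPrefix and makeFakeS3V2 are hypothetical names, the maxKeys argument exists only to make paging visible in a small example, and the fake client imitates just the Prefix, MaxKeys, and ContinuationToken behaviour so that the code runs without AWS credentials.

```javascript
// Hypothetical in-memory stand-in for the AWS SDK v2 S3 client (assumption).
// It imitates only Prefix filtering, MaxKeys, and continuation tokens.
function makeFakeS3V2(allKeys) {
  const sorted = [...allKeys].sort();
  return {
    listObjectsV2(params) {
      const matching = sorted.filter(k => k.startsWith(params.Prefix || ''));
      const start = Number(params.ContinuationToken || 0);
      const end = start + (params.MaxKeys || 1000);
      const data = {
        Contents: matching.slice(start, end).map(Key => ({ Key })),
        IsTruncated: end < matching.length
      };
      if (data.IsTruncated) data.NextContinuationToken = String(end);
      return { promise: () => Promise.resolve(data) };
    }
  };
}

// Lists every key under `prefix`, following continuation tokens until the
// response is no longer truncated.
async function listKeysWithPrefix(client, bucket, prefix, maxKeys) {
  const keys = [];
  let token;
  do {
    const params = { Bucket: bucket, Prefix: prefix };
    if (maxKeys) params.MaxKeys = maxKeys;
    if (token) params.ContinuationToken = token;
    const data = await client.listObjectsV2(params).promise();
    for (const obj of data.Contents || []) keys.push(obj.Key);
    token = data.IsTruncated ? data.NextContinuationToken : undefined;
  } while (token);
  return keys;
}

const fake = makeFakeS3V2([
  'prod/2017/05/12/a.jpg',
  'prod/2017/05/12/b.jpg',
  'prod/2017/05/13/c.jpg'
]);
listKeysWithPrefix(fake, 'BucketName', 'prod/2017/05/12/', 1).then(console.log);
```

With the real SDK (v2), an instance created by new AWS.S3() would be passed in place of the fake. Because the service applies the prefix itself, no keys from neighbouring folders are transferred and no end marker has to be chosen.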
