Original source: http://stackoverflow.com/questions/9437581/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Node.js & Amazon S3: How to iterate through all files in a bucket?
Asked by nab
Accepted answer by nab
In fact, aws2js supports listing the objects in a bucket at a low level via the s3.get() method call. To do so, pass the prefix parameter, which is documented on the Amazon S3 REST API page:
var s3 = require('aws2js').load('s3', awsAccessKeyId, awsSecretAccessKey);
s3.setBucket(bucketName);
var folder = encodeURI('some/path/to/S3/folder');
var url = '?prefix=' + folder;
s3.get(url, 'xml', function (error, data) {
    console.log(error);
    console.log(data);
});
The data variable in the above snippet contains a list of the objects in the bucketName bucket whose keys start with the given prefix (S3 returns at most 1000 keys per request).
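As a rough sketch of how the query string could be built for more than one parameter: prefix, marker and max-keys are parameter names from the S3 REST API, while the buildListQuery helper is hypothetical and not part of aws2js. Whether aws2js passes every such query string through unchanged is an assumption you should verify.

```javascript
// Hypothetical helper (not part of aws2js): builds the query string
// for an S3 "list objects" REST request from a plain options object.
function buildListQuery(options) {
  var parts = [];
  if (options.prefix) parts.push('prefix=' + encodeURIComponent(options.prefix));
  if (options.marker) parts.push('marker=' + encodeURIComponent(options.marker));
  if (options.maxKeys) parts.push('max-keys=' + encodeURIComponent(options.maxKeys));
  return parts.length ? '?' + parts.join('&') : '';
}

console.log(buildListQuery({ prefix: 'some/path/to/S3/folder', maxKeys: 100 }));
// prints "?prefix=some%2Fpath%2Fto%2FS3%2Ffolder&max-keys=100"
```

The resulting string would take the place of the url variable passed to s3.get() above.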
Answer by Meekohi
Using the official aws-sdk:
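// Assumes s3 (an AWS.S3 client) and s3bucket (the bucket name) are defined elsewhere
var allKeys = [];
function listAllKeys(marker, cb) {
    s3.listObjects({Bucket: s3bucket, Marker: marker}, function (err, data) {
        if (err) return cb(err);
        // Append this page's objects to the flat result list
        allKeys = allKeys.concat(data.Contents);
        if (data.IsTruncated)
            listAllKeys(data.NextMarker, cb);
        else
            cb();
    });
}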
See the s3.listObjects documentation.
Edit 2017:
Same basic idea, but listObjectsV2( ... ) is now recommended and uses a ContinuationToken (see s3.listObjectsV2):
var allKeys = [];
function listAllKeys(token, cb) {
    var opts = { Bucket: s3bucket };
    if (token) opts.ContinuationToken = token;
    s3.listObjectsV2(opts, function (err, data) {
        if (err) return cb(err);
        allKeys = allKeys.concat(data.Contents);
        if (data.IsTruncated)
            listAllKeys(data.NextContinuationToken, cb);
        else
            cb();
    });
}
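To see the ContinuationToken recursion run without touching a real bucket, here is a minimal sketch. The makeFakeS3 function is a made-up stand-in for the S3 client (it only imitates the Contents, IsTruncated and NextContinuationToken fields), and this variant of listAllKeys takes the client, the bucket name and an accumulator as arguments, which is an assumption of the sketch rather than the code from the answer.

```javascript
// Fake stand-in for an S3 client (illustration only): it serves the keys
// of an in-memory "bucket" in pages of pageSize entries.
function makeFakeS3(keys, pageSize) {
  return {
    listObjectsV2: function (opts, cb) {
      // The fake token is simply the index of the next key to return.
      var start = opts.ContinuationToken ? Number(opts.ContinuationToken) : 0;
      var end = Math.min(start + pageSize, keys.length);
      var truncated = end < keys.length;
      var data = {
        Contents: keys.slice(start, end).map(function (k) { return { Key: k }; }),
        IsTruncated: truncated
      };
      if (truncated) data.NextContinuationToken = String(end);
      cb(null, data); // invoked synchronously to keep the demo simple
    }
  };
}

// Same recursion as above, with client, bucket and accumulator passed in.
function listAllKeys(s3, bucket, token, acc, cb) {
  var opts = { Bucket: bucket };
  if (token) opts.ContinuationToken = token;
  s3.listObjectsV2(opts, function (err, data) {
    if (err) return cb(err);
    acc = acc.concat(data.Contents);
    if (data.IsTruncated) listAllKeys(s3, bucket, data.NextContinuationToken, acc, cb);
    else cb(null, acc);
  });
}

var fakeS3 = makeFakeS3(['a.txt', 'b.txt', 'c.txt', 'd.txt', 'e.txt'], 2);
var collectedKeys = [];
listAllKeys(fakeS3, 'demo-bucket', undefined, [], function (err, objects) {
  if (err) throw err;
  collectedKeys = objects.map(function (o) { return o.Key; });
  console.log(collectedKeys); // all five keys, gathered from three pages
});
```

Passing the accumulator along instead of using a global keeps repeated calls from mixing their results.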
Answer by Ken Lin
Here's the Node code I wrote to assemble the S3 objects from truncated lists.
// Note: s3 is an AWS.S3 client and program holds the parsed command-line
// flags (--al, --b); both are defined in the full program.
var params = {
    Bucket: '<yourbucket>',
    Prefix: '<yourprefix>'
};
var s3DataContents = [];    // Single array of all combined S3 data.Contents
function s3Print() {
    if (program.al) {
        // --al: Print all objects
        console.log(JSON.stringify(s3DataContents, null, "    "));
    } else {
        // --b: Print key only, otherwise also print index
        var i;
        for (i = 0; i < s3DataContents.length; i++) {
            var head = !program.b ? (i+1) + ': ' : '';
            console.log(head + s3DataContents[i].Key);
        }
    }
}
function s3ListObjects(params, cb) {
    s3.listObjects(params, function(err, data) {
        if (err) {
            console.log("listS3Objects Error:", err);
        } else {
            var contents = data.Contents;
            s3DataContents = s3DataContents.concat(contents);
            if (data.IsTruncated) {
                // Set Marker to last returned key
                params.Marker = contents[contents.length-1].Key;
                s3ListObjects(params, cb);
            } else {
                cb();
            }
        }
    });
}
s3ListObjects(params, s3Print);
Pay attention to the listObjects documentation of NextMarker, which is NOT always present in the returned data object, so I don't use it at all in the above code ...
NextMarker — (String) When response is truncated (the IsTruncated element value in the response is true), you can use the key name in this field as marker in the subsequent request to get next set of objects. Amazon S3 lists objects in alphabetical order. Note: This element is returned only if you have delimiter request parameter specified. If response does not include the NextMarker and it is truncated, you can use the value of the last Key in the response as the marker in the subsequent request to get the next set of object keys.
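The rule quoted above can be written down as a small helper. This is only a sketch: the nextMarker function name is made up, and the data argument is assumed to have the shape of a listObjects response (IsTruncated, NextMarker, Contents).

```javascript
// Hypothetical helper: picks the Marker for the next listObjects request,
// or returns undefined when there is nothing more to fetch.
function nextMarker(data) {
  if (!data.IsTruncated) return undefined;      // last page reached
  if (data.NextMarker) return data.NextMarker;  // present when a delimiter was given
  var contents = data.Contents || [];
  // Fall back to the last key of this page, as the documentation suggests.
  return contents.length ? contents[contents.length - 1].Key : undefined;
}

console.log(nextMarker({ IsTruncated: true, Contents: [{ Key: 'a' }, { Key: 'b' }] }));
// prints "b"
```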
The entire program has now been pushed to https://github.com/kenklin/s3list.
Answer by hurrymaplelad
I published knox-copy when I couldn't find a good existing solution. It wraps all the pagination details of the REST API into a familiar Node stream:
var knoxCopy = require('knox-copy');
var client = knoxCopy.createClient({
    key: '<api-key-here>',
    secret: '<secret-here>',
    bucket: 'mrbucket'
});
client.streamKeys({
    // omit the prefix to list the whole bucket
    prefix: 'buckets/of/fun'
}).on('data', function(key) {
    console.log(key);
});
If you're listing fewer than 1000 files, a single page will work:
client.listPageOfKeys({
    prefix: 'smaller/bucket/o/fun'
}, function(err, page) {
    console.log(page.Contents); // <- Here's your list of files
});
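For readers curious how pagination can be hidden behind a stream in general, here is a minimal sketch built on Node's Readable.from. It is not how knox-copy is implemented: fetchPage, pagedKeys and this streamKeys function are made-up names, and the "bucket" is just an in-memory array.

```javascript
const { Readable } = require('stream');

// Hypothetical page source standing in for the S3 REST API: returns up to
// pageSize keys that come after marker, plus a truncation flag.
async function fetchPage(allKeys, marker, pageSize) {
  const start = marker ? allKeys.indexOf(marker) + 1 : 0;
  const keys = allKeys.slice(start, start + pageSize);
  return { keys, truncated: start + pageSize < allKeys.length };
}

// Async generator that walks every page and yields the keys one by one.
async function* pagedKeys(allKeys, pageSize) {
  let marker;
  let truncated = true;
  while (truncated) {
    const page = await fetchPage(allKeys, marker, pageSize);
    yield* page.keys;
    truncated = page.truncated;
    marker = page.keys[page.keys.length - 1]; // last key becomes the next marker
  }
}

// Wrap the generator in an object-mode Readable stream.
function streamKeys(allKeys, pageSize) {
  return Readable.from(pagedKeys(allKeys, pageSize));
}

streamKeys(['a', 'b', 'c'], 2).on('data', key => console.log(key));
```

The consumer only sees 'data' events; the page boundaries stay inside the generator.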
Answer by Thijs Lowette
Meekohi provided a very good answer, but the (new) documentation states that NextMarker can be undefined. When this is the case, you should use the last key as the marker.
So his code sample can be changed to:
var allKeys = [];
function listAllKeys(marker, cb) {
    s3.listObjects({Bucket: s3bucket, Marker: marker}, function (err, data) {
        if (err) return cb(err);
        // Append this page's objects to the flat result list
        allKeys = allKeys.concat(data.Contents);
        if (data.IsTruncated)
            listAllKeys(data.NextMarker || data.Contents[data.Contents.length-1].Key, cb);
        else
            cb();
    });
}
I couldn't comment on the original answer since I don't have the required reputation. Apologies for the bad markup, by the way.
Answer by logidelic
This is an old question, and I guess the AWS JS SDK has changed a lot since it was asked. Here's yet another way to do it these days:
s3.listObjects({Bucket: 'mybucket', Prefix: 'some-pfx'}).
    on('success', function handlePage(r) {
        // ... handle page of contents r.data.Contents
        if (r.hasNextPage()) {
            // There's another page; handle it
            r.nextPage().on('success', handlePage).send();
        } else {
            // Finished!
        }
    }).
    on('error', function (err) {
        // Error!
    }).
    send();
Answer by nkitku
Using an async generator:
const { S3 } = require('aws-sdk');
const s3 = new S3();
async function* listAllKeys(opts) {
    opts = {...opts};
    do {
        const data = await s3.listObjectsV2(opts).promise();
        opts.ContinuationToken = data.NextContinuationToken;
        yield data;
    } while (opts.ContinuationToken)
}
const opts = {
    Bucket: 'bucket-xyz',
    /* required */
    // ContinuationToken: 'STRING_VALUE',
    // Delimiter: 'STRING_VALUE',
    // EncodingType: url,
    // FetchOwner: true || false,
    // MaxKeys: 'NUMBER_VALUE',
    // Prefix: 'STRING_VALUE',
    // RequestPayer: requester,
    // StartAfter: 'STRING_VALUE'
};
async function main() {
    // using for of await loop
    for await (const data of listAllKeys(opts)) {
        console.log(data.Contents)
    }
    // or lazy-load
    const keys = listAllKeys(opts);
    console.log(await keys.next());
    // {value: {…}, done: false}
    console.log(await keys.next());
    // {value: {…}, done: false}
    console.log(await keys.next());
    // {value: undefined, done: true}
}
main();
// Making Observable
const lister = opts => o => {
    let needMore = true;
    (async () => {
        const keys = listAllKeys(opts);
        for await (const data of keys) {
            o.next(data);
            if (!needMore) break;
        }
        o.complete();
    })();
    return () => (needMore = false);
}
// Using Rxjs
const { Observable } = require('rxjs');
const { flatMap } = require('rxjs/operators')
function listAll() {
    return Observable.create(lister(opts))
        .pipe(flatMap(v => v.Contents))
        .subscribe(console.log);
}
listAll();
// Using Nodejs EventEmitter
const EventEmitter = require('events');
const _eve = new EventEmitter();
_eve.on('next', console.log);
const stop = lister(opts)({
    next: v => _eve.emit('next', v),
    error: e => _eve.emit('error', e),
    complete: v => _eve.emit('complete', v)
});
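A variation on the generator idea is to yield individual keys instead of whole pages, so that a consumer can stop early and no further pages are requested. The sketch below is runnable without AWS access: makePromiseFakeS3 is a made-up stand-in that imitates only the .promise() call style and the response fields used here, and eachKey and firstKeys are hypothetical names.

```javascript
// Fake client (illustration only): pages an in-memory key list and mimics
// the request.promise() style of aws-sdk v2.
function makePromiseFakeS3(keys, pageSize) {
  return {
    listObjectsV2(opts) {
      const start = Number(opts.ContinuationToken || 0);
      const end = Math.min(start + pageSize, keys.length);
      const data = {
        Contents: keys.slice(start, end).map(Key => ({ Key })),
        IsTruncated: end < keys.length,
        NextContinuationToken: end < keys.length ? String(end) : undefined
      };
      return { promise: () => Promise.resolve(data) };
    }
  };
}

// Yields one key at a time; the next page is requested only when needed.
async function* eachKey(s3, opts) {
  opts = { ...opts };
  do {
    const data = await s3.listObjectsV2(opts).promise();
    for (const obj of data.Contents) yield obj.Key;
    opts.ContinuationToken = data.NextContinuationToken;
  } while (opts.ContinuationToken);
}

// Collects at most limit keys, then stops iterating (and therefore paging).
async function firstKeys(s3, opts, limit) {
  const out = [];
  if (limit <= 0) return out;
  for await (const key of eachKey(s3, opts)) {
    out.push(key);
    if (out.length >= limit) break;
  }
  return out;
}

const demoS3 = makePromiseFakeS3(['a', 'b', 'c', 'd', 'e'], 2);
firstKeys(demoS3, { Bucket: 'demo-bucket' }, 3).then(keys => console.log(keys));
```

With a real client, the same eachKey function should work if s3 is an aws-sdk v2 S3 instance, since it relies only on listObjectsV2(opts).promise().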
Answer by Carlos Rodriguez
I ended up building a wrapper function around listObjectsV2. It works the same way and takes the same parameters, but it calls itself recursively until IsTruncated is false and then returns all the keys found as an array in the second parameter of the callback function.
const AWS = require('aws-sdk')
const s3 = new AWS.S3()
function listAllKeys(params, cb) {
    // Keys collected by earlier calls travel in params.data;
    // remove them again before the params object is sent to S3.
    var keys = params.data || []
    delete params['data']
    s3.listObjectsV2(params, function (err, data) {
        if (err) {
            cb(err)
        } else if (data.IsTruncated) {
            params['ContinuationToken'] = data.NextContinuationToken
            // Carry over everything collected so far, not just the latest page
            params['data'] = keys.concat(data.Contents)
            listAllKeys(params, cb)
        } else {
            cb(null, keys.concat(data.Contents))
        }
    })
}
Answer by John Tng
I am using this version with async/await.
This function returns the contents as an array.
I'm also using NextContinuationToken instead of Marker.
async function getFilesRecursivelySub(param) {
    // Call the function to get list of items from S3.
    let result = await s3.listObjectsV2(param).promise();
    if (!result.IsTruncated) {
        // Recursive terminating condition.
        return result.Contents;
    } else {
        // Recurse it if results are truncated.
        param.ContinuationToken = result.NextContinuationToken;
        return result.Contents.concat(await getFilesRecursivelySub(param));
    }
}
async function getFilesRecursively() {
    let param = {
        Bucket: 'YOUR_BUCKET_NAME'
        // Can add more parameters here.
    };
    return await getFilesRecursivelySub(param);
}
Answer by Prasanth Jaya
If you want to list only the keys within a specific folder of an S3 bucket, this will be useful.
Basically, the listObjects function starts listing from the Marker we set and returns at most MaxKeys: 1000 keys. It therefore walks through the folders one after another and gives you the first 1000 keys it finds, which may come from different folders in the bucket.
Suppose I have many folders in my bucket with prefixes of the form prod/some date/, e.g. prod/2017/05/12/, prod/2017/05/13/, etc.
I want to fetch the list of objects (file names) only within the prod/2017/05/12/ folder, so I specify prod/2017/05/12/ as my start and prod/2017/05/13/ [your next folder name] as my end, and in the code I break out of the loop when I encounter the end.
Each entry in data.Contents will look like this:
{ Key: 'prod/2017/05/13/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
LastModified: 2017-05-13T00:59:02.000Z,
ETag: '"630b2sdfsdfs49ef392bcc16c833004f94ae850"',
Size: 134236366,
StorageClass: 'STANDARD',
Owner: { }
}
Code:
var list = [];
function listAllKeys(s3bucket, start, end) {
    s3.listObjects({
        Bucket: s3bucket,
        Marker: start,
        MaxKeys: 1000
    }, function (err, data) {
        if (err) {
            console.log(err);
            return;
        }
        if (data.Contents) {
            for (var i = 0; i < data.Contents.length; i++) {
                var key = data.Contents[i].Key; // See above for the structure of data.Contents
                // Compare the key's leading characters with the end prefix
                if (key.substring(0, end.length) != end) {
                    list.push(key);
                } else {
                    break; // break the loop once the end prefix is reached
                }
            }
            console.log(list);
            console.log('Total - ', list.length);
        }
    });
}
listAllKeys('BucketName', 'prod/2017/05/12/', 'prod/2017/05/13/');
Output:
[ 'prod/2017/05/12/05/4bf2c675-a417-4c1f-a0b4-22fc45f99207.jpg',
'prod/2017/05/12/05/a36528b9-e071-4b83-a7e6-9b32d6bce6d8.jpg',
'prod/2017/05/12/05/bc4d6d4b-4455-48b3-a548-7a714c489060.jpg',
'prod/2017/05/12/05/f4b8d599-80d0-46fa-a996-e73b8fd0cd6d.jpg',
... 689 more items ]
Total - 692
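The start/end idea relies on S3 returning keys in sorted order, beginning after the marker. A small offline sketch of that range logic follows; keysInRange and the sample keys are made up for illustration, and the stop condition used here (key >= end) is an alternative to the prefix comparison in the answer, not the author's code.

```javascript
// Hypothetical helper: given keys in sorted order, returns those that sort
// after start and before end, mimicking a Marker plus an early break.
function keysInRange(sortedKeys, start, end) {
  var result = [];
  for (var i = 0; i < sortedKeys.length; i++) {
    var key = sortedKeys[i];
    if (key <= start) continue; // Marker semantics: listing begins after the marker
    if (key >= end) break;      // keys are sorted, so nothing later can match
    result.push(key);
  }
  return result;
}

var sample = [
  'prod/2017/05/11/z.jpg',
  'prod/2017/05/12/05/a.jpg',
  'prod/2017/05/12/05/b.jpg',
  'prod/2017/05/13/c.jpg'
];
console.log(keysInRange(sample, 'prod/2017/05/12/', 'prod/2017/05/13/'));
// prints the two prod/2017/05/12/05/ keys
```

Comparing with >= also stops the loop when the exact next folder does not exist, as long as some later key is present. JavaScript compares strings by UTF-16 code units, which matches S3's ordering for plain ASCII keys like these.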

