Amazon S3 returns only 1000 entries for one bucket and all for another bucket (using java sdk)?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the same CC BY-SA license, cite the original address and author information, and attribute it to the original authors (not me). Original StackOverflow source: http://stackoverflow.com/questions/12853476/

Amazon s3 returns only 1000 entries for one bucket and all for another bucket (using java sdk)?

java, amazon-s3

Asked by Abhishek

I am using the code below to get a list of all file names from an S3 bucket. I have two buckets in S3. For one of the buckets the code below returns all the file names (more than 1000), but the same code returns only 1000 file names for the other bucket. I just don't get what is happening. Why does the same code work for one bucket and not for the other?

Also, my bucket has a hierarchical structure: folder/filename.jpg.

ObjectListing objects = s3.listObjects("bucket.new.test");
do {
    for (S3ObjectSummary objectSummary : objects.getObjectSummaries()) {
        String key = objectSummary.getKey();
        System.out.println(key);
    }
    objects = s3.listNextBatchOfObjects(objects);
} while (objects.isTruncated());

Accepted answer by oferei

Improving on @Abhishek's answer. This code is slightly shorter and variable names are fixed.

You have to get the object listing, add its contents to the collection, then get the next batch of objects from the listing. Repeat the operation until the listing is no longer truncated.

List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing objects = s3.listObjects("bucket.new.test");
keyList.addAll(objects.getObjectSummaries());

while (objects.isTruncated()) {
    objects = s3.listNextBatchOfObjects(objects);
    keyList.addAll(objects.getObjectSummaries());
}

Answer by Paolo Angioletti

For Scala developers, here is a recursive function to execute a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan(acc:List[T], listing:ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    if (!listing.isTruncated) mapped.toList
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}

To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name and the prefix name in the first parameter list. Also pass the function f() that you want to apply to map each object summary in the second parameter list.

For example

val keyOwnerTuples = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner))

will return the full list of (key, owner) tuples in that bucket/prefix

or

map(s3, "bucket", "prefix")(s => println(s))

as you would normally do with Monads in Functional Programming.

Answer by Abhishek

I have just changed the above code to use addAll instead of a for loop to add the objects one by one, and it worked for me:

List<S3ObjectSummary> keyList = new ArrayList<S3ObjectSummary>();
ObjectListing object = s3.listObjects("bucket.new.test");
keyList = object.getObjectSummaries();
object = s3.listNextBatchOfObjects(object);

while (object.isTruncated()){
  keyList.addAll(object.getObjectSummaries());
  object = s3.listNextBatchOfObjects(object);
}
keyList.addAll(object.getObjectSummaries());

After that you can simply use any iterator over the list keyList.

Answer by Sy Loc

If you want to get all of the objects (more than 1000 keys) you need to send another request to S3 using the last key as the marker. Here is the code.

private static String lastKey = "";
private static String preLastKey = "";
...

AmazonS3 s3 = new AmazonS3Client(new ClasspathPropertiesFileCredentialsProvider());
String bucketName = "bucketname";

do {
    preLastKey = lastKey;

    // list the next page of keys, starting after the last key seen so far
    ListObjectsRequest lstRQ = new ListObjectsRequest().withBucketName(bucketName).withPrefix("");
    lstRQ.setMarker(lastKey);

    ObjectListing objectListing = s3.listObjects(lstRQ);

    // loop and get files on S3
    for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
        // get object and do something.....
        lastKey = objectSummary.getKey();  // remember the last key so it can be used as the next marker
    }
} while (!lastKey.equals(preLastKey));

Answer by iDuanYingJie

  1. Paolo Angioletti's code can't get all of the data, only the last batch of data (when the listing is no longer truncated, its recursion returns only mapped and drops the accumulated acc).
  2. I think it might be better to use ListBuffer.
  3. This method does not support setting startAfterKey.

    import com.amazonaws.services.s3.AmazonS3Client
    import com.amazonaws.services.s3.model.{ObjectListing, S3ObjectSummary}    
    import scala.collection.JavaConverters._
    import scala.collection.mutable.ListBuffer

    def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T): List[T] = {

      def scan(acc: ListBuffer[T], listing: ObjectListing): List[T] = {
        val r = acc ++= listing.getObjectSummaries.asScala.map(f).toList
        if (listing.isTruncated) scan(r, s3.listNextBatchOfObjects(listing))
        else r.toList
      }

      scan(ListBuffer.empty[T], s3.listObjects(bucket, prefix))
    }
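
A hedged usage sketch of the map() function above; the bucket and prefix names are placeholders:

// Hypothetical invocation; "my-bucket" and "images/" are placeholder values.
val allKeys: List[String] = map(s3, "my-bucket", "images/")(_.getKey)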

The second method is to use the AWS SDK v2 (awssdk-v2):

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.1.0</version>
</dependency>

  import software.amazon.awssdk.services.s3.S3Client
  import software.amazon.awssdk.services.s3.model.{ListObjectsV2Request, S3Object}

  import scala.collection.JavaConverters._

  def listObjects[T](s3: S3Client, bucket: String,
                     prefix: String, startAfter: String)(f: (S3Object) => T): List[T] = {
    val request = ListObjectsV2Request.builder()
      .bucket(bucket).prefix(prefix)
      .startAfter(startAfter).build()

    s3.listObjectsV2Paginator(request)
      .asScala
      .flatMap(_.contents().asScala)
      .map(f)
      .toList
  }
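
A hedged usage sketch for the SDK v2 helper above; the client construction, bucket, prefix, and startAfter values are placeholders (an empty startAfter should list from the beginning of the prefix):

// Hypothetical invocation of the helper above; all values are placeholders.
val s3v2 = S3Client.create()
val allKeys: List[String] = listObjects(s3v2, "my-bucket", "images/", "")(_.key())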

Answer by Ori Popowski

In Scala:

val first = s3.listObjects("bucket.new.test")

val listings: Seq[ObjectListing] = Iterator.iterate(Option(first))(_.flatMap(listing =>
  if (listing.isTruncated) Some(s3.listNextBatchOfObjects(listing))
  else None
))
  .takeWhile(_.nonEmpty)
  .toList
  .flatten
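
Since this yields a Seq[ObjectListing] rather than keys, one more step flattens it into object summaries; a minimal sketch, assuming the same SDK v1 classes as the earlier answers:

import com.amazonaws.services.s3.model.S3ObjectSummary
import scala.collection.JavaConverters._

// Flatten the collected listings into their object summaries, then project out the keys.
val allSummaries: Seq[S3ObjectSummary] = listings.flatMap(_.getObjectSummaries.asScala)
val allKeys: Seq[String] = allSummaries.map(_.getKey)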