Disclaimer: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/23217951/

Time: 2020-08-13 21:23:47  Source: igfitidea

AWS S3 - Listing all objects inside a folder without the prefix

Tags: java, amazon-web-services, amazon-s3

Asked by Marz

I'm having problems retrieving all objects (filenames) inside a folder in AWS S3. Here's my code:

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(bucket)
        .withPrefix(folderName + "/")
        .withMarker(folderName + "/")

ObjectListing objectListing = amazonWebService.s3.listObjects(listObjectsRequest)

for (S3ObjectSummary summary : objectListing.getObjectSummaries()) {
    print summary.getKey()
}

It returns the correct objects, but with the prefix included, e.g. foldername/filename

I know I can strip the prefix myself in Java, for example with substring, but I wanted to know whether the AWS SDK has a method for it.

Accepted answer by Dan Ciborowski - MSFT

No, there is not. The link below lists all the methods that are available. The reason is the design of S3: it does not have "subfolders". Instead, it is simply a flat list of files, where each file's name (its key) is the "prefix" plus the filename you want. The GUI presents the data as if it were stored in Windows-style "folders", but there is no folder logic in S3.

http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/S3ObjectSummary.html
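To make the flat-key idea concrete, here is a small in-memory sketch. It does not call S3 at all; the class name `FlatKeys`, the method `withPrefix`, and the sample keys are invented for illustration. It only shows that a "folder" listing amounts to filtering a flat list of keys by a common prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory illustration only; this is a hypothetical helper and does not call S3.
final class FlatKeys {
    private FlatKeys() {}

    // Returns the keys that start with the given prefix,
    // which is how a "folder" listing can be emulated over a flat key list.
    static List<String> withPrefix(List<String> keys, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Made-up sample keys; note that "photos/2020/" is not a real nested folder.
        List<String> keys = Arrays.asList("photos/a.jpg", "photos/2020/b.jpg", "docs/readme.txt");
        System.out.println(withPrefix(keys, "photos/")); // [photos/a.jpg, photos/2020/b.jpg]
    }
}
```

Both "photos/a.jpg" and "photos/2020/b.jpg" match, because the prefix is only a string comparison and carries no notion of directory depth.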

Your best bet is to split the key by "/" and take the last element of the array.
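A rough sketch of that suggestion in plain Java follows. The names `KeyNames`, `fileNameOf`, and `stripPrefix` are made up for this example and are not part of the AWS SDK; they work on the key strings you would get from `summary.getKey()`. Instead of splitting into an array, the first helper takes everything after the last "/", which gives the same result as taking the last element.

```java
// Hypothetical helpers for illustration; not part of the AWS SDK.
final class KeyNames {
    private KeyNames() {}

    // Returns the text after the last "/" in the key,
    // or the whole key if it contains no "/".
    static String fileNameOf(String key) {
        return key.substring(key.lastIndexOf('/') + 1);
    }

    // Removes the given prefix from the key when present;
    // otherwise returns the key unchanged.
    static String stripPrefix(String key, String prefix) {
        return key.startsWith(prefix) ? key.substring(prefix.length()) : key;
    }

    public static void main(String[] args) {
        System.out.println(fileNameOf("foldername/filename"));                     // filename
        System.out.println(stripPrefix("foldername/sub/file.txt", "foldername/")); // sub/file.txt
    }
}
```

Note that a key ending in "/" (such as a folder placeholder object) yields an empty string from `fileNameOf`, so you may want to skip those entries.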

Answer by Paolo Angioletti

For Scala developers, here is a recursive function that executes a full scan and map of the contents of an AmazonS3 bucket using the official AWS SDK for Java:

import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.{S3ObjectSummary, ObjectListing, GetObjectRequest}
import scala.collection.JavaConversions.{collectionAsScalaIterable => asScala}

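// Applies f to every object summary under bucket/prefix and collects the results.
def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  // Maps one page of the listing, then recurses while more pages remain.
  def scan(acc:List[T], listing:ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    // Caution: on the last page this returns only `mapped`; the results held in `acc` are dropped.
    if (!listing.isTruncated) mapped.toList
    else scan(acc ::: mapped, s3.listNextBatchOfObjects(listing))
  }

  scan(List(), s3.listObjects(bucket, prefix))
}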

To invoke the above curried map() function, simply pass the already constructed (and properly initialized) AmazonS3Client object (refer to the official AWS SDK for Java API Reference), the bucket name, and the prefix name in the first parameter list. Then pass the function f() that you want to apply to each object summary in the second parameter list.

For example,

map(s3, bucket, prefix) { s => println(s.getKey.split("/")(1)) }

will print all the filenames (without the prefix), assuming each key has the form prefix/filename

val tuple = map(s3, bucket, prefix)(s => (s.getKey, s.getOwner, s.getSize))

will return the full list of (key, owner, size) tuples in that bucket/prefix

val totalSize = map(s3, "bucket", "prefix")(s => s.getSize).sum

will return the total size of its content (note the additional sum() folding function applied at the end of the expression ;-)

You can combine map() with many other functions, as you normally would with monads in functional programming.
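For readers who prefer Java, the same map-then-combine style can be approximated with Java 8 streams. The sketch below is only an analogy: `ObjectInfo` is a hypothetical stand-in for an object summary (it is not an AWS SDK type), and `SummaryMapping` with its methods is invented for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for an object summary; not an AWS SDK type.
final class ObjectInfo {
    final String key;
    final long size;

    ObjectInfo(String key, long size) {
        this.key = key;
        this.size = size;
    }
}

// Hypothetical helpers that mirror the map() and sum usage shown above.
final class SummaryMapping {
    private SummaryMapping() {}

    // Applies f to every entry and collects the results into a list.
    static <T> List<T> map(List<ObjectInfo> infos, Function<ObjectInfo, T> f) {
        return infos.stream().map(f).collect(Collectors.toList());
    }

    // Adds up the sizes of all entries.
    static long totalSize(List<ObjectInfo> infos) {
        return infos.stream().mapToLong(o -> o.size).sum();
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<ObjectInfo> infos = Arrays.asList(
                new ObjectInfo("a/x.txt", 10L),
                new ObjectInfo("a/y.txt", 32L));
        System.out.println(map(infos, o -> o.key)); // [a/x.txt, a/y.txt]
        System.out.println(totalSize(infos));       // 42
    }
}
```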

Answer by Joshua

Just to follow up on the answer above ("here it is recursive function to execute a full scan and map"): there is a bug in the code (as @Eric highlighted) if there are more than 1000 keys in the bucket. The fix is actually quite simple: mapped.toList needs to be merged with acc.

def map[T](s3: AmazonS3Client, bucket: String, prefix: String)(f: (S3ObjectSummary) => T) = {

  def scan_s3_bucket(acc:List[T], listing:ObjectListing): List[T] = {
    val summaries = asScala[S3ObjectSummary](listing.getObjectSummaries())
    val mapped = (for (summary <- summaries) yield f(summary)).toList

    if (!listing.isTruncated) {
      acc ::: mapped.toList
    } else {
      println("list extended, more to go: new_keys '%s', current_length '%s'".format(mapped.length, acc.length))
      scan_s3_bucket(acc ::: mapped, s3.listNextBatchOfObjects(listing))
    }
  }

  scan_s3_bucket(List(), s3.listObjects(bucket, prefix))
}
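The essence of this fix, carrying the accumulator across every page, can be sketched in Java without the SDK. `Page` below is a hypothetical stand-in for a paged listing (it is not an AWS SDK type), and `PagedScan.collectAll` is an invented name; the point is only that results from earlier pages must be kept when the last page is reached.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for one page of a listing; not an AWS SDK type.
final class Page {
    final List<String> keys;
    final Page next; // null when this is the last page

    Page(List<String> keys, Page next) {
        this.keys = keys;
        this.next = next;
    }
}

final class PagedScan {
    private PagedScan() {}

    // Walks every page and appends its keys to a single accumulator,
    // so keys from earlier pages are still present after the last page.
    static List<String> collectAll(Page first) {
        List<String> acc = new ArrayList<>();
        for (Page page = first; page != null; page = page.next) {
            acc.addAll(page.keys);
        }
        return acc;
    }

    public static void main(String[] args) {
        Page last = new Page(Arrays.asList("c"), null);
        Page first = new Page(Arrays.asList("a", "b"), last);
        System.out.println(collectAll(first)); // [a, b, c]
    }
}
```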

Answer by kdblue

This code helped me list the contents of a sub-directory in my bucket.

Example: "Testing" is my bucket name. It contains a "[email protected]" folder, which in turn contains an "IMAGE" folder holding the image files.

ArrayList<String> transferRecord = new ArrayList<>();

ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
        .withBucketName(Constants.BUCKET_NAME)
        .withPrefix("[email protected]" + "/IMAGE");

ObjectListing objects = s3.listObjects(listObjectsRequest);
for (;;) {
    List<S3ObjectSummary> summaries = objects.getObjectSummaries();
    // An empty page means there is nothing left to read.
    if (summaries.size() < 1) {
        break;
    }

    // Record the key of every object on this page.
    for (S3ObjectSummary summary : summaries) {
        transferRecord.add(summary.getKey());
    }

    // Fetch the next page of results.
    objects = s3.listNextBatchOfObjects(objects);
}

I hope this helps you.

Answer by randhirkr

The snippet below worked quite well for me. Reference: https://codeflex.co/get-list-of-objects-from-s3-directory/

    List<String> getObjectslistFromFolder(String bucketName, String folderKey, AmazonS3 s3Client) {

    ListObjectsRequest listObjectsRequest = new ListObjectsRequest().withBucketName(bucketName)
            .withPrefix(folderKey + "/");

    List<String> keys = new ArrayList<String>();

    ObjectListing objects = s3Client.listObjects(listObjectsRequest);
    for (;;) {
        List<S3ObjectSummary> summaries = objects.getObjectSummaries();
        if (summaries.size() < 1) {
            break;
        }

        // The lambda below requires project compliance of JRE 1.8 or later.
        summaries.forEach(s -> keys.add(s.getKey()));

        objects = s3Client.listNextBatchOfObjects(objects);
    }

    return keys;