.NET: How to query Cloud Blobs on Windows Azure Storage
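Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow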
Original URL: http://stackoverflow.com/questions/14440506/
Question by Gorgi Rankovski
I am using Microsoft.WindowsAzure.StorageClient to manipulate blobs on Azure storage. I have come to the point where the user needs to list the uploaded files and modify/delete them. Since there are many files in one container, what is the best way to query the Azure storage services so that they return only the desired files? Also, I would like to be able to return only a specific number of blobs so I can implement paging.
There is a method called ListBlobs on CloudBlobContainer, but it seems to return all of the blobs in the container. That will not work for me.
I searched a lot on this topic and could not find anything useful. This link shows only the basics.
--------- EDIT
My answer below does not retrieve the blobs lazily; it retrieves all of the blobs in the container and then filters the result on the client. Currently there is no solution for retrieving blobs lazily.
Accepted answer by NathanAldenSr
What I've realized about Windows Azure blob storage is that it is bare-bones. As in, extremely bare-bones. You should use it only to store documents and associated metadata and then retrieve individual blobs by ID.
I recently migrated an application from MongoDB to Windows Azure blob storage. Coming from MongoDB, I was expecting a bunch of different efficient ways to retrieve documents. After migrating, I now rely on a traditional RDBMS and ElasticSearch to store blob information in a more searchable way.
It's really too bad that Windows Azure blob storage is so limiting. I hope to see much-enhanced searching capabilities in the future (e.g., search by metadata, property, blob name regex, etc.). Additionally, indexes based on map/reduce would be awesome. Microsoft has the chance to convert a lot of folks over from other document storage systems if it does these things.
Answer by Gorgi Rankovski
The ListBlobs method retrieves the blobs in the container lazily, so you can write queries against it that are not executed until you loop over the result (or materialize it with ToList or a similar method).
Things will get clearer with a few examples. If you don't know how to obtain a reference to a container in your Azure Storage account, I recommend this tutorial.
Order by last modified date and take page number 2 (10 blobs per page):
blobContainer.ListBlobs().OfType<CloudBlob>()
    .OrderByDescending(b => b.Properties.LastModified)
    .Skip(10).Take(10);
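The `.Skip(10).Take(10)` call above corresponds to page 2 with 10 blobs per page. A minimal sketch of the general formula, using a hypothetical `Paging.GetPage` helper over any `IEnumerable<T>` (pages numbered from 1). It is plain LINQ, not part of any Azure API; with `ListBlobs`, the skipped blobs are still listed and then discarded on the client.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the Azure storage client.
public static class Paging
{
    // Returns the items on a 1-based page: page 2 with pageSize 10
    // skips 10 items and takes the next 10, i.e. .Skip(10).Take(10).
    public static List<T> GetPage<T>(IEnumerable<T> source, int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        return source
            .Skip((pageNumber - 1) * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

The last page may be shorter than `pageSize`; with 25 items and 10 per page, page 3 holds 5 items.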
Get a specific type of file. This will work if you have set ContentType at the time of upload (which I strongly recommend you do):
blobContainer.ListBlobs().OfType<CloudBlob>()
    .Where(b => b.Properties.ContentType.StartsWith("image"));
Get .jpg files and order them by file size (largest first), assuming your blob names include the file extension:
blobContainer.ListBlobs().OfType<CloudBlob>()
    .Where(b => b.Name.EndsWith(".jpg"))
    .OrderByDescending(b => b.Properties.Length);
Finally, the query is not executed until you enumerate it:
var blobs = blobContainer.ListBlobs().OfType<CloudBlob>()
    .Where(b => b.Properties.ContentType.StartsWith("image"));
foreach (var b in blobs) // Enumeration starts here: the service is called,
                         // the blob listing is returned, and the
                         // Where filter is applied to each item on the client
{
    // do something with each file. Variable b is of type CloudBlob
}
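To see why the question's edit says this approach still fetches every blob, here is a minimal in-memory sketch that does not touch Azure at all; `FakeBlobListing` and `DeferredDemo` are hypothetical names. Deferred execution means nothing runs until enumeration, but the `Where` filter is evaluated on the client, so every item in the listing is still produced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a blob listing: yields names one at a time and
// counts how many items the "service" has handed to the client.
public sealed class FakeBlobListing
{
    private readonly IReadOnlyList<string> _names;
    public int ItemsReturned { get; private set; }

    public FakeBlobListing(IReadOnlyList<string> names) => _names = names;

    public IEnumerable<string> ListBlobs()
    {
        foreach (var name in _names)
        {
            ItemsReturned++; // every name passes through here, matching or not
            yield return name;
        }
    }
}

public static class DeferredDemo
{
    // Builds a filtered query, then enumerates it; reports the item count before and after.
    public static (List<string> Matches, int ItemsBeforeEnumeration, int ItemsAfterEnumeration) Run(IReadOnlyList<string> names)
    {
        var listing = new FakeBlobListing(names);

        // Building the query does not enumerate anything yet.
        var jpgs = listing.ListBlobs().Where(n => n.EndsWith(".jpg", StringComparison.Ordinal));
        int before = listing.ItemsReturned;

        // ToList triggers enumeration; the filter runs on the client,
        // so all names are produced even though only some match.
        var matches = jpgs.ToList();
        return (matches, before, listing.ItemsReturned);
    }
}
```

For four names of which two end in `.jpg`, the count is 0 before `ToList` and 4 after it, even though only two names are kept.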
Answer by Matt
Edit
Now in preview is blob index for Azure Storage, which is a managed index of metadata you can add to your blobs (new or existing). This removes the need to use creative container names for pseudo-indexing or to maintain a secondary index yourself.
Original answer
For returning specific results, one possible option is to use the blob and/or container prefix to effectively index what you're storing. For example, you could prefix a date and time as you add blobs, or you could prefix a user; which one depends on how you want to "index" your blobs for your use case. You can then use this prefix, or a part of it, in the ListBlobs[Segmented] call to return specific results. Obviously you need to put the most general elements first, then the more specific ones, e.g.:
2016_03_15_10_15_blobname
This would allow you to get all 2016 blobs, or all March 2016 blobs, etc., but not March blobs from every year without making multiple calls.
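A minimal sketch of this naming scheme, simulated in memory; `PrefixIndex`, `MakeName`, and `ListByPrefix` are hypothetical names, and with the real client you would pass the prefix to the ListBlobs[Segmented] call instead (the exact overloads depend on your storage client version). Zero-padded, most-general-first names make a plain string prefix act as a date range.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helpers illustrating prefix-based "indexing".
public static class PrefixIndex
{
    // Builds a name like "2016_03_15_10_15_blobname": year first, zero-padded,
    // so names sort chronologically as plain strings.
    public static string MakeName(DateTime timestamp, string blobName) =>
        timestamp.ToString("yyyy_MM_dd_HH_mm", CultureInfo.InvariantCulture) + "_" + blobName;

    // In-memory stand-in for a prefix-filtered listing (ordinal comparison).
    public static List<string> ListByPrefix(IEnumerable<string> names, string prefix) =>
        names.Where(n => n.StartsWith(prefix, StringComparison.Ordinal)).ToList();
}
```

The prefix "2016" matches every 2016 blob and "2016_03" only March 2016, while March of every year would need one prefix per year ("2016_03", "2017_03", ...).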
The downside is that if you need to re-index blobs, you have to delete them and recreate them with a new name.
For paging in general, you can use the ListBlobsSegmented method, which gives you a continuation token that you can use to implement paging. That said, it's not much use if you need to skip pages, because it only works by starting from where the last set of actual results left off. One option is to calculate the number of pages you need to skip, fetch and discard them, and then fetch the page you actually want. If you have a lot of blobs in each container, this can become inefficient very quickly.
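A minimal in-memory sketch of the fetch-and-discard approach. `FakeSegmentedContainer`, `ListSegment`, and `SegmentedPaging` are hypothetical stand-ins for ListBlobsSegmented and its continuation token (real tokens are opaque, and the real method signatures differ between client versions). The `Calls` counter shows the cost: in this sketch, reaching page N takes N round trips.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical segmented listing. Here the "token" is simply the index of the
// next item; names are sorted by name, like a blob listing.
public sealed class FakeSegmentedContainer
{
    private readonly List<string> _names;
    public int Calls { get; private set; }

    public FakeSegmentedContainer(IEnumerable<string> names) =>
        _names = names.OrderBy(n => n, StringComparer.Ordinal).ToList();

    // Returns up to maxResults items starting at the token, plus the next token (null when done).
    public (List<string> Items, string NextToken) ListSegment(string token, int maxResults)
    {
        Calls++;
        int start = token == null ? 0 : int.Parse(token);
        var items = _names.Skip(start).Take(maxResults).ToList();
        int next = start + items.Count;
        return (items, next < _names.Count ? next.ToString() : null);
    }
}

public static class SegmentedPaging
{
    // Reaches a 1-based page by fetching and discarding every page before it,
    // because a token can only resume where the previous segment ended.
    public static List<string> GetPageBySkipping(FakeSegmentedContainer container, int pageNumber, int pageSize)
    {
        string token = null;
        for (int page = 1; page < pageNumber; page++)
        {
            var (_, next) = container.ListSegment(token, pageSize); // page discarded
            if (next == null) return new List<string>(); // ran out before the requested page
            token = next;
        }
        return container.ListSegment(token, pageSize).Items;
    }
}
```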
You could also keep this only as the fallback method: use a page-by-page approach and store the continuation token while the user clicks from one page to the next sequentially. Alternatively, you could cache blob names and do your own paging from that cache.
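For the sequential case, a minimal sketch of storing continuation tokens per page. `ContinuationTokenCache` and its `fetchSegment` delegate are hypothetical; the delegate stands in for whatever segmented-listing call your client library provides (e.g. ListBlobsSegmented). Visited pages, and the page right after the last visited one, cost a single call each.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers the token that starts each page.
public sealed class ContinuationTokenCache
{
    // fetchSegment(token, maxResults) returns one segment and the next token (null when done).
    private readonly Func<string, int, (List<string> Items, string NextToken)> _fetchSegment;
    private readonly int _pageSize;

    // Page number (1-based) -> token that starts that page; page 1 starts with no token.
    private readonly Dictionary<int, string> _startTokens = new Dictionary<int, string> { [1] = null };

    public ContinuationTokenCache(Func<string, int, (List<string> Items, string NextToken)> fetchSegment, int pageSize)
    {
        _fetchSegment = fetchSegment;
        _pageSize = pageSize;
    }

    // Returns the page, or null if its start token is unknown
    // (the user jumped past pages that were never visited).
    public List<string> TryGetPage(int pageNumber)
    {
        if (!_startTokens.TryGetValue(pageNumber, out var token)) return null;

        var (items, next) = _fetchSegment(token, _pageSize);
        if (next != null) _startTokens[pageNumber + 1] = next; // where the next page starts
        return items;
    }
}
```

When `TryGetPage` returns null, you can fall back to fetching and discarding pages, or page over a cached list of blob names instead.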
You can also combine these two approaches, e.g., filter by your "index" prefix and then page through the results.
Answer by Jason Steele
Azure Data Lake Gen 2 will allow data stored in the Data Lake to be searched using U-SQL. The Blob storage APIs can be used to store and retrieve that data.

