Original question: http://stackoverflow.com/questions/9131191/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
How do you query DynamoDB?
Asked by chriserwin
I'm looking at Amazon's DynamoDB as it looks like it takes away all of the hassle of maintaining and scaling your database server. I'm currently using MySQL, and maintaining and scaling the database is a complete headache.
I've gone through the documentation and I'm having a hard time trying to wrap my head around how you would structure your data so it could be easily retrieved.
I'm totally new to NoSQL and non-relational databases.
From the DynamoDB documentation, it sounds like you can only query a table on the primary hash key, plus the primary range key with a limited number of comparison operators.
Or you can run a full table scan and apply a filter to it. The catch is that it will only scan 1 MB at a time, so you'd likely have to repeat your scan to find X number of results.
I realize these limitations allow them to provide predictable performance, but it seems like it makes it really difficult to get your data out. And performing full table scans seems like it would be really inefficient, and would only become less efficient over time as your table grows.
For instance, say I have a Flickr clone. My Images table might look something like:
- Image ID (Number, Primary Hash Key)
- Date Added (Number, Primary Range Key)
- User ID (String)
- Tags (String Set)
- etc
So using query I would be able to list all images from the last 7 days and limit it to X number of results pretty easily.
But if I wanted to list all images from a particular user I would need to do a full table scan and filter by username. Same would go for tags.
And because you can only scan 1 MB at a time, you may need to do multiple scans to find X number of images. I also don't see a way to easily stop at X number of images. If you're trying to grab 30 images, your first scan might find 5, and your second may find 40.
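The repeat-until-enough loop described here can be sketched roughly as follows. This is only an illustration, not code from the thread: `collect_matches` and `make_fake_scan` are hypothetical helpers, and the fake scan pages through an in-memory list with an invented `offset` cursor. The real Scan API instead returns a `LastEvaluatedKey` that is passed back as `ExclusiveStartKey` on the next call.

```python
from typing import Any, Callable, Dict, List, Optional

Page = Dict[str, Any]


def collect_matches(scan_page: Callable[[Optional[dict]], Page], wanted: int) -> List[dict]:
    """Fetch filtered pages until `wanted` items are found or the table ends."""
    found: List[dict] = []
    start_key: Optional[dict] = None
    while len(found) < wanted:
        page = scan_page(start_key)
        # A page may contain zero matches, or more than we still need.
        found.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:  # no cursor returned: the scan is finished
            break
    return found[:wanted]  # trim any overshoot from the last page


def make_fake_scan(rows: List[dict], page_size: int,
                   predicate: Callable[[dict], bool]) -> Callable[[Optional[dict]], Page]:
    """Hypothetical stand-in for a filtered Scan: reads `page_size` rows, then filters."""
    def scan_page(start_key: Optional[dict]) -> Page:
        start = 0 if start_key is None else start_key["offset"]
        chunk = rows[start:start + page_size]
        page: Page = {"Items": [row for row in chunk if predicate(row)]}
        if start + page_size < len(rows):
            page["LastEvaluatedKey"] = {"offset": start + page_size}
        return page
    return scan_page
```

Because a page can overshoot, the sketch trims the result to `wanted`. Resuming later from exactly the cut-off point would need extra bookkeeping that is not shown here.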
Do I have this right? Is it basically a trade-off? You get really fast predictable database performance that is virtually maintenance free. But the trade-off is that you need to build way more logic to deal with the results?
Or am I totally off base here?
Accepted answer by DNA
Yes, you are correct about the trade-off between performance and query flexibility.
But there are a few tricks to reduce the pain - secondary indexes/denormalising probably being the most important.
You would have another table keyed on user ID, listing all their images, for example. When you add an image, you update this table as well as adding a row to the table keyed on image ID.
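As a rough sketch of that dual-write idea, the following uses two in-memory dictionaries in place of the two tables. The class and method names (`ImageStore`, `add_image`, `list_user_images`) are made up for illustration, and a real implementation would also have to handle the case where one write succeeds and the other fails.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class ImageStore:
    """Hypothetical stand-in for two tables: one keyed on image ID, one on user ID."""

    def __init__(self) -> None:
        self.images_by_id: Dict[int, dict] = {}
        # user ID -> list of (date_added, image_id) entries
        self.images_by_user: Dict[str, List[Tuple[int, int]]] = defaultdict(list)

    def add_image(self, image_id: int, user_id: str, date_added: int, tags: Set[str]) -> None:
        # Write 1: the main record, keyed on image ID.
        self.images_by_id[image_id] = {
            "ImageID": image_id,
            "UserID": user_id,
            "DateAdded": date_added,
            "Tags": set(tags),
        }
        # Write 2: the denormalised entry, keyed on user ID.
        self.images_by_user[user_id].append((date_added, image_id))

    def list_user_images(self, user_id: str) -> List[dict]:
        # A keyed lookup instead of a full scan; results ordered by date added.
        entries = sorted(self.images_by_user.get(user_id, []))
        return [self.images_by_id[image_id] for _, image_id in entries]
```

Listing a user's images then reads only that user's entries, at the cost of an extra write whenever an image is added.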
You have to decide what queries you need, then design the data model around them.
Answer by Rodrigo Ribeiro
I think you need to create your own secondary index, using another table.
This table "schema" could be:
- User ID (String, Primary Key)
- Date Added (Number, Range Key)
- Image ID (Number)
That way you can query by User ID and filter by Date Added as well.
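A query against such an index table might be built along these lines. This is a sketch under assumptions: the table name `UserImages` and the attribute names `UserID` and `DateAdded` are invented for the example, and the function only builds the request parameters rather than calling AWS.

```python
from typing import Any, Dict


def user_images_query(user_id: str, since: int, until: int, limit: int) -> Dict[str, Any]:
    """Build Query parameters for one user's images within a date range."""
    return {
        "TableName": "UserImages",  # hypothetical table name
        # Hash key must match exactly; the range key accepts a range condition.
        "KeyConditionExpression": "#uid = :uid AND #added BETWEEN :since AND :until",
        "ExpressionAttributeNames": {"#uid": "UserID", "#added": "DateAdded"},
        "ExpressionAttributeValues": {
            ":uid": {"S": user_id},
            ":since": {"N": str(since)},  # numbers are sent as strings
            ":until": {"N": str(until)},
        },
        "Limit": limit,             # cap on items read per request
        "ScanIndexForward": False,  # newest first (descending range key)
    }
```

With boto3 installed and credentials configured, these parameters could presumably be passed as `boto3.client("dynamodb").query(**params)`. Since there is no filter involved, `Limit` gives a straightforward way to stop at X images.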
Answer by Tamas Kalman
You can use a composite hash-range key as the primary index.
From the DynamoDB Page:
A primary key can either be a single-attribute hash key or a composite hash-range key. A single attribute hash primary key could be, for example, “UserID”. This would allow you to quickly read and write data for an item associated with a given user ID.
A composite hash-range key is indexed as a hash key element and a range key element. This multi-part key maintains a hierarchy between the first and second element values. For example, a composite hash-range key could be a combination of “UserID” (hash) and “Timestamp” (range). Holding the hash key element constant, you can search across the range key element to retrieve items. This would allow you to use the Query API to, for example, retrieve all items for a single UserID across a range of timestamps.
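To make the composite key concrete, here is a minimal sketch of a table definition with "UserID" as the hash key and "Timestamp" as the range key, as in the quoted example. The helper name `composite_key_table`, the table name in the usage line, and the choice of on-demand billing are assumptions for illustration. The function only builds the parameters and does not create anything.

```python
from typing import Any, Dict


def composite_key_table(table_name: str, hash_attr: str, range_attr: str) -> Dict[str, Any]:
    """Build CreateTable parameters for a string hash key plus numeric range key."""
    return {
        "TableName": table_name,
        # Only key attributes are declared up front; other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": hash_attr, "AttributeType": "S"},
            {"AttributeName": range_attr, "AttributeType": "N"},
        ],
        "KeySchema": [
            {"AttributeName": hash_attr, "KeyType": "HASH"},    # first element
            {"AttributeName": range_attr, "KeyType": "RANGE"},  # second element
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


# Hypothetical usage: the table name is invented for this example.
params = composite_key_table("UserEvents", "UserID", "Timestamp")
```

Holding `UserID` constant and searching across `Timestamp` is then what the Query API does against this key schema.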

