Original question: http://stackoverflow.com/questions/11599069/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
How does MongoDB sort records when no sort order is specified?
Asked by saurabhj
When we run a Mongo find() query without any sort order specified, what does the database internally use to sort the results?
According to the documentation on the mongo website:
When executing a find() with no parameters, the database returns objects in forward natural order.
For standard tables, natural order is not particularly useful because, although the order is often close to insertion order, it is not guaranteed to be. However, for Capped Collections, natural order is guaranteed to be the insertion order. This can be very useful.
However, for standard collections (non-capped collections), what field is used to sort the results? Is it the _id field or something else?
Edit:
Basically, I guess what I am trying to get at is that if I execute the following search query:
db.collection.find({"x":y}).skip(10000).limit(1000);
At two different points in time, t1 and t2, will I get different result sets:
- When there have been no additional writes between t1 & t2?
- When there have been new writes between t1 & t2?
- When new indexes have been added between t1 & t2?
I have run some tests on a temp database, and the results were the same (Yes) for all 3 cases, but I wanted to be sure, and I am certain that my test cases weren't very thorough.
Answered by Stennie
What is the default sort order when none is specified?
The default internal sort order (or natural order) is an undefined implementation detail. Maintaining order is extra overhead for storage engines, and MongoDB's API does not mandate predictability outside of an explicit sort() or the special case of fixed-sized capped collections, which have associated usage restrictions. For typical workloads it is desirable for the storage engine to try to reuse available preallocated space and make decisions about how to most efficiently store data on disk and in memory.
Without any query criteria, results will be returned by the storage engine in natural order (aka in the order they are found). Result order may coincide with insertion order, but this behaviour is not guaranteed and cannot be relied on (aside from capped collections).
Some examples that may affect storage (natural) order:
- WiredTiger uses a different representation of documents on disk versus the in-memory cache, so natural ordering may change based on internal data structures.
- The original MMAPv1 storage engine (removed in MongoDB 4.2) allocates record space for documents based on padding rules. If a document outgrows the currently allocated record space, the document location (and natural ordering) will be affected. New documents can also be inserted in storage marked available for reuse due to deleted or moved documents.
- Replication uses an idempotent oplog format to apply write operations consistently across replica set members. Each replica set member maintains local data files that can vary in natural order, but will have the same data outcome when oplog updates are applied.
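To make the space-reuse point concrete, here is a toy sketch in plain JavaScript. It is not MongoDB's storage engine; the class name ToyRecordStore and its methods are invented for illustration. It only assumes a store that prefers freed slots over appending, which is enough for natural order to drift away from insertion order.

```javascript
// Toy model only: NOT MongoDB's storage engine. It mimics a record store
// that reuses freed slots, which is one way natural order can diverge
// from insertion order. All names here are hypothetical.
class ToyRecordStore {
  constructor() {
    this.slots = [];      // physical storage positions
    this.freeSlots = [];  // positions released by deletes
  }

  insert(doc) {
    // Prefer a previously freed slot over appending at the end.
    if (this.freeSlots.length > 0) {
      const slot = this.freeSlots.shift();
      this.slots[slot] = doc;
    } else {
      this.slots.push(doc);
    }
  }

  remove(id) {
    const slot = this.slots.findIndex((d) => d !== null && d._id === id);
    if (slot !== -1) {
      this.slots[slot] = null;
      this.freeSlots.push(slot);
    }
  }

  // "Natural order": the order in which records are encountered in storage.
  scan() {
    return this.slots.filter((d) => d !== null);
  }
}

const store = new ToyRecordStore();
[1, 2, 3].forEach((id) => store.insert({ _id: id }));
store.remove(1);          // frees the first slot
store.insert({ _id: 4 }); // lands in the freed slot, ahead of 2 and 3

const naturalOrder = store.scan().map((d) => d._id);
console.log(naturalOrder); // [ 4, 2, 3 ]
```

The newest document comes back first even though it was inserted last, which is the kind of surprise an unsorted find() can produce.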
What if an index is used?
If an index is used, documents will be returned in the order they are found (which does not necessarily match insertion order or I/O order). If more than one index is used then the order depends internally on which index first identified the document during the de-duplication process.
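A rough way to picture "the order they are found" is the toy sketch below, again in plain JavaScript rather than against a real server. The "index" is just a key-sorted list of entries (real MongoDB indexes are B-trees), and the field x and the helper names are assumptions made for the illustration. It shows why adding an index can change the order of an unsorted result.

```javascript
// Toy model: an "index" here is just a list of (key, position) entries kept
// in key order. This only illustrates the idea, not MongoDB internals.
const storage = [
  { _id: 1, x: 30 },
  { _id: 2, x: 10 },
  { _id: 3, x: 20 },
]; // array position stands in for storage order

const indexOnX = storage
  .map((doc, position) => ({ key: doc.x, position }))
  .sort((a, b) => a.key - b.key);

// Walk the index and fetch matching documents as they are found.
function findViaIndex(minX) {
  return indexOnX
    .filter((entry) => entry.key >= minX)
    .map((entry) => storage[entry.position]);
}

// Walk the storage directly, with no index involved.
function findViaCollectionScan(minX) {
  return storage.filter((doc) => doc.x >= minX);
}

const viaIndex = findViaIndex(10).map((d) => d._id);
const viaScan = findViaCollectionScan(10).map((d) => d._id);
console.log(viaIndex); // [ 2, 3, 1 ]
console.log(viaScan);  // [ 1, 2, 3 ]
```

Both calls return the same documents, but in different orders, so a skip/limit window over an unsorted query can select different documents depending on the plan that was used.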
If you want a predictable sort order you must include an explicit sort() with your query and have unique values for your sort key.
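The following plain JavaScript sketch shows why the sort key needs unique values when paging with skip and limit. It simulates the same documents arriving in two different natural orders; the field name score and the helper names are hypothetical. In the mongo shell the comparable query would look roughly like db.collection.find({"x":y}).sort({score: 1, _id: 1}).skip(1).limit(2), where _id serves as the unique tie-breaker.

```javascript
// Pure JavaScript simulation, no server needed. "score" is a hypothetical
// field; _id is used as the unique tie-breaker.
const byScoreOnly = (a, b) => a.score - b.score;
const byScoreThenId = (a, b) => a.score - b.score || a._id - b._id;

// Sort a copy of the input, then take one skip/limit window from it.
function page(docs, compare, skip, limit) {
  return [...docs].sort(compare).slice(skip, skip + limit).map((d) => d._id);
}

// The same four documents, encountered in two different natural orders.
const atT1 = [
  { _id: 1, score: 5 },
  { _id: 2, score: 5 },
  { _id: 3, score: 5 },
  { _id: 4, score: 7 },
];
const atT2 = [atT1[2], atT1[3], atT1[0], atT1[1]];

// Ties on score are resolved by whatever order the input arrived in.
const tiedT1 = page(atT1, byScoreOnly, 1, 2);
const tiedT2 = page(atT2, byScoreOnly, 1, 2);
console.log(tiedT1, tiedT2); // [ 2, 3 ] [ 1, 2 ]

// With a unique tie-breaker the page is the same at both points in time.
const stableT1 = page(atT1, byScoreThenId, 1, 2);
const stableT2 = page(atT2, byScoreThenId, 1, 2);
console.log(stableT1, stableT2); // [ 2, 3 ] [ 2, 3 ]
```

Sorting on a non-unique key alone still leaves the order of tied documents up to the storage engine, which is why the tie-breaker matters for pagination.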
How do capped collections maintain insertion order?
The implementation exception noted for natural order in capped collections is enforced by their special usage restrictions: documents are stored in insertion order but existing document size cannot be increased and documents cannot be explicitly deleted. Ordering is part of the capped collection design that ensures the oldest documents "age out" first.
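As a small illustration of the "age out" behaviour, the sketch below models a capped collection as a bounded queue in plain JavaScript. It is a simplification under stated assumptions: the class ToyCappedCollection is invented here and is limited by document count, whereas a real capped collection is created with something like db.createCollection("log", {capped: true, size: 100000}) and is bounded by size in bytes (the name "log" and the size are illustrative values).

```javascript
// Toy model of a capped collection limited by document count.
// The class and its methods are hypothetical, for illustration only.
class ToyCappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = []; // always kept in insertion order
  }

  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document ages out first.
    if (this.docs.length > this.maxDocs) {
      this.docs.shift();
    }
  }

  // Return a copy so callers cannot disturb the stored order.
  find() {
    return [...this.docs];
  }
}

const capped = new ToyCappedCollection(3);
[1, 2, 3, 4, 5].forEach((id) => capped.insert({ _id: id }));

const cappedIds = capped.find().map((d) => d._id);
console.log(cappedIds); // [ 3, 4, 5 ]
```

Because documents are only appended at one end and dropped from the other, storage order and insertion order cannot drift apart in this model.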
Answered by Parvin Gasimzade
Documents are returned in stored order (their order in the file), which is not guaranteed to match insertion order. They are not sorted by the _id field. Sometimes the results may look as if they are sorted by insertion order, but that can change in another request. It is not reliable.