Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must follow the CC BY-SA license, link the original source, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/27792706/
Cannot create index in mongodb, "key too large to index"
Asked by Sandeep.maurya
I am creating an index on a MongoDB collection with 10 million records, but I get the following error:
db.logcollection.ensureIndex({"Module":1})
{
"createdCollectionAutomatically" : false,
"numIndexesBefore" : 3,
"ok" : 0,
"errmsg" : "Btree::insert: key too large to index, failing play.logcollection.$Module_1 1100 { : \"RezGainUISystem.Net.WebException: The request was aborted: The request was canceled.\r\n at System.Net.ConnectStream.InternalWrite(Boolean async, Byte...\" }",
"code" : 17282
}
Please help me understand how to create this index in MongoDB.
Answered by anhlc
MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit (1024 bytes). You can, however, create a hashed index or a text index instead:
db.logcollection.createIndex({"Module":"hashed"})
or
db.logcollection.createIndex({"Module":"text"})
Answered by Rustem K
You can silence this behaviour by launching the mongod instance with the following command:
mongod --setParameter failIndexKeyTooLong=false
or by executing the following command from the mongo shell:
db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )
If you are sure that your field will exceed the limit only very rarely, then one way to solve this issue is to split the offending field into parts of byte length < 1 KB, e.g. for a field val I would split it into a tuple of fields val_1, val_2 and so on. Mongo stores text as valid UTF-8 values, which means you need a function that can split UTF-8 strings properly.
def split_utf8(s, n):
    """
    Split s into UTF-8 byte chunks of at most n bytes, never cutting
    inside a multi-byte sequence.

    (b[k] & 0xc0) == 0x80 checks whether b[k] is a continuation byte
    (an actual part of a character) rather than a header byte.

    An interesting aside, by the way: you can classify bytes in a UTF-8
    stream as follows:
    - with the high bit set to 0, it is a single-byte value;
    - with the two high bits set to 10, it is a continuation byte;
    - otherwise, it is the first byte of a multi-byte sequence, and the
      number of leading 1 bits indicates how many bytes the sequence has
      in total (110... means two bytes, 1110... means three bytes, etc).
    """
    b = s.encode('utf-8')           # indexing bytes yields ints in Python 3
    while len(b) > n:
        k = n
        while (b[k] & 0xc0) == 0x80:
            k -= 1                  # back up to the start of the character
        yield b[:k]
        b = b[k:]
    yield b
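As a quick sanity check, the splitter can be exercised on a string containing multi-byte characters. This is a self-contained Python 3 sketch (note that indexing a bytes object in Python 3 yields ints directly, so ord() is not needed):

```python
def split_utf8(s, n):
    """Yield UTF-8 byte chunks of s, each at most n bytes,
    never cutting inside a multi-byte sequence."""
    b = s.encode("utf-8")
    while len(b) > n:
        k = n
        # (b[k] & 0xc0) == 0x80 marks a continuation byte: back up
        # until k sits on the first byte of a character.
        while (b[k] & 0xc0) == 0x80:
            k -= 1
        yield b[:k]
        b = b[k:]
    yield b

long_text = "héllo wörld! " * 100   # 1500 bytes, containing 2-byte characters
chunks = list(split_utf8(long_text, 300))
```

Every chunk fits the byte budget, decodes as valid UTF-8 on its own, and joining the chunks reproduces the original string.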
Then you can define your compound index:
db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})
or multiple indexes, one per val_i:
db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})
Important: If you consider using your field in a compound index, then be careful with the second argument to the split_utf8 function. For each document you need to subtract the byte sizes of the other field values that make up your index key, e.g. for the index (a: 1, b: 1, val: 1) the budget is 1024 - sizeof(value(a)) - sizeof(value(b)).
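A minimal Python sketch of that budget calculation (the field names and sample values here are hypothetical, and BSON structural overhead is ignored, so treat the result as an upper bound rather than an exact figure):

```python
INDEX_KEY_LIMIT = 1024  # MongoDB's index key limit, in bytes

# Hypothetical document to be covered by the compound index (a: 1, b: 1, val: 1).
doc = {"a": "booking", "b": "2015-01-06", "val": "some very long text ..."}

# Bytes left for `val` once the other key fields are accounted for.
budget = (INDEX_KEY_LIMIT
          - len(doc["a"].encode("utf-8"))
          - len(doc["b"].encode("utf-8")))
print(budget)  # 1007
```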
Answered by kevinadi
As others have pointed out in the answers, the error key too large to index means that you are attempting to create an index on a field or fields whose values exceed 1024 bytes in length.
In ASCII terms, 1024 bytes typically translates to around 1024 characters in length.
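That equivalence only holds for ASCII; multi-byte characters shrink the character budget considerably. A small Python illustration:

```python
ascii_text = "a" * 1024   # ASCII: 1 byte per character in UTF-8
cjk_text = "索" * 1024    # this CJK character encodes to 3 bytes in UTF-8

ascii_bytes = len(ascii_text.encode("utf-8"))
cjk_bytes = len(cjk_text.encode("utf-8"))
print(ascii_bytes, cjk_bytes)  # 1024 3072
```

So a value of only ~350 such characters already exceeds the 1024-byte key limit.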
There is no solution for this, as it is an intrinsic limit set by MongoDB, as mentioned in the MongoDB Limits and Thresholds page:
The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.
Turning off the failIndexKeyTooLong check is not a solution either, as mentioned in the server parameters manual page:
...these operations would successfully insert or modify a document but the index or indexes would not include references to the document.
What that sentence means is that the offending document will not be included in the index, and may be missing from query results.
For example:
> db.test.insert({_id: 0, a: "abc"})
> db.test.insert({_id: 1, a: "def"})
> db.test.insert({_id: 2, a: <string more than 1024 characters long>})
> db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )
> db.test.createIndex({a: 1})
> db.test.find()
{"_id": 0, "a": "abc"}
{"_id": 1, "a": "def"}
{"_id": 2, "a": <string more than 1024 characters long>}
Fetched 3 record(s) in 2ms
> db.test.find({a: {$ne: "abc"}})
{"_id": 1, "a": "def"}
Fetched 1 record(s) in 1ms
By forcing MongoDB to ignore the failIndexKeyTooLong error, the last query does not contain the offending document (i.e. the document with _id: 2 is missing from the result), so the query returns the wrong result set.
Answered by JoelABair
When running into the "index key limit", the solution depends on the needs of your schema. In extremely rare cases, key matching on a value of > 1024 bytes is a genuine design requirement. In fact, nearly all databases impose an index key limit, though it is typically somewhat configurable in legacy relational DBs (Oracle/MySQL/PostgreSQL), so that you can easily shoot yourself in the foot.
For quick search, a "text" index is designed to optimize searching and pattern matching on long text fields, and is well suited to this use case. More commonly, however, the requirement is a uniqueness constraint on long text values, and "text" indexes do not behave like a unique scalar index with the unique flag set ({ unique: true }); they act more like an array of all the text strings in the field.
Taking inspiration from MongoDB's GridFS, uniqueness checks can easily be implemented by adding an "md5" field to the document and creating a unique scalar index on it, sort of like a custom unique hashed index. This allows a virtually unlimited (~16 MB) text field length that is indexed for search and unique across the collection.
const md5 = require('md5');
const mongoose = require('mongoose');
let Schema = new mongoose.Schema({
text: {
type: String,
required: true,
trim: true,
set: function(v) {
this.md5 = md5(v);
return v;
}
},
md5: {
type: String,
required: true,
trim: true
}
});
Schema.index({ md5: 1 }, { unique: true });
Schema.index({ text: "text" }, { background: true });
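What makes this work is that an MD5 digest is always 32 hex characters, no matter how long the input is, so the unique index key has a small fixed size. A Python sketch of the same idea (hashlib stands in for the Node md5 package used above):

```python
import hashlib

def text_key(text: str) -> str:
    """Fixed-length fingerprint of arbitrarily long text,
    usable as a unique scalar index key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

short_key = text_key("abc")
huge_key = text_key("x" * 5_000_000)   # ~5 MB of text

print(len(short_key), len(huge_key))   # 32 32
```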
Answered by MalcolmOcean
In my case I was trying to index on a large subdocument array. When I went and looked at my query, it was actually for a subproperty of a subproperty, so I changed the index to target that sub-subproperty and it worked fine.
In my case, goals was the large subdocument array. The failing "key too large" index looked like {"goals": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1} and the query looked like this:
emailsDisabled: {$ne: true},
priorityEmailsDisabled: {$ne: true},
goals: {
$elemMatch: {
"topPriority.ymd": ymd,
}
}
and once I changed the index to {"goals.topPriority.ymd": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1} it worked fine.
To be clear, all I am certain of is that this allowed me to create the index. Whether that index is actually used by that query is a separate question that I have not yet answered.