
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must likewise follow CC BY-SA, link the original, and attribute it to the original authors (not me): http://stackoverflow.com/questions/27792706/

Date: 2020-09-08 20:15:24  Source: igfitidea

Cannot create index in mongodb, "key too large to index"

Tags: mongodb, mongodb-query

Asked by Sandeep.maurya

I am creating an index in MongoDB on a collection with 10 million records, but I get the following error:


db.logcollection.ensureIndex({"Module":1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 3,
        "ok" : 0,
        "errmsg" : "Btree::insert: key too large to index, failing play.logcollection.$Module_1 1100 { : \"RezGainUISystem.Net.WebException: The request was aborted: The request was canceled.\r\n   at System.Net.ConnectStream.InternalWrite(Boolean async, Byte...\" }",
        "code" : 17282
}

Please help me create this index in MongoDB.


Answered by anhlc

MongoDB will not create an index on a collection if the index entry for an existing document exceeds the index key limit (1024 bytes). You can, however, create a hashed index or a text index instead:


db.logcollection.createIndex({"Module":"hashed"})

or


db.logcollection.createIndex({"Module":"text"})

Answered by Rustem K

You can silence this behaviour by launching the mongod instance with the following command:


mongod --setParameter failIndexKeyTooLong=false

or by executing the following command from the mongo shell:


db.getSiblingDB('admin').runCommand( { setParameter: 1, failIndexKeyTooLong: false } )

If your field will exceed the limit only very rarely, one way to solve this issue is to split the field that overflows the index limit into parts of byte length < 1 KB, e.g. for a field val I would split it into a tuple of fields val_1, val_2, and so on. Mongo stores text as valid UTF-8, which means you need a function that can split UTF-8 strings properly.


def split_utf8(s, n):
    """
    Split s into byte chunks of at most n bytes without breaking
    multi-byte UTF-8 sequences.

    (s[k] & 0xc0) == 0x80 checks whether a byte is a continuation byte
    (actual part of a multi-byte sequence) or a header byte.

    An interesting aside, by the way: you can classify bytes in a UTF-8
    stream as follows:

    - with the high bit set to 0, it's a single-byte value;
    - with the two high bits set to 10, it's a continuation byte;
    - otherwise, it's the first byte of a multi-byte sequence, and the
      number of leading 1 bits indicates how many bytes there are in
      total for this sequence (110... means two bytes, 1110... means
      three bytes, etc.).
    """
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        # back up until k lands on the start of a UTF-8 sequence
        while (s[k] & 0xc0) == 0x80:
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s
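For example, the function can be used like this (redefined here so the snippet is self-contained and runnable; the 100-byte chunk size is just for illustration):

```python
def split_utf8(s, n):
    """Yield UTF-8 byte chunks of s, each at most n bytes long,
    never cutting a multi-byte sequence in half."""
    s = s.encode('utf-8')
    while len(s) > n:
        k = n
        while (s[k] & 0xc0) == 0x80:  # back up to a sequence boundary
            k -= 1
        yield s[:k]
        s = s[k:]
    yield s

text = "héllo wörld " * 100           # contains 2-byte UTF-8 sequences
chunks = list(split_utf8(text, 100))  # keep every chunk under 100 bytes

assert all(len(c) <= 100 for c in chunks)        # within the byte budget
assert b"".join(chunks).decode('utf-8') == text  # lossless reassembly
```

Each chunk decodes cleanly on its own because the split only ever happens at sequence boundaries.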

Then you can define your compound index:


db.coll.ensureIndex({val_1: 1, val_2: 1, ...}, {background: true})

or a separate index for each val_i:


db.coll.ensureIndex({val_1: 1}, {background: true})
db.coll.ensureIndex({val_2: 1}, {background: true})
...
db.coll.ensureIndex({val_i: 1}, {background: true})

Important: if you consider using your field in a compound index, be careful with the second argument to the split_utf8 function. For each document you need to subtract the byte size of every other field value that makes up your index key, e.g. for the index (a: 1, b: 1, val: 1) the budget is 1024 - sizeof(value(a)) - sizeof(value(b)).

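A hypothetical sketch of that budget calculation (the helper and field names a, b are illustrative, and it ignores BSON structural overhead, which also counts toward the limit):

```python
INDEX_KEY_LIMIT = 1024  # MongoDB's per-entry index key limit, in bytes

def val_byte_budget(doc, other_fields, limit=INDEX_KEY_LIMIT):
    """Rough byte budget left for the split val_* parts of a compound
    index key, after subtracting the UTF-8 size of the other fields."""
    used = sum(len(str(doc[f]).encode('utf-8')) for f in other_fields)
    return limit - used

doc = {"a": "short", "b": "also short", "val": "x" * 5000}
budget = val_byte_budget(doc, ["a", "b"])  # 1024 - 5 - 10 = 1009
```

The resulting budget would then be passed as the second argument when splitting val for that document.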

In any other case, use either hashed or text indexes.


Answered by kevinadi

As different people have pointed out in the answers, the error key too large to index means that you are attempting to create an index on a field or fields whose value exceeds 1024 bytes in length.


In ASCII terms, 1024 bytes typically translates to around 1024 characters in length.

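The distinction matters for non-ASCII text: because the limit is in bytes, a string well under 1024 characters can still overflow it. A quick Python illustration:

```python
ascii_key = "a" * 1024  # 1024 characters, 1024 UTF-8 bytes
utf8_key = "é" * 600    # 600 characters, but 1200 UTF-8 bytes

assert len(ascii_key.encode('utf-8')) == 1024  # right at the limit
assert len(utf8_key) == 600
assert len(utf8_key.encode('utf-8')) == 1200   # exceeds the 1024-byte limit
```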

There is no workaround for this, as it is an intrinsic limit set by MongoDB, as mentioned in the MongoDB Limits and Thresholds page:


The total size of an index entry, which can include structural overhead depending on the BSON type, must be less than 1024 bytes.


Turning off the failIndexKeyTooLong check is not a solution either, as mentioned in the server parameters manual page:


...these operations would successfully insert or modify a document but the index or indexes would not include references to the document.


What that sentence means is that the offending document will not be included in the index, and may be missing from query results.


For example:


> db.test.insert({_id: 0, a: "abc"})

> db.test.insert({_id: 1, a: "def"})

> db.test.insert({_id: 2, a: <string more than 1024 characters long>})

> db.adminCommand( { setParameter: 1, failIndexKeyTooLong: false } )

> db.test.find()
{"_id": 0, "a": "abc"}
{"_id": 1, "a": "def"}
{"_id": 2, "a": <string more than 1024 characters long>}
Fetched 3 record(s) in 2ms

> db.test.find({a: {$ne: "abc"}})
{"_id": 1, "a": "def"}
Fetched 1 record(s) in 1ms

By forcing MongoDB to ignore the failIndexKeyTooLong error, the last query does not contain the offending document (i.e. the document with _id: 2 is missing from the result), so the query returns the wrong result set.


Answered by JoelABair

When running into the index key limit, the solution depends on the needs of your schema. In extremely rare cases, key matching on a value of more than 1024 bytes is a genuine design requirement. In fact, nearly all databases impose an index key limit, yet it is typically somewhat configurable in legacy relational DBs (Oracle/MySQL/PostgreSQL), so it is easy to shoot yourself in the foot.


For quick search, a "text" index is designed to optimize searching and pattern matching on long text fields, and is well suited to that use case. More commonly, however, a uniqueness constraint on long text values is the requirement, and "text" indexes do not behave like a unique scalar index with the unique flag set ({ unique: true }); they act more like an array of all the text strings in the field.


Taking inspiration from MongoDB's GridFS, uniqueness checks can easily be implemented by adding an "md5" field to the document and creating a unique scalar index on that: in effect, a custom unique hashed index. This allows a virtually unlimited (~16 MB) text field length that is indexed for search and unique across the collection.


const md5 = require('md5');
const mongoose = require('mongoose');

let Schema = new mongoose.Schema({
  text: {
    type: String,
    required: true,
    trim: true,
    set: function(v) {
        this.md5 = md5(v);
        return v;
    }
  },
  md5: {
    type: String,
    required: true,
    trim: true
  }
});

Schema.index({ md5: 1 }, { unique: true });
Schema.index({ text: "text" }, { background: true });
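The same idea can be sketched outside Mongoose in plain Python with hashlib (the with_md5 helper is hypothetical, and the actual insert and createIndex calls are omitted): however long the text, the stored digest is a fixed 32-character hex string, so a unique index on it always fits the key limit.

```python
import hashlib

def with_md5(doc):
    """Return a copy of doc with an md5 field derived from its text field,
    mirroring the setter in the Mongoose schema above."""
    digest = hashlib.md5(doc["text"].encode('utf-8')).hexdigest()
    return {**doc, "md5": digest}

doc = with_md5({"text": "some very long text " * 500})  # ~10 KB of text
assert len(doc["md5"]) == 32  # fixed-size, index-friendly key
```

A unique index on md5 then rejects duplicate texts at insert time, while the text field itself can still carry a "text" index for search.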

Answered by MalcolmOcean

In my case I was trying to index on a large subdocument array. When I went and looked at my query, it was actually for a sub-property of a sub-property, so I changed the index to target that nested property and it worked okay.


In my case, goals was the large subdocument array, the failing "key too large" index looked like {"goals": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1}, and the query looked like this:


emailsDisabled: {$ne: true},
priorityEmailsDisabled: {$ne: true},
goals: {
  $elemMatch: {
    "topPriority.ymd": ymd,
  }
}

and once I changed the index to {"goals.topPriority.ymd": 1, "emailsDisabled": 1, "priorityEmailsDisabled": 1} it worked fine.


To be clear, all I am certain of is that this allowed me to create the index. Whether that index actually serves that query is a separate question that I have not yet answered.
