MongoDB Full and Partial Text Search
Original question: http://stackoverflow.com/questions/44833817/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Asked by Leonel
Env:
- MongoDB (3.2.0) with mongos
Collection:
- users
Text Index creation:
BasicDBObject keys = new BasicDBObject();
keys.put("name","text");
BasicDBObject options = new BasicDBObject();
options.put("name", "userTextSearch");
options.put("unique", Boolean.FALSE);
options.put("background", Boolean.TRUE);
userCollection.createIndex(keys, options); // using MongoTemplate
Document:
- {"name":"LEONEL"}
Queries:
db.users.find( { "$text" : { "$search" : "LEONEL" } } )
=> FOUND
db.users.find( { "$text" : { "$search" : "leonel" } } )
=> FOUND (search caseSensitive is false)
db.users.find( { "$text" : { "$search" : "LEONéL" } } )
=> FOUND (search with diacriticSensitive is false)
db.users.find( { "$text" : { "$search" : "LEONE" } } )
=> FOUND (Partial search)
db.users.find( { "$text" : { "$search" : "LEO" } } )
=> NOT FOUND (Partial search)
db.users.find( { "$text" : { "$search" : "L" } } )
=> NOT FOUND (Partial search)
Any idea why I get 0 results when I query for "LEO" or "L"?
Regex with Text Index Search is not allowed.
db.getCollection('users')
.find( { "$text" : { "$search" : "/LEO/i",
"$caseSensitive": false,
"$diacriticSensitive": false }} )
.count() // 0 results
db.getCollection('users')
.find( { "$text" : { "$search" : "LEO",
"$caseSensitive": false,
"$diacriticSensitive": false }} )
.count() // 0 results
MongoDB Documentation:
Answered by Stennie
As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.
There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.
Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
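To make the whole-token behaviour concrete, here is a deliberately simplified model of it. This is not MongoDB's actual implementation: it assumes tokens are compared after lowercasing and stripping diacritics, and it leaves out stemming and stop words entirely, so it will not reproduce matches that only happen because two words share a stem. The function names are made up for illustration.

```javascript
// Simplified model of whole-token matching (NOT MongoDB's real algorithm).
// Assumption: tokens are compared after lowercasing and removing diacritics;
// stemming and stop words are intentionally ignored.
function normalizeToken(token) {
  return token
    .normalize('NFD')                 // split letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase();
}

function tokenize(text) {
  return text.split(/\s+/).filter(Boolean).map(normalizeToken);
}

// A document matches when any search token EQUALS any indexed token.
function matchesWholeToken(indexedText, search) {
  const indexed = new Set(tokenize(indexedText));
  return tokenize(search).some((token) => indexed.has(token));
}

console.log(matchesWholeToken('LEONEL', 'leonel')); // true
console.log(matchesWholeToken('LEONEL', 'LEONéL')); // true
console.log(matchesWholeToken('LEONEL', 'LEO'));    // false
```

Because the comparison is token equality rather than substring containment, a prefix such as "LEO" never equals the stored token, which is consistent with the 0 results in the question.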
If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:
- Efficient Techniques for Fuzzy and Partial matching in MongoDB by John Page
- Efficient Partial Keyword Searches by James Tan
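One common way to get index-backed partial matches, sketched here as an illustration and not necessarily the approach either linked article takes, is to store every prefix of each word in a separate array field with an ordinary index and query that field by equality. The helper `buildPrefixes` and the field `namePrefixes` are hypothetical names.

```javascript
// Hypothetical helper: every prefix of each word, from minLength characters
// up to the full word. Words shorter than minLength are kept whole.
function buildPrefixes(text, minLength = 2) {
  const prefixes = new Set();
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (let end = Math.min(minLength, word.length); end <= word.length; end++) {
      prefixes.add(word.slice(0, end));
    }
  }
  return [...prefixes];
}

// Store the prefixes next to the original value when writing the document.
const doc = { name: 'LEONEL' };
doc.namePrefixes = buildPrefixes(doc.name);
console.log(doc.namePrefixes); // [ 'le', 'leo', 'leon', 'leone', 'leonel' ]

// Equality on an array field matches when any element equals the value,
// so a lowercased search term can be used directly as the filter.
const filter = { namePrefixes: 'leo' };
```

The trade-off is extra storage and write work per document in exchange for exact-match lookups on the prefix field.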
There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
Answered by Ricardo Canelas
As Mongo currently does not support partial search by default...
I created a simple static method.
import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
  title: { type: String, default: '', trim: true },
  body: { type: String, default: '', trim: true },
});

// Text index over both fields; title matches weigh more than body matches
PostSchema.index({ title: "text", body: "text" },
  { weights: { title: 5, body: 3 } })

PostSchema.statics = {
  // Regex match on either field (does not use the text index)
  searchPartial: function(q, callback) {
    return this.find({
      $or: [
        { "title": new RegExp(q, "gi") },
        { "body": new RegExp(q, "gi") },
      ]
    }, callback);
  },
  // Full-text match using the text index
  searchFull: function (q, callback) {
    return this.find({
      $text: { $search: q, $caseSensitive: false }
    }, callback)
  },
  // Try the full-text search first; fall back to the partial search if it finds nothing
  search: function(q, callback) {
    this.searchFull(q, (err, data) => {
      if (err) return callback(err, data);
      if (!err && data.length) return callback(err, data);
      if (!err && data.length === 0) return this.searchPartial(q, callback);
    });
  },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)
How to use:
import Post from '../models/post'
Post.search('Firs', function(err, data) {
console.log(data);
})
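One caveat with building a RegExp straight from the search string: characters such as `+`, `.` or `(` are treated as regex syntax, so a query like `C++` would make `new RegExp` throw. A minimal sketch of escaping the input first follows; `escapeRegExp` and `buildPartialRegExp` are illustrative names, not part of the answer above or of mongoose.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the user's text is matched literally.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive "contains" pattern from raw user input.
function buildPartialRegExp(q) {
  return new RegExp(escapeRegExp(q), 'i');
}

console.log(buildPartialRegExp('C++').test('I like c++')); // true
console.log(buildPartialRegExp('a.c').test('abc'));        // false, "." is literal
```

The resulting RegExp could be passed in place of `new RegExp(q, "gi")` in a partial-search query.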
Answered by nurealam siddiq
Without creating an index, we could simply use:
db.users.find({ name: /<full_or_partial_text>/i})
(case insensitive)
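Keep in mind that a case-insensitive regex like this generally cannot use an index effectively, so it may scan the whole collection. If prefix matches are enough, a case-sensitive pattern anchored with `^` can use a regular index on the field. The sketch below assumes a lowercased copy of the value is stored in a hypothetical `nameLower` field; `buildPrefixFilter` is likewise an illustrative name.

```javascript
// Escape regex metacharacters so the input is matched literally.
function escapeRegExp(input) {
  return input.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a filter for "field starts with input".
// Assumption: `field` holds a lowercased copy of the original value,
// so the pattern can stay case-sensitive and anchored at the start.
function buildPrefixFilter(field, input) {
  return { [field]: { $regex: '^' + escapeRegExp(input.toLowerCase()) } };
}

console.log(buildPrefixFilter('nameLower', 'LEO'));
// { nameLower: { '$regex': '^leo' } }
```

The returned object can be passed to `find()` as the query filter.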
Answered by flash
I wrapped @Ricardo Canelas' answer in a mongoose plugin, published on npm.
Two changes made:
- Uses promises
- Search on any field with type String
Here's the important source code:
// mongoose-partial-full-search
module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    // Build one regex condition per String path in the schema
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance == "String" &&
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },
    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },
    // Full-text search first; fall back to the partial search when it returns nothing
    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}
exports.version = require('../package').version;
Usage
// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);
// some other file.js
import Post from '../wherever/models/post'
Post.search('Firs').then(data => console.log(data));
Answered by vigviswa
If you are using a variable to store the string or value to be searched, it will work with a RegExp, as:
collection.find({ <name_of_mongodb_field>: new RegExp(variable_name, 'i') })
Here, the 'i' flag is the ignore-case option.
Answered by Hrishikesh
import re
db.collection.find({"$or": [{"your field name": re.compile(text, re.IGNORECASE)},{"your field name": re.compile(text, re.IGNORECASE)}]})