MongoDB Full and Partial Text Search

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/44833817/

Time: 2020-09-08 20:57:33  Source: igfitidea

Tags: mongodb, mongodb-query, aggregation-framework, spring-data-mongodb, full-text-indexing

Asked by Leonel

Env:

  • MongoDB (3.2.0) with mongos


Collection:

  • users


Text Index creation:

  BasicDBObject keys = new BasicDBObject();
  keys.put("name","text");

  BasicDBObject options = new BasicDBObject();
  options.put("name", "userTextSearch");
  options.put("unique", Boolean.FALSE);
  options.put("background", Boolean.TRUE);

  userCollection.createIndex(keys, options); // using MongoTemplate


Document:

  • {"name":"LEONEL"}


Queries:

  • db.users.find( { "$text" : { "$search" : "LEONEL" } } ) => FOUND
  • db.users.find( { "$text" : { "$search" : "leonel" } } ) => FOUND (search with $caseSensitive false)
  • db.users.find( { "$text" : { "$search" : "LEONéL" } } ) => FOUND (search with $diacriticSensitive false)
  • db.users.find( { "$text" : { "$search" : "LEONE" } } ) => FOUND (partial search)
  • db.users.find( { "$text" : { "$search" : "LEO" } } ) => NOT FOUND (partial search)
  • db.users.find( { "$text" : { "$search" : "L" } } ) => NOT FOUND (partial search)

Any idea why I get 0 results when I query for "LEO" or "L"?

Using a regex with the text index search is not allowed; both of the following return nothing:

db.getCollection('users')
     .find( { "$text" : { "$search" : "/LEO/i", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results

db.getCollection('users')
     .find( { "$text" : { "$search" : "LEO", 
                          "$caseSensitive": false, 
                          "$diacriticSensitive": false }} )
     .count() // 0 results


MongoDB Documentation:

Answered by Stennie

As of MongoDB 3.4, the text search feature is designed to support case-insensitive searches on text content with language-specific rules for stopwords and stemming. Stemming rules for supported languages are based on standard algorithms which generally handle common verbs and nouns but are unaware of proper nouns.

There is no explicit support for partial or fuzzy matches, but terms that stem to a similar result may appear to be working as such. For example: "taste", "tastes", and "tasteful" all stem to "tast". Try the Snowball Stemming Demo page to experiment with more words and stemming algorithms.

Your results that match are all variations on the same word "LEONEL", and vary only by case and diacritic. Unless "LEONEL" can be stemmed to something shorter by the rules of your selected language, these are the only type of variations that will match.
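
To make the "whole terms only" behaviour concrete, here is a toy model in plain JavaScript. It is not MongoDB's implementation: the function names are made up for illustration, and stemming and stopwords are deliberately left out, so a stem-based hit such as "LEONE" in the question would not match here. It only shows why case and diacritic variants match while an arbitrary prefix does not.

```javascript
// Toy model only: NOT MongoDB's implementation; stemming and stopwords are omitted.
function normalizeTerm(term) {
  return term
    .normalize("NFD")                 // split letters from their combining accents
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining accents
    .toLowerCase();                   // fold case
}

// Split on whitespace and normalize every term.
function tokenize(text) {
  return text.split(/\s+/).filter(Boolean).map(normalizeTerm);
}

// A value matches when any search term equals one of its terms as a whole.
function toyTextMatch(fieldValue, search) {
  const docTerms = new Set(tokenize(fieldValue));
  return tokenize(search).some((term) => docTerms.has(term));
}

console.log(toyTextMatch("LEONEL", "leonel")); // true  (case folded)
console.log(toyTextMatch("LEONEL", "LEONéL")); // true  (diacritic removed)
console.log(toyTextMatch("LEONEL", "LEO"));    // false (prefix is not a whole term)
```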

If you want to do efficient partial matches you'll need to take a different approach. For some helpful ideas see:

There is a relevant improvement request you can watch/upvote in the MongoDB issue tracker: SERVER-15090: Improve Text Indexes to support partial word match.
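
One commonly used alternative for prefix matching, which is not taken from this answer, is to store the prefixes of each word ("edge n-grams") in an extra array field and query that field by equality. The sketch below assumes a hypothetical field called `namePrefixes` and a hypothetical helper `edgePrefixes`; whether the extra storage is acceptable depends on your data.

```javascript
// Hypothetical helper: build the lowercased prefixes ("edge n-grams") of each word.
function edgePrefixes(text, minLength = 1) {
  const prefixes = new Set();
  for (const word of text.toLowerCase().split(/\s+/).filter(Boolean)) {
    for (let end = minLength; end <= word.length; end++) {
      prefixes.add(word.slice(0, end));
    }
  }
  return [...prefixes];
}

// Document as it might be stored; `namePrefixes` is an assumed extra field.
const doc = { name: "LEONEL", namePrefixes: edgePrefixes("LEONEL") };
console.log(doc.namePrefixes.join(", ")); // l, le, leo, leon, leone, leonel

// Equality on an array field matches any element, so an ordinary index on
// namePrefixes could serve this filter without a regex or a text index.
const filter = { namePrefixes: "LEO".toLowerCase() };
```

With documents shaped like `doc`, the filter would be passed to `db.users.find(filter)`; short prefixes such as "l" match many documents, so a larger `minLength` may be preferable.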

Answered by Ricardo Canelas

As Mongo currently does not support partial search by default...

I created a simple static method.

import mongoose from 'mongoose'

const PostSchema = new mongoose.Schema({
    title: { type: String, default: '', trim: true },
    body: { type: String, default: '', trim: true },
});

PostSchema.index({ title: "text", body: "text",},
    { weights: { title: 5, body: 3, } })

PostSchema.statics = {
    searchPartial: function(q, callback) {
        return this.find({
            $or: [
                { "title": new RegExp(q, "gi") },
                { "body": new RegExp(q, "gi") },
            ]
        }, callback);
    },

    searchFull: function (q, callback) {
        return this.find({
            $text: { $search: q, $caseSensitive: false }
        }, callback)
    },

    search: function(q, callback) {
        // Try the text index first; fall back to the regex scan only if it finds nothing.
        this.searchFull(q, (err, data) => {
            if (err) return callback(err, data);
            if (data.length) return callback(null, data);
            return this.searchPartial(q, callback);
        });
    },
}

export default mongoose.models.Post || mongoose.model('Post', PostSchema)

How to use:

import Post from '../models/post'

Post.search('Firs', function(err, data) {
   console.log(data);
})
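
One caveat with building a RegExp straight from `q`: input such as "C++" is not a valid pattern and makes `new RegExp` throw, and characters like "." match more than intended. A minimal sketch of escaping the input first is shown below; `escapeRegExp` and `buildPartialFilter` are illustrative names, not part of the answer or of mongoose.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the same kind of $or filter as searchPartial, but from escaped input.
function buildPartialFilter(q, fields = ["title", "body"]) {
  const pattern = new RegExp(escapeRegExp(q), "i");
  return { $or: fields.map((field) => ({ [field]: pattern })) };
}

const filter = buildPartialFilter("C++ (intro)");
console.log(filter.$or[0].title.test("A C++ (intro) course")); // true
```

The resulting object can be passed to `find` in place of the hand-written `$or` in `searchPartial`.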

Answered by nurealam siddiq

Without creating an index, we could simply use:

db.users.find({ name: /<full_or_partial_text>/i }) // case insensitive
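
Note that the MongoDB documentation says case-insensitive regex queries generally cannot use indexes effectively, whereas a case-sensitive pattern anchored with `^` (a prefix expression) can. If only "starts with" matching is needed, one possible workaround is to store a lowercased copy of the value and query that. The field name `nameLower` and the helper below are assumptions for illustration, not part of this answer.

```javascript
// Hypothetical helper: filter for "field starts with term", matched against a
// lowercased copy of the value (assumed to be stored as `nameLower`).
function buildPrefixFilter(term, field = "nameLower") {
  // Lowercase the term and escape regex metacharacters so it is matched literally.
  const escaped = term.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return { [field]: new RegExp("^" + escaped) };
}

const filter = buildPrefixFilter("LEO");
console.log(filter.nameLower.source);           // ^leo
console.log(filter.nameLower.test("leonel"));   // true
console.log(filter.nameLower.test("napoleon")); // false
```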

Answered by flash

I wrapped @Ricardo Canelas' answer in a mongoose plugin here on npm

Two changes made:

  • Uses promises
  • Searches on any field with type String

Here's the important source code:

// mongoose-partial-full-search

module.exports = exports = function addPartialFullSearch(schema, options) {
  schema.statics = {
    ...schema.statics,
    makePartialSearchQueries: function (q) {
      if (!q) return {};
      const $or = Object.entries(this.schema.paths).reduce((queries, [path, val]) => {
        val.instance == "String" &&
          queries.push({
            [path]: new RegExp(q, "gi")
          });
        return queries;
      }, []);
      return { $or }
    },
    searchPartial: function (q, opts) {
      return this.find(this.makePartialSearchQueries(q), opts);
    },

    searchFull: function (q, opts) {
      return this.find({
        $text: {
          $search: q
        }
      }, opts);
    },

    search: function (q, opts) {
      return this.searchFull(q, opts).then(data => {
        return data.length ? data : this.searchPartial(q, opts);
      });
    }
  }
}

exports.version = require('../package').version;

Usage

// PostSchema.js
import addPartialFullSearch from 'mongoose-partial-full-search';
PostSchema.plugin(addPartialFullSearch);

// some other file.js
import Post from '../wherever/models/post'

Post.search('Firs').then(data => console.log(data));

Answered by vigviswa

If you are using a variable to store the string or value to be searched:

It will work with the Regex, as:

collection.find({ <name_of_mongodb_field>: new RegExp(variable_name, 'i') })

Here, the 'i' flag is the ignore-case option.

Answered by Hrishikesh

import re

db.collection.find({"$or": [{"your field name": re.compile(text, re.IGNORECASE)},{"your field name": re.compile(text, re.IGNORECASE)}]})