Source: http://stackoverflow.com/questions/5453872/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to this page).
Full-text search in NoSQL databases
Asked by unj2
- Does anyone here have experience deploying a real online system with full-text search in any of the NoSQL databases?
- For example, how does full-text search compare in MongoDB, Riak and CouchDB?
- Some of the metrics I am looking for are ease of deployment and maintenance and, of course, speed.
- How mature are they? Can any of them replace the Lucene infrastructure?
Thanks.
Answer by Andreas Jung
None of the existing "NoSQL" databases provides a reasonable implementation of anything that could be called "full-text search". MongoDB in particular has almost nothing so far: matching with regular expressions is not full-text search, and searching a keyword list with the $in or $all operators is only a very poor implementation of it. Using Solr, ElasticSearch or Sphinx is straightforward; the implementation and integration happen at the application level. Your choice depends largely on your requirements and current setup.
Answer by Eva611
Here are the details on Riak Search: http://wiki.basho.com/Riak-Search.html. There is a presentation on it as well.
Answer by JasonSmith
Yes. See CouchDB-Lucene, a CouchDB extension that supports full Lucene queries of the data.
Answer by Tom Kerr
MarkLogic has better options for text search, if I recall correctly. Here is a discussion on the topic, though it is on their own blog and written by their own writers.
Answer by Irfan
I'm involved in the development of an application using Solandra (Cassandra-based Apache Solr). In my experience the system is quite stable and able to handle TB+ data. I'm personally quite happy with the software for the following reasons:
1. Automated partitioning of data, thanks to the Cassandra backend.
2. Rich querying capabilities (thanks to Solr and Lucene).
3. Fast reads and writes (writes are significantly faster than reads).
However, I believe Solandra currently does not support batch mutations. That is, I can insert 100 columns into Cassandra in a single insertion, but Solandra does not support this.
Answer by Chris Fulstow
For MongoDB, there isn't a complete full-text indexing feature yet; however, there is possibly one in the pipeline, perhaps due in v2.2.
In the meantime, you can create a simple inverted index by using a string array field, and putting an index on it, as described here: Full Text Search in Mongo
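The string-array approach can be sketched roughly as follows. This is a minimal illustration in Python, not the code from the linked article: the `tokenize`, `with_keywords` and `matches_all` helpers, the `body` and `keywords` field names, and the stop-word list are all assumptions made for the example, and the matching is done in memory as a stand-in for a MongoDB query.

```python
import re

# Assumed stop-word list for the example; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and return its unique keyword tokens, sorted."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({w for w in words if w not in STOPWORDS})

def with_keywords(doc, field="body"):
    """Return a copy of the document with a `keywords` array field added."""
    return {**doc, "keywords": tokenize(doc[field])}

def matches_all(doc, query):
    """In-memory stand-in for a {"keywords": {"$all": [...]}} query."""
    return set(tokenize(query)) <= set(doc["keywords"])

# Hypothetical documents, prepared the way they would be stored.
docs = [
    with_keywords({"_id": 1, "body": "Full-text search in NoSQL databases"}),
    with_keywords({"_id": 2, "body": "Deploying MongoDB in production"}),
]
hits = [d["_id"] for d in docs if matches_all(d, "nosql search")]
print(hits)  # [1]
```

With a real collection, the `keywords` field would be stored with each document, indexed (in pymongo, `collection.create_index("keywords")`), and queried with the `$all` operator on the tokenized query. Note that this gives exact token matching only, with no stemming, ranking or phrase search.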
Or, you could maintain a parallel full-text index in a dedicated Solr or Lucene index and, if you're feeling really ambitious, replicate directly to your full-text store from the Mongo oplog. Otherwise, populate both and keep them in sync from your application logic.
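Keeping both stores in sync from application logic might look something like the sketch below. Everything here is hypothetical: `DictStore` stands in for the Mongo collection, `DictSearchIndex` for the Solr or Lucene index, and `SyncedRepository` is simply one way to route every write through a single place.

```python
class DictStore:
    """Hypothetical stand-in for the primary database."""
    def __init__(self):
        self.docs = {}

    def save(self, doc_id, doc):
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class DictSearchIndex:
    """Hypothetical stand-in for the dedicated full-text index."""
    def __init__(self):
        self.postings = {}  # term -> set of document ids

    def index(self, doc_id, text):
        self.remove(doc_id)  # drop stale terms before re-indexing
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)

    def remove(self, doc_id):
        for ids in self.postings.values():
            ids.discard(doc_id)

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))


class SyncedRepository:
    """Routes every write to both the store and the search index."""
    def __init__(self, store, index):
        self.store = store
        self.index = index

    def save(self, doc_id, doc):
        self.store.save(doc_id, doc)
        self.index.index(doc_id, doc["body"])

    def delete(self, doc_id):
        self.store.delete(doc_id)
        self.index.remove(doc_id)


repo = SyncedRepository(DictStore(), DictSearchIndex())
repo.save(1, {"body": "full text search"})
repo.save(2, {"body": "text indexing"})
repo.save(1, {"body": "updated document"})  # an update re-indexes the document
print(repo.index.search("text"))  # [2]
```

A dual write like this can fail between the two calls and leave the stores out of step, so a real system would likely need retries or a periodic re-sync; that is one reason replicating from the oplog is attractive.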
Answer by Petrogad
I've just finished implementing this using data stored in MongoDB, with Sphinx Search as my full-text engine. I know Mongo has a votable issue for adding full-text search to a future release; however, at this point they don't have it.
There are several ways of getting your Mongo data into Sphinx; the one I've had the most luck with (and which has been extremely easy) is xmlpipe2. It took me a while to fully understand how to use it; however, the article "Sphinx xmlpipe2 in PHP" has an outstanding walkthrough that shows (at least in PHP) how to build the document and then how to insert it into Sphinx.
Essentially my config ends up looking like this:
source my_source {
type = xmlpipe
xmlpipe_command = /usr/bin/php /www/generateSphinXml.php identifierForMyTable
}
My index then looks like this:
index my_index {
source = my_source
path = /usr/local/sphinx/var/data/my_index
docinfo = extern
min_word_len = 1
mlock = 0
morphology = stem_en
charset_type = utf-8 # <-- this is a requirement, however
enable_star = 1
html_strip = 0
min_prefix_len = 2
}
I've had excellent success with this; hopefully you find it just as useful.
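For readers who have not seen xmlpipe2 output, the script named in xmlpipe_command is expected to print an XML document set to standard output. Below is a minimal sketch of such a generator in Python rather than the PHP used in the answer. The `xmlpipe2_docset` function, the `title` and `body` field names and the sample rows are assumptions for illustration; a real script would read its rows from MongoDB.

```python
import sys
from xml.sax.saxutils import escape, quoteattr

def xmlpipe2_docset(docs, fields):
    """Build an xmlpipe2 document set as a string.

    docs:   iterable of dicts with an integer "id" plus one key per field
    fields: names of the full-text fields to declare in the schema
    """
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    for name in fields:
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    lines.append("</sphinx:schema>")
    for doc in docs:
        lines.append(f'<sphinx:document id="{int(doc["id"])}">')
        for name in fields:
            # Escape &, < and > so the field text stays well-formed XML.
            lines.append(f"<{name}>{escape(str(doc.get(name, '')))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)

# Hypothetical rows standing in for documents read from MongoDB.
sample = [
    {"id": 1, "title": "NoSQL & search", "body": "Full-text search in NoSQL databases"},
    {"id": 2, "title": "Sphinx", "body": "Indexing with xmlpipe2 <fast>"},
]
feed = xmlpipe2_docset(sample, ["title", "body"])
sys.stdout.write(feed + "\n")
```

Sphinx expects each document id to be a unique positive integer, so MongoDB ObjectIds cannot be used directly; some mapping to integer ids has to be maintained alongside the data.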
Answer by Sougata Pal
If you are using PHP, there is a great solution for full-text search in the NoSQL database MongoDB, named MongoLantern: http://sourceforge.net/projects/mongolantern/
Previously I was using Sphinx with MongoDB to perform full-text search; the performance was great but the result quality was very poor. With MongoLantern my search results have improved a lot.
MongoLantern is also listed on the MongoDB site.
Please let me know if you try it yourself.
Answer by Andriy Tkach
Solr can be used with 10gen's Mongo Connector, which can push data from MongoDB to Solr (among other targets).
https://github.com/10gen-labs/mongo-connector/tree/master/mongo-connector
From their example:
python mongo_connector.py -m localhost:27217 -t http://localhost:8080/solr
Answer by OSP
Definitely Solr. It is NoSQL.
It has:
- awesome performance
- awesome storage options
- stemmers
- highlighting
- faceting
- distributed search (SolrCloud)
- perfect API
- web admin
- HTML, PDF, DOC indexing
- many other features
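To give a feel for how the highlighting and faceting features listed above are requested, here is a small Python sketch that only builds a Solr select URL. The `q`, `rows`, `hl`, `hl.fl`, `facet`, `facet.field` and `wt` parameters are standard Solr query parameters; the `solr_select_url` helper, the base URL, the `articles` core and the `body` and `category` field names are assumptions for the example.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, text, highlight_field, facet_field, rows=10):
    """Build a select URL asking for highlighted, faceted results as JSON."""
    params = [
        ("q", text),                   # the full-text query
        ("rows", rows),                # number of results to return
        ("hl", "true"),                # turn highlighting on
        ("hl.fl", highlight_field),    # field to highlight matches in
        ("facet", "true"),             # turn faceting on
        ("facet.field", facet_field),  # field to count facet values for
        ("wt", "json"),                # response format
    ]
    return f"{base_url}/{core}/select?{urlencode(params)}"

# Assumed local Solr instance and core name.
url = solr_select_url(
    "http://localhost:8983/solr", "articles", "nosql search", "body", "category"
)
print(url)
```

The resulting URL can be fetched with any HTTP client; the sketch stops short of sending the request so that it runs without a Solr server.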

