NoSQL (MongoDB) vs. Lucene (or Solr) as your database

Question

Asked by eduncan911

With the NoSQL movement growing based on document-based databases, I've looked at MongoDB lately. I have noticed a striking similarity in how it treats items as "Documents", just as Lucene does (and as Solr users do).

So, the question: Why would you want to use NoSQL (MongoDB, Cassandra, CouchDB, etc) over Lucene (or Solr) as your "database"?

What I am (and I am sure others are) looking for in an answer is a deep-dive comparison of them. Let's skip over relational database discussions altogether, as they serve a different purpose.

Lucene gives some serious advantages, such as powerful searching and weight systems. Not to mention facets in Solr (and Solr is being integrated into Lucene soon, yay!). You can use Lucene documents to store IDs, and access the documents by ID just as you would in MongoDB. Mix it with Solr, and you now get a WebService-based, load-balanced solution.

You can even throw in a comparison of out-of-proc cache providers such as Velocity or MemCached when talking about similar data storing and scalability of MongoDB.

The restrictions around MongoDB remind me of using MemCached, but I can use Microsoft's Velocity and have more grouping and list collection power than with MongoDB (I think). You can't get any faster or more scalable than caching data in memory. Even Lucene has a memory provider.

MongoDB (and others) do have some advantages, such as the ease of use of their API. New up a document, create an id, and store it. Done. Nice and easy.

Answer 1

Accepted answer by Mikos

This is a great question, something I have pondered over quite a bit. I will summarize my lessons learned:

You can easily use Lucene/Solr in lieu of MongoDB for pretty much all situations, but not vice versa. Grant Ingersoll's post sums it up here.
MongoDB etc. seem to serve a purpose where there is no requirement for searching and/or faceting. It appears to be a simpler and arguably easier transition for programmers detoxing from the RDBMS world. Unless you are already used to them, Lucene and Solr have a steeper learning curve.
There aren't many examples of using Lucene/Solr as a datastore, but the Guardian has made some headway and summarizes this in an excellent slide deck. They too are non-committal about jumping fully onto the Solr bandwagon and are "investigating" combining Solr with CouchDB.
Finally, I will offer our experience, although unfortunately I cannot reveal much about the business case. We work at the scale of several TB of data in a near-real-time application. After investigating various combinations, we decided to stick with Solr. No regrets thus far (6 months and counting), and we see no reason to switch to anything else.

Summary: if you do not have a search requirement, Mongo offers a simple & powerful approach. However if search is key to your offering, you are likely better off sticking to one tech (Solr/Lucene) and optimizing the heck out of it - fewer moving parts.

My 2 cents, hope that helped.

Answer 2

Answered by Peter Long

You can't partially update a document in Solr. You have to re-post all of the fields in order to update a document.

And performance matters. If you do not commit, your change to Solr does not take effect; if you commit every time, performance suffers.
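
The commit trade-off described above can be sketched with a toy in-memory index. This is only an illustration under stated assumptions, not Solr's API: `BufferedIndex` and `index_in_batches` are hypothetical names, and `commit()` merely stands in for the expensive step of making buffered writes searchable.

```python
class BufferedIndex:
    """Toy index: writes are buffered and become searchable only after commit()."""

    def __init__(self):
        self._pending = {}     # doc_id -> text, not yet visible to searches
        self._committed = {}   # doc_id -> text, visible to searches
        self.commit_count = 0  # how many (expensive) commits were issued

    def add(self, doc_id, text):
        self._pending[doc_id] = text

    def has_pending(self):
        return bool(self._pending)

    def commit(self):
        # Stand-in for the costly step of publishing buffered changes.
        self._committed.update(self._pending)
        self._pending.clear()
        self.commit_count += 1

    def search(self, term):
        # Only committed documents are searched.
        return sorted(d for d, t in self._committed.items()
                      if term in t.lower().split())


def index_in_batches(index, docs, batch_size):
    """Commit once per batch instead of once per document."""
    for i, (doc_id, text) in enumerate(docs, start=1):
        index.add(doc_id, text)
        if i % batch_size == 0:
            index.commit()
    if index.has_pending():
        index.commit()  # flush the final partial batch


demo = BufferedIndex()
demo.add("a", "hello solr")
before = demo.search("hello")  # [] because nothing is committed yet
demo.commit()
after = demo.search("hello")   # ["a"]

batch_index = BufferedIndex()
index_in_batches(batch_index,
                 [(n, f"document number {n}") for n in range(10)],
                 batch_size=4)
# Ten documents are indexed with three commits rather than ten.
```

Batching reduces the number of commits at the price of a delay before new documents show up in search results, which is the tension this answer points at.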

There are no transactions in Solr.

As Solr has these disadvantages, sometimes NoSQL is a better choice.

Answer 3

Answered by Parvin Gasimzade

We use MongoDB and Solr together and they perform well. You can find my blog post here, where I described how we use these technologies together. Here's an excerpt:

[...] However we observe that query performance of Solr decreases when index size increases. We realized that the best solution is to use both Solr and Mongo DB together. Then, we integrate Solr with MongoDB by storing contents into the MongoDB and creating index using Solr for full-text search. We only store the unique id for each document in Solr index and retrieve actual content from MongoDB after searching on Solr. Getting documents from MongoDB is faster than Solr because there is no analyzers, scoring etc. [...]
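
The pattern in the excerpt (search index holds only IDs, content lives in the document store) can be sketched as follows. This is a minimal in-memory stand-in, not the blog post's code: `content_store` plays the role of a MongoDB collection, `search_index` plays the role of the Solr index, and `save` and `search` are hypothetical helper names.

```python
from collections import defaultdict

content_store = {}               # stand-in for MongoDB: id -> full document
search_index = defaultdict(set)  # stand-in for Solr: term -> set of ids only


def save(doc_id, title, body):
    """Store the full content once, and index only the id under each term."""
    content_store[doc_id] = {"_id": doc_id, "title": title, "body": body}
    for term in f"{title} {body}".lower().split():
        search_index[term].add(doc_id)


def search(query):
    """Step 1: find matching ids in the index. Step 2: fetch content by id."""
    terms = query.lower().split()
    if not terms:
        return []
    ids = set.intersection(*(search_index.get(t, set()) for t in terms))
    return [content_store[i] for i in sorted(ids)]


save(1, "Solr basics", "full text search with facets")
save(2, "MongoDB basics", "document store with simple api")
save(3, "Hybrid", "search in solr and store in mongodb")

hits = search("solr store")  # only document 3 contains both terms
```

The second step is a plain lookup by key, with no analysis or scoring involved, which is the reason the excerpt gives for fetching content from MongoDB rather than from Solr.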

Answer 4

Answered by Prasith Govin

Also please note that some people have integrated Solr/Lucene into Mongo by having all indexes be stored in Solr and also monitoring oplog operations and cascading relevant updates into Solr.

With this hybrid approach you can really have the best of both worlds with capabilities such as full text search and fast reads with a reliable datastore that can also have blazing write speed.

It's a bit technical to set up, but there are lots of oplog tailers that can integrate with Solr. Check out what Rangespan did in this article.

http://denormalised.com/home/mongodb-pub-sub-using-the-replication-oplog.html
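
To make the oplog-tailing idea concrete, here is a rough sketch that replays a list of change entries into a dict-based index. It is an assumption-laden toy, not the approach from the linked article: the entries are only loosely modelled on MongoDB oplog records (`op` of `i`, `u`, or `d`, with the payload in `o` and the target id of an update in `o2`), the oplog is a plain Python list, and `apply_oplog_entry` and `tail` are hypothetical names.

```python
def apply_oplog_entry(index, entry):
    """Apply one simplified oplog entry to a dict-based index (id -> fields)."""
    op = entry["op"]
    if op == "i":    # insert: index the new document
        doc = entry["o"]
        index[doc["_id"]] = {k: v for k, v in doc.items() if k != "_id"}
    elif op == "u":  # update: merge the changed fields into the indexed copy
        doc_id = entry["o2"]["_id"]
        index.setdefault(doc_id, {}).update(entry["o"].get("$set", {}))
    elif op == "d":  # delete: drop the document from the index
        index.pop(entry["o"]["_id"], None)


def tail(oplog, index, start=0):
    """Replay entries from position `start`; return the position to resume from."""
    for entry in oplog[start:]:
        apply_oplog_entry(index, entry)
    return len(oplog)


oplog = [
    {"op": "i", "o": {"_id": 1, "title": "first"}},
    {"op": "i", "o": {"_id": 2, "title": "second"}},
    {"op": "u", "o2": {"_id": 1}, "o": {"$set": {"title": "first, edited"}}},
]
index = {}
position = tail(oplog, index)

# Later writes are picked up by resuming from the saved position.
oplog.append({"op": "d", "o": {"_id": 2}})
position = tail(oplog, index, start=position)
```

A real tailer would read from the replica set's oplog and post changes to Solr; remembering the resume position is what lets it pick up where it left off after a restart.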

Answer 5

Answered by mjalajel

From my experience with both, Mongo is great for simple, straightforward usage. The main Mongo disadvantage we've suffered is poor performance on unanticipated queries (you cannot create Mongo indexes for all the possible filter/sort combinations; you simply can't).

And this is where Lucene/Solr prevails big time: especially with filter query caching, performance is outstanding.
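
One way to see why cached filters help with unanticipated combinations: if the set of matching IDs is cached per individual filter, any combination of filters becomes an intersection of cached sets rather than needing its own index. The sketch below illustrates that idea only; it is not Solr's implementation, and `FilterCache` and its members are hypothetical names.

```python
class FilterCache:
    """Caches the matching id set for each (field, value) filter separately."""

    def __init__(self, docs):
        self.docs = docs   # id -> dict of fields
        self._cache = {}   # (field, value) -> frozenset of matching ids
        self.scans = 0     # number of full scans actually performed

    def _ids_for(self, field, value):
        key = (field, value)
        if key not in self._cache:
            self.scans += 1  # cache miss: scan every document once
            self._cache[key] = frozenset(
                i for i, d in self.docs.items() if d.get(field) == value
            )
        return self._cache[key]

    def query(self, **filters):
        """Intersect the cached id sets of all requested filters."""
        sets = [self._ids_for(f, v) for f, v in filters.items()]
        if not sets:
            return sorted(self.docs)
        return sorted(frozenset.intersection(*sets))


docs = {
    1: {"color": "red", "size": "L"},
    2: {"color": "red", "size": "S"},
    3: {"color": "blue", "size": "L"},
}
cache = FilterCache(docs)
first = cache.query(color="red", size="L")    # two scans, one per filter
second = cache.query(size="L")                # no new scan, reuses the cache
third = cache.query(color="blue", size="L")   # one new scan, for color=blue
```

Three different filter combinations cost three scans in total here, because each individual filter is computed once and then reused.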

Answer 6

Answered by Aquarelle

Since no one else mentioned it, let me add that MongoDB is schema-less, whereas Solr enforces a schema. So, if the fields of your documents are likely to change, that's one reason to choose MongoDB over Solr.

Answer 7

Answered by Beth

@mauricio-scheffer mentioned Solr 4 - for those interested in that, LucidWorks is describing Solr 4 as "the NoSQL Search Server" and there's a video at http://www.lucidworks.com/webinar-solr-4-the-nosql-search-server/ where they go into detail on the NoSQL(ish) features. (The -ish is for their version of schemaless actually being a dynamic schema.)

Answer 8

Answered by u5391130

If you just want to store data in key-value format, Lucene is not recommended because its inverted index will waste too much disk space. And with the data saved on disk, its performance is much slower than NoSQL databases such as Redis, because Redis keeps data in RAM. The biggest advantage of Lucene is that it supports many kinds of queries, so fuzzy queries can be supported.

Answer 9

Answered by Darren Weber

Third-party solutions, like a mongo oplog tailer, are attractive. Some questions remain about whether the solutions could be tightly integrated, from a development/architecture perspective. I don't expect to see a tightly integrated solution for these features, for a few reasons (somewhat speculative, subject to clarification, and not up to date with development efforts):

- mongo is C++, lucene/solr are Java
  - maybe lucene could use some mongo libs
  - maybe mongo could rewrite some lucene algorithms, see also:
    - http://clucene.sourceforge.net/
    - http://lucy.apache.org/
- lucene supports various doc formats
  - mongo is focused on JSON (BSON)
- lucene uses immutable documents
  - single field updates are an issue, if they are available
- lucene indexes are immutable with complex merge ops
- mongo queries are javascript
- mongo has no text analyzers / tokenizers (AFAIK)
- mongo doc sizes are limited, that might go against the grain for lucene
- mongo aggregation ops may have no place in lucene
  - lucene has options to store fields across docs, but that's not the same thing
  - solr somehow provides aggregation/stats and SQL/graph queries

Answer 10

Answered by Gary Russo

MongoDB Atlas will have a Lucene-based search engine soon. The big announcement was made at this week's MongoDB World 2019 conference. This is a great way to encourage more usage of their high-revenue MongoDB Atlas product.

I was hoping to see it rolled into the MongoDB Enterprise version 4.2 but there's been no news of bringing it to their on-prem product line.

More info here: https://www.mongodb.com/atlas/full-text-search
