MongoDB + Elasticsearch or only Elasticsearch?
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/29538527/
Question by user1853777
We have a new project that needs to index a large amount of data and provide real-time results. I also need complex search: facets, full text, geospatial...
The first prototype stores data in MongoDB and then indexes it into Elasticsearch, because I had read that Elasticsearch does not apply a checksum to stored files and that the index can't be fully trusted. But since the latest version (1.5), there is a checksum, and I'm wondering whether we can use Elasticsearch as the primary data store. And what is the benefit of using MongoDB in addition to Elasticsearch?
I can't find an up-to-date answer about those features in Elasticsearch.
Thanks a lot
Answer by Slam
Here are some arguments for using Mongo instead of, or together with, ES:
User/role management.
- Built into MongoDB. It may not fit all your needs and may be clumsy in places, but it exists and was implemented quite a long time ago.
- The only security option in ES is Shield, and for production use it ships only with a Gold/Platinum subscription.
Schema
- ES is schemaless, but it's built on top of Lucene and written in Java. The core idea of this tool is to index and search documents, and working this way requires index consistency. At the back end, all documents must be fitted into a flat Lucene index. This requires some understanding of how ES deals with your nested documents and values, and of how you should organize your indexes to balance speed against data completeness/consistency. Working with ES means constantly keeping some things about the schema in mind. For example, since you can index almost anything into ES without defining the corresponding mapping in advance, ES can "guess" the mapping on the fly. But it sometimes guesses wrong, and sometimes an implicit mapping is evil, because once it is set, it can't be changed without reindexing the whole index. So it's better not to treat ES as a schemaless store, because at some point you may step on a rake (and it will hurt :) ). Treat it instead as schema-intensive, at least when you work with documents that can be sliced into concrete fields.
- Mongo, on the other hand, can "chew and leave no crumbs" of almost anything you put into it. Most of your queries will work fine as long as you remember how Mongo deals with your data from a JavaScript perspective. And since JS is weakly typed, you can have a truly schemaless workflow (if you need one, of course).
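To illustrate treating ES as schema-intensive, here is a minimal Python sketch that builds an index-creation body with an explicit mapping and `"dynamic": "strict"`. With this setting, ES rejects documents containing unmapped fields instead of guessing a type for them. The sketch assumes the typeless mapping format of Elasticsearch 7 and later; the 1.x versions discussed here nested mappings under a type name. The index name `products` and all field names are hypothetical.

```python
import json


def build_products_index_body() -> dict:
    """Index-creation body with an explicit, strict mapping (hypothetical fields)."""
    return {
        "mappings": {
            # Reject documents containing fields not declared below, instead of
            # letting ES infer ("guess") a mapping for them on the fly.
            "dynamic": "strict",
            "properties": {
                "title": {"type": "text"},          # analyzed for full-text search
                "category": {"type": "keyword"},    # exact values, usable for facets
                "price": {"type": "scaled_float", "scaling_factor": 100},
                "location": {"type": "geo_point"},  # required for geo queries
                "created_at": {"type": "date"},
                # Nested objects are indexed as separate hidden documents, so the
                # fields of each variant stay associated with each other in queries.
                "variants": {
                    "type": "nested",
                    "properties": {
                        "sku": {"type": "keyword"},
                        "color": {"type": "keyword"},
                    },
                },
            },
        }
    }


if __name__ == "__main__":
    # Body for: PUT /products  (send it with any HTTP client or an official client)
    print(json.dumps(build_products_index_body(), indent=2))
```

With `strict`, indexing a document that contains an unmapped field fails with an error rather than silently adding a guessed mapping. Changing the type of a field that is already mapped still requires creating a new index and reindexing into it.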
Handling non-table-like data.
- ES has only limited support for handling data without putting it into the search index. Still, this is good enough when you need to store and retrieve some extra data alongside the data you want to search against.
- MongoDB supports GridFS. This lets you handle large chunks of data behind the same interface, i.e. from your code's perspective, you can store binary data in Mongo and retrieve it through the same interface.
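To make the GridFS point concrete, the sketch below mimics in plain Python how GridFS splits a binary file into one `fs.files` metadata document and numbered `fs.chunks` documents, with 255 KiB chunks by default. It only illustrates the storage layout and is not the driver API. In practice, PyMongo's `gridfs.GridFS(db).put(data, filename=...)` does this for you. The helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import List, Tuple

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def split_into_gridfs_docs(file_id, filename: str, data: bytes,
                           chunk_size: int = DEFAULT_CHUNK_SIZE
                           ) -> Tuple[dict, List[dict]]:
    """Return (files_doc, chunk_docs) laid out like GridFS's two collections."""
    chunk_docs = [
        # Each chunk records the file it belongs to and its position n.
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {
        "_id": file_id,
        "filename": filename,
        "length": len(data),
        "chunkSize": chunk_size,
        "uploadDate": datetime.now(timezone.utc),
    }
    return files_doc, chunk_docs


def reassemble(chunk_docs: List[dict]) -> bytes:
    """Concatenate chunks in order of n to recover the original bytes."""
    return b"".join(c["data"] for c in sorted(chunk_docs, key=lambda c: c["n"]))


if __name__ == "__main__":
    payload = bytes(600 * 1024)  # 600 KiB of zero bytes
    files_doc, chunks = split_into_gridfs_docs("file-1", "report.pdf", payload)
    print(len(chunks))  # 3 chunks: 255 KiB + 255 KiB + 90 KiB
    assert reassemble(chunks) == payload
```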
Answer by Amit Kr
Well, choose the right tool for the right job. If you need search capabilities such as full-text search, faceting, etc., then nothing beats a full-fledged search engine. Elasticsearch (ES) or Solr is just a matter of preference.
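As an illustration of those search-engine features, here is a minimal Python sketch of an Elasticsearch Query DSL body. It combines a full-text `match`, a `geo_distance` filter and a `terms` aggregation for facet counts. The field names (`title`, `location`, `category`) and the 10 km default radius are hypothetical and assume a mapping with `text`, `geo_point` and `keyword` fields respectively.

```python
import json


def build_search_body(text: str, lat: float, lon: float,
                      radius: str = "10km") -> dict:
    """Full-text match + geo filter + terms facet (hypothetical field names)."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause; "AUTO" fuzziness tolerates small typos.
                "must": [{"match": {"title": {"query": text, "fuzziness": "AUTO"}}}],
                # Non-scoring filter: keep only documents within `radius` of the point.
                "filter": [{
                    "geo_distance": {
                        "distance": radius,
                        "location": {"lat": lat, "lon": lon},
                    }
                }],
            }
        },
        # Facet counts per category value, computed over the matching documents.
        "aggs": {"by_category": {"terms": {"field": "category", "size": 10}}},
        "size": 20,
    }


if __name__ == "__main__":
    # Body for: POST /products/_search
    print(json.dumps(build_search_body("running shoes", 48.8566, 2.3522), indent=2))
```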
You can feed (index) documents into ES for searching, and then fetch the complete details of a particular entry from MongoDB or any other database.
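Here is a minimal sketch of that pattern. It assumes ES returns only the matching `_id`s in relevance order and that the full records live in MongoDB. Both stores are injected as plain callables, so the logic can be shown without a running cluster. With real clients, `search_ids` would wrap an ES `_search` call and `fetch_by_ids` a Mongo `find({"_id": {"$in": ids}})`. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def search_then_hydrate(
    query: str,
    search_ids: Callable[[str], List[str]],
    fetch_by_ids: Callable[[Sequence[str]], List[dict]],
) -> List[dict]:
    """Search in the index, then load the full documents from the primary store."""
    ids = search_ids(query)  # ranked ids from the search engine
    if not ids:
        return []
    # An `$in` query does not preserve the order of the id list, so map the
    # fetched documents by id and re-emit them in the search engine's ranking.
    by_id: Dict[str, dict] = {doc["_id"]: doc for doc in fetch_by_ids(ids)}
    # Skip ids that are still in the index but already deleted from the database.
    return [by_id[i] for i in ids if i in by_id]


if __name__ == "__main__":
    db = {"a": {"_id": "a", "name": "Alpha"}, "c": {"_id": "c", "name": "Gamma"}}
    results = search_then_hydrate(
        "anything",
        search_ids=lambda q: ["c", "b", "a"],  # "b" is stale in the index
        fetch_by_ids=lambda ids: [db[i] for i in sorted(ids) if i in db],
    )
    print([d["name"] for d in results])  # ['Gamma', 'Alpha']
```

Keeping only searchable fields in ES keeps the index small. The trade-off is a second round trip to the database for every result page.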
To make your task easier, take a look at my open-source work, which uses MongoDB, ES, Redis and RabbitMQ, all integrated in one place, here on GitHub.
Please note that the application is built in .NET (C#).
Answer by Alex
Having used Elasticsearch in production, I can add a few notes to this thread:
- We secured our Elasticsearch cluster with a reverse proxy that checks the authenticity of the client certificate on each request before letting the query through. This shows that there are multiple ways to add authentication anyway. (If you need finer-grained security, such as roles, there are a few plugins you can add to manage permissions.)
- Elasticsearch mappings and settings (tuning) are really important concepts to understand fully before going into production, and it's not easy to grasp quickly how everything works.
- Clustering and horizontal scaling are very flexible and easy to set up.
- The tool suite (Kibana, Beats, etc.) is a very convenient way to gather logs, expose key data, and so on.
- Search features are extremely advanced. You can do amazing things once you master how full-text search works (fuzziness, boosting, scoring, stemming, tokenizers, analyzers, and so on).
- The APIs are a bit scattered, and there is no single way to achieve something. Some APIs are really WTF to use, like the bulk insert API: you have to pass binary data in a newline-delimited JSON format (of course, don't forget the end-of-line characters) and repeat some fields multiple times. This is very verbose, and I guess it's legacy code like we all have in our projects ;).
- Last thing: if you develop a Java project, do not use Hibernate Search to duplicate data from a data source to your ES cluster. We had so many issues with Hibernate Search that, if we had to do it again, we'd do it manually.
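The bulk format described above is newline-delimited JSON. Each document gets one action line followed by one source line, every line (including the last) ends in `\n`, and the request is sent with `Content-Type: application/x-ndjson`. Below is a minimal Python sketch that builds such a body. The index name and documents are hypothetical, and official clients offer helpers that do this for you (e.g. `elasticsearch.helpers.bulk` in Python).

```python
import json
from typing import Iterable, Tuple


def build_bulk_body(index: str, docs: Iterable[Tuple[str, dict]]) -> str:
    """Serialize (id, document) pairs into an ES _bulk request body."""
    lines = []
    for doc_id, source in docs:
        # Action/metadata line: the fields (index, id) repeated for every document.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself, which must fit on a single line.
        lines.append(json.dumps(source))
    # ES rejects a bulk request whose body is not terminated by a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    body = build_bulk_body("products", [("1", {"title": "Shoe"}), ("2", {"title": "Hat"})])
    # Send with: POST /_bulk  and header  Content-Type: application/x-ndjson
    print(body, end="")
```

Some of the repetition can be avoided by putting the index in the URL (`POST /products/_bulk`), in which case `_index` may be omitted from the action lines.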
Now, about the actual question:
To my mind, using only Elasticsearch is sufficient and may reduce the complexity of running multiple NoSQL storage systems.
I think the combination is worthwhile when you pair a relational, transactional database with a NoSQL search engine, but having two systems that serve roughly the same purpose is a bit of overkill.
Answer by niranjan harpale
I recently developed a feature at my company: we wanted to perform some searches and rank the results by their relevance across multiple factors and conditions.
In my application, we were already using MongoDB as the database.
So I exported to an Elasticsearch index those MongoDB fields I wanted to search and filter on. Based on the required conditions, I prepared both a Mongo query and an Elasticsearch query and ran the search. Then I filtered and sorted the results as needed. The whole flow was designed so that even if ES returns an error, Mongo still fetches the records. If I do get a result from ES, the Mongo result depends on the ES result. This is how I used Mongo and ES in combination.
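Here is a minimal sketch of that flow under stated assumptions. ES returns ranked ids for the search/filter conditions, and Mongo then loads only those ids. If the ES call raises, the code falls back to running the equivalent Mongo query directly, without relevance ranking. The callables stand in for real ES and Mongo calls, and all names are hypothetical.

```python
import logging
from typing import Callable, List, Sequence

log = logging.getLogger(__name__)


def ranked_search(
    es_search: Callable[[], List[str]],
    mongo_find_by_ids: Callable[[Sequence[str]], List[dict]],
    mongo_fallback_query: Callable[[], List[dict]],
) -> List[dict]:
    """Rank with ES when it is available; otherwise fall back to a plain Mongo query."""
    try:
        ids = es_search()
    except Exception:  # e.g. ES unreachable or the query was rejected
        log.warning("Elasticsearch failed, falling back to MongoDB", exc_info=True)
        return mongo_fallback_query()
    # ES succeeded: the Mongo result depends on, and follows, the ES ranking.
    rank = {doc_id: pos for pos, doc_id in enumerate(ids)}
    docs = [d for d in mongo_find_by_ids(ids) if d["_id"] in rank]
    return sorted(docs, key=lambda d: rank[d["_id"]])


if __name__ == "__main__":
    store = [{"_id": "a"}, {"_id": "b"}, {"_id": "c"}]
    out = ranked_search(
        es_search=lambda: ["c", "a"],
        mongo_find_by_ids=lambda ids: [d for d in store if d["_id"] in ids],
        mongo_fallback_query=lambda: store,
    )
    print([d["_id"] for d in out])  # ['c', 'a']
```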
Also, don't forget to properly handle all updates, deletes and new record insertions.
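One way to keep the index in sync, sketched under assumptions, is to consume MongoDB change streams and turn each insert, update, replace or delete event into bulk actions for ES. Change streams are available on replica sets, e.g. via PyMongo's `collection.watch(full_document="updateLookup")`. The event dicts follow the change-stream shape (`operationType`, `documentKey`, `fullDocument`). The function names and index name are hypothetical, and error handling and retries are left out.

```python
from typing import Iterable, List, Optional

SUPPORTED_OPS = ("insert", "update", "replace", "delete")


def change_to_bulk_actions(event: dict, index: str) -> Optional[List[dict]]:
    """Map one MongoDB change-stream event to ES bulk action/source dicts."""
    op = event["operationType"]
    if op not in SUPPORTED_OPS:
        return None  # e.g. "drop" or "invalidate": handle these separately
    doc_id = str(event["documentKey"]["_id"])
    if op == "delete":
        # A delete only needs the action line.
        return [{"delete": {"_index": index, "_id": doc_id}}]
    doc = event.get("fullDocument")
    if doc is None:
        # Update events carry fullDocument only with full_document="updateLookup".
        return None
    # `_id` is an ES metadata field and cannot appear inside the source.
    source = {k: v for k, v in doc.items() if k != "_id"}
    # Re-index the whole current document: simpler than a partial update.
    return [{"index": {"_index": index, "_id": doc_id}}, source]


def changes_to_actions(events: Iterable[dict], index: str) -> List[dict]:
    """Flatten a batch of change events into one list of bulk actions."""
    actions: List[dict] = []
    for event in events:
        mapped = change_to_bulk_actions(event, index)
        if mapped:
            actions.extend(mapped)
    return actions
```

Re-indexing the full document on every change keeps the logic simple, at the cost of sending more data than a partial update would.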
And just so you know, the results were really good for me.