Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/1284083/
Choosing a stand-alone full-text search server: Sphinx or SOLR?
Asked by knorv
I'm looking for a stand-alone full-text search server with the following properties:
- Must operate as a stand-alone server that can serve search requests from multiple clients
- Must be able to do "bulk indexing" by indexing the result of an SQL query: say "SELECT id, text_to_index FROM documents;"
- Must be free software and must run on Linux with MySQL as the database
- Must be fast (rules out MySQL's internal full-text search)
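To make the "bulk indexing" requirement concrete, here is a minimal sketch of what indexing the result of an SQL query amounts to. It is not how Solr or Sphinx implement it: sqlite3 stands in for MySQL so the example runs without a server, and `build_index` and `search` are hypothetical helper names.

```python
import re
import sqlite3
from collections import defaultdict


def build_index(conn, query):
    """Build an inverted index (term -> set of document ids) from an SQL query."""
    index = defaultdict(set)
    for doc_id, text in conn.execute(query):
        for term in re.findall(r"\w+", text.lower()):
            index[term].add(doc_id)
    return index


def search(index, terms):
    """Return sorted ids of documents that contain every term (simple AND search)."""
    postings = [index.get(t.lower(), set()) for t in terms.split()]
    return sorted(set.intersection(*postings)) if postings else []


# Assumption: sqlite3 replaces MySQL here purely to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text_to_index TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [
        (1, "Sphinx indexes MySQL tables"),
        (2, "Solr is built on Lucene"),
        (3, "Lucene and Sphinx compared"),
    ],
)

index = build_index(conn, "SELECT id, text_to_index FROM documents;")
print(search(index, "sphinx"))
```

A real search server adds ranking, stemming, and on-disk index structures on top of this idea; the sketch only shows the query-in, index-out shape of the requirement.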
The alternatives I've found that have these properties are:
- Solr (based on Lucene)
- ElasticSearch (also based on Lucene)
- Sphinx
My questions:
- How do they compare?
- Have I missed any alternatives?
- I know that each use case is different, but are there certain cases where I would definitely not want to use a certain package?
Answered by Mauricio Scheffer
I've been using Solr successfully for almost 2 years now, and have never used Sphinx, so I'm obviously biased. However, I'll try to keep it objective by quoting the docs or other people. I'll also take patches to my answer :-)
Similarities:
- Both Solr and Sphinx satisfy all of your requirements. They're fast and designed to index and search large bodies of data efficiently.
- Both have a long list of high-traffic sites using them (Solr, Sphinx)
- Both offer commercial support. (Solr, Sphinx)
- Both offer client API bindings for several platforms/languages (Sphinx, Solr)
- Both can be distributed to increase speed and capacity (Sphinx, Solr)
Here are some differences:
- Solr, being an Apache project, is obviously Apache2-licensed. Sphinx is GPLv2. This means that if you ever need to embed or extend (not just "use") Sphinx in a commercial application, you'll have to buy a commercial license (rationale)
- Solr is easily embeddable in Java applications.
- Solr is built on top of Lucene, which is a proven technology over 8 years old with a huge user base (this is only a small part). Whenever Lucene gets a new feature or speedup, Solr gets it too. Many of the devs committing to Solr are also Lucene committers.
- Sphinx integrates more tightly with RDBMSs, especially MySQL.
- Solr can be integrated with Hadoop to build distributed applications
- Solr can be integrated with Nutch to quickly build a fully-fledged web search engine with crawler.
- Solr can index proprietary formats like Microsoft Word, PDF, etc. Sphinx can't.
- Solr comes with a spell-checker out of the box.
- Solr comes with facet support out of the box. Faceting in Sphinx takes more work.
- Sphinx doesn't allow partial index updates for field data.
- In Sphinx, all document ids must be unique unsigned non-zero integer numbers. Solr doesn't even require a unique key for many operations, and unique keys can be either integers or strings.
- Solr supports field collapsing (currently as an additional patch only) to avoid duplicating similar results. Sphinx doesn't seem to provide any feature like this.
- While Sphinx is designed to only retrieve document ids, in Solr you can directly get whole documents with pretty much any kind of data. This makes Solr more independent of any external data store and saves the extra roundtrip.
- Solr, except when used embedded, runs in a Java web container such as Tomcat or Jetty, which require additional specific configuration and tuning (or you can use the included Jetty and just launch it with "java -jar start.jar"). Sphinx has no additional configuration.
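The "extra roundtrip" mentioned above can be sketched as follows: when the search server returns only ranked document ids, the application has to go back to the database for the rows and restore the ranking itself. This is an illustration under assumptions, not either product's client API: sqlite3 stands in for MySQL, `fetch_documents` is a hypothetical helper, and `ranked_ids` is hard-coded where a real search call would be.

```python
import sqlite3


def fetch_documents(conn, ranked_ids):
    """Load full rows for ids returned by a search server, keeping rank order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, title FROM documents WHERE id IN ({placeholders})",
        ranked_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # The database returns rows in arbitrary order; restore the search ranking.
    return [by_id[i] for i in ranked_ids if i in by_id]


# Assumption: sqlite3 replaces MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?)",
    [(1, "alpha"), (2, "beta"), (3, "gamma")],
)

ranked_ids = [3, 1]  # hypothetical: ids as an id-only search server might return them
print(fetch_documents(conn, ranked_ids))
```

With a server that returns stored fields directly, the second query and the reordering step disappear, which is the saving the answer refers to.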
Answered by larf311
Unless you need to extend the search functionality in any proprietary way, Sphinx is your best bet.
Sphinx advantages:
- Development and setup are faster
- Much better (and faster) aggregation. This was the killer feature for us.
- Not XML. This is what ultimately ruled out Solr for us. We had to return rather large result sets (think hundreds of results) and then aggregate them ourselves since Solr aggregation was lacking. The amount of time to serialize to and from XML just absolutely killed performance. For small result sets though, it was perfectly fine.
- Best documentation I've seen in an open source app
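For readers wondering what "aggregate them ourselves" looks like in practice, here is a rough sketch of client-side aggregation over an XML result set. The sample payload is modelled on the general shape of a Solr XML response but is made up for illustration, and the `category` field and `count_by_field` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: a hand-written payload, not output captured from a real server.
SAMPLE_RESPONSE = """
<response>
  <result name="response" numFound="3" start="0">
    <doc><str name="id">1</str><str name="category">books</str></doc>
    <doc><str name="id">2</str><str name="category">music</str></doc>
    <doc><str name="id">3</str><str name="category">books</str></doc>
  </result>
</response>
"""


def count_by_field(xml_text, field):
    """Parse the whole result set and count documents per value of `field`."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for doc in root.iter("doc"):
        for node in doc.findall("str"):
            if node.get("name") == field:
                counts[node.text] += 1
    return counts


print(count_by_field(SAMPLE_RESPONSE, "category"))
```

Every document has to be serialized by the server and parsed by the client before a single count is known, so the cost grows with the size of the result set rather than with the number of groups.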
Solr advantages:
- Can be extended.
- Can hit it directly from a web app, i.e., you can have autocomplete-like searches hit the Solr server directly via AJAX.
Answered by Augiwan
Note: There are many users with the same question in mind.
So, to answer to the point:
Which and why?
Use Solr if you intend to use it in your web app (for example, a site search engine). It will definitely turn out to be great, thanks to its API. You will definitely need that power for a web app.
Use Sphinx if you want to search through tons of documents/files real quick. It indexes real fast too. I would recommend not using it in an app that involves JSON or parsing XML to get the search results. Use it for direct DB searches. It works great on MySQL.
Alternatives
Although these are the giants, there are plenty more. Also, there are those that use these to power their custom frameworks. So, I would say that you really haven't missed any, although there is one, Elasticsearch, that has a good user base.
Answered by lo_fye
I have been using Sphinx for almost a year now, and it has been amazing. I can index 1.5 million documents in about a minute on my MacBook, and even quicker on the server. I am also using Sphinx to limit searches to places within specific latitudes and longitudes, and it is very fast. Also, how results are ranked is very tweakable. It is easy to install and set up if you read a tutorial or two. It is almost at 1.0 status, but the release candidates have been rock solid.
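The latitude/longitude restriction described above boils down to a range filter on two numeric attributes. The sketch below shows that logic in plain Python under assumptions: it is not Sphinx's API, `Place` and `within_box` are hypothetical names, and it ignores boxes that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass
class Place:
    doc_id: int
    lat: float
    lon: float


def within_box(places, min_lat, max_lat, min_lon, max_lon):
    """Return ids of places whose coordinates fall inside the bounding box."""
    return [
        p.doc_id
        for p in places
        if min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon
    ]


# Made-up sample coordinates for illustration only.
places = [
    Place(1, 40.7, -74.0),
    Place(2, 34.05, -118.24),
    Place(3, 41.0, -73.5),
]
print(within_box(places, 40.0, 42.0, -75.0, -73.0))
```

A search server applies the same kind of filter inside the index, alongside the full-text match, instead of scanning a list in application code.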
Answered by Angsuman Chakraborty
Lucene/Solr appears to be more feature-rich, has been around longer, and has a much stronger user community. IMHO, if you can get past the initial setup issues that some seem to have faced (we did not), then I would say Lucene/Solr is your best bet.