Disclaimer: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must likewise follow the CC BY-SA license and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/737275/

Comparison of full-text search engines - Lucene, Sphinx, PostgreSQL, MySQL?

Tags: mysql, postgresql, full-text-search, lucene, sphinx

Asked by Continuation

I'm building a Django site and I am looking for a search engine.

A few candidates:

  • Lucene/Lucene with Compass/Solr

  • Sphinx

  • Postgresql built-in full text search

  • MySQL built-in full text search

Selection criteria:

  • result relevance and ranking
  • searching and indexing speed
  • ease of use and ease of integration with Django
  • resource requirements - site will be hosted on a VPS, so ideally the search engine wouldn't require a lot of RAM and CPU
  • scalability
  • extra features such as "did you mean?", related searches, etc.

Anyone who has had experience with the search engines above, or other engines not in the list -- I would love to hear your opinions.

EDIT: As for indexing needs, as users keep entering data into the site, that data would need to be indexed continuously. It doesn't have to be real time, but ideally new data would show up in the index with no more than a 15-30 minute delay.

Accepted answer by pat

Good to see someone's chimed in about Lucene - because I've no idea about that.

Sphinx, on the other hand, I know quite well, so let's see if I can be of some help.

  • Result relevance ranking is the default. You can set up your own sorting should you wish, and give specific fields higher weightings.
  • Indexing speed is super-fast, because it talks directly to the database. Any slowness will come from complex SQL queries and un-indexed foreign keys and other such problems. I've never noticed any slowness in searching either.
  • I'm a Rails guy, so I've no idea how easy it is to implement with Django. There is a Python API that comes with the Sphinx source though.
  • The search service daemon (searchd) is pretty low on memory usage - and you can set limits on how much memory the indexer process uses too.
  • Scalability is where my knowledge is more sketchy - but it's easy enough to copy index files to multiple machines and run several searchd daemons. The general impression I get from others though is that it's pretty damn good under high load, so scaling it out across multiple machines isn't something that needs to be dealt with.
  • There's no support for 'did-you-mean', etc - although these can be done with other tools easily enough. Sphinx does stem words using dictionaries, though, so 'driving' and 'drive' (for example) would be considered the same in searches.
  • Sphinx doesn't allow partial index updates for field data though. The common approach to this is to maintain a delta index with all the recent changes, and re-index this after every change (and those new results appear within a second or two). Because of the small amount of data, this can take a matter of seconds. You will still need to re-index the main dataset regularly though (although how regularly depends on the volatility of your data - every day? every hour?). The fast indexing speeds keep this all pretty painless though.
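
The main + delta pattern described in the last bullet can be sketched in a few lines of Python. This is a conceptual toy, not Sphinx's actual implementation; all names and documents are invented:

```python
class TinyIndex:
    """A toy inverted index: word -> set of document ids."""
    def __init__(self):
        self.postings = {}

    def add(self, doc_id, text):
        for word in text.lower().split():
            self.postings.setdefault(word, set()).add(doc_id)

    def search(self, word):
        return self.postings.get(word.lower(), set())

# "main" holds the big, rarely rebuilt dataset; "delta" holds recent changes.
main, delta = TinyIndex(), TinyIndex()
main.add(1, "django search engine")
main.add(2, "full text search")
delta.add(3, "new django article")   # arrived after the last full rebuild

def search_all(word):
    # Query both indexes and merge results, as searchd does across indexes.
    return main.search(word) | delta.search(word)

print(sorted(search_all("django")))  # -> [1, 3]
```

Because only the small delta index is rebuilt after each change, updates stay cheap no matter how large the main index grows.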

I've no idea how applicable to your situation this is, but Evan Weaver compared a few of the common Rails search options (Sphinx, Ferret (a port of Lucene for Ruby) and Solr), running some benchmarks. Could be useful, I guess.

I've not plumbed the depths of MySQL's full-text search, but I know it doesn't compete speed-wise nor feature-wise with Sphinx, Lucene or Solr.

Answer by Razzie

I don't know Sphinx, but as for Lucene vs a database full-text search, I think that Lucene performance is unmatched. You should be able to do almost any search in less than 10 ms, no matter how many records you have to search, provided that you have set up your Lucene index correctly.

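The unmatched lookup speed comes from the inverted index: query terms are found in a sorted term dictionary, so the cost grows (logarithmically) with the vocabulary size, not with the number of documents. A rough sketch with invented data, not Lucene's actual on-disk format:

```python
import bisect

# Sorted term dictionary with postings lists, Lucene-style.
terms = ["database", "django", "lucene", "search", "solr"]
postings = {"database": [2, 5], "django": [1], "lucene": [3, 4],
            "search": [1, 2, 3], "solr": [4]}

def lookup(term):
    # Binary search in the term dictionary, then fetch the postings list.
    i = bisect.bisect_left(terms, term)
    if i < len(terms) and terms[i] == term:
        return postings[term]
    return []

print(lookup("search"))  # -> [1, 2, 3]
print(lookup("mysql"))   # -> []
```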

Here comes the biggest hurdle though: personally, I think integrating Lucene in your project is not easy. Sure, it is not too hard to set it up so you can do some basic search, but if you want to get the most out of it, with optimal performance, then you definitely need a good book about Lucene.

As for CPU & RAM requirements, performing a search in Lucene doesn't tax your CPU much, though indexing your data does. You won't do that too often, though (maybe once or twice a day), so it isn't much of a hurdle.

It doesn't answer all of your questions but in short, if you have a lot of data to search, and you want great performance, then I think Lucene is definitely the way to go. If you're not going to have that much data to search, then you might as well go for a database full-text search. Setting up a MySQL full-text search is definitely easier in my book.

Answer by Wil Moore III

I am surprised that there isn't more information posted about Solr. Solr is quite similar to Sphinx but has more advanced features (AFAIK as I haven't used Sphinx -- only read about it).

The answer at the link below details a few things about Sphinx that also apply to Solr. Comparison of full text search engine - Lucene, Sphinx, Postgresql, MySQL?

Solr also provides the following additional features:

  1. Supports replication
  2. Multiple cores (think of these as separate databases with their own configuration and own indexes)
  3. Boolean searches
  4. Highlighting of keywords (fairly easy to do in application code if you have regex-fu; however, why not let a specialized tool do a better job for you)
  5. Update index via XML or delimited file
  6. Communicate with the search server via HTTP (it can even return Json, Native PHP/Ruby/Python)
  7. PDF, Word document indexing
  8. Dynamic fields
  9. Facets
  10. Aggregate fields
  11. Stop words, synonyms, etc.
  12. More Like This
  13. Index directly from the database with custom queries
  14. Auto-suggest
  15. Cache Autowarming
  16. Fast indexing (compared to MySQL full-text search indexing times) -- Lucene uses a binary inverted index format.
  17. Boosting (custom rules for increasing relevance of a particular keyword or phrase, etc.)
  18. Fielded searches (if a search user knows the field he/she wants to search, they narrow down their search by typing the field, then the value, and ONLY that field is searched rather than everything -- much better user experience)
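
As a small illustration of feature 18, a fielded query is just a `field:value` term in the `q` parameter. A hedged sketch in Python (host, core, and field names here are hypothetical):

```python
from urllib.parse import urlencode

# Search only the "title" field of a local core named "collection1".
params = {"q": "title:django", "wt": "json", "rows": 10}
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

`urlencode` takes care of escaping the `:` in the fielded term, which is easy to get wrong when building such URLs by hand.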

BTW, there are tons more features; however, I've listed just the features that I have actually used in production. Out of the box, MySQL supports #1, #3, and #11 (limited) from the list above. For the features you are looking for, a relational database isn't going to cut it. I'd eliminate those straight away.

Also, another benefit is that Solr (well, Lucene actually) is a document database (e.g. NoSQL) so many of the benefits of any other document database can be realized with Solr. In other words, you can use it for more than just search (i.e. Performance). Get creative with it :)

Answer by Shankar Damodaran

Apache Solr

Apart from answering the OP's queries, let me offer some insights on Apache Solr, from a simple introduction to detailed installation and implementation.

Simple Introduction

Anyone who has had experience with the search engines above, or other engines not in the list -- I would love to hear your opinions.

Solr shouldn't be used to solve real-time problems. For search engines, though, Solr is pretty much the go-to and works flawlessly.

Solr works fine in high-traffic web applications (I read somewhere that it is not suited for this, but I am backing up that statement). It utilizes the RAM, not the CPU.

  • result relevance and ranking

Boost helps you rank your results so they show up on top. Say you're trying to search for the name john in the fields firstname and lastname, and you want to give relevance to the firstname field; then you need to boost the firstname field as shown.

http://localhost:8983/solr/collection1/select?q=firstname:john^2+OR+lastname:john

As you can see, the firstname field is boosted with a score of 2.

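The effect of `^2` can be illustrated with a toy scoring function. This is a deliberate simplification of Lucene's actual ranking, with invented numbers:

```python
# A term match contributes more to the score when it occurs in a boosted field.
def score(doc, term, boosts):
    s = 0.0
    for field, text in doc.items():
        if term in text.lower().split():
            s += boosts.get(field, 1.0)  # unboosted fields count as 1.0
    return s

doc_a = {"firstname": "john", "lastname": "smith"}
doc_b = {"firstname": "mary", "lastname": "john"}
boosts = {"firstname": 2.0}              # firstname^2, as in the query above

print(score(doc_a, "john", boosts))  # -> 2.0 (match in the boosted field)
print(score(doc_b, "john", boosts))  # -> 1.0 (match in an unboosted field)
```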

More on Solr Relevancy

  • searching and indexing speed

The speed is unbelievably fast, with no compromise on that. It's the reason I moved to Solr.

Regarding indexing speed, Solr can also handle JOINs from your database tables. Higher and more complex JOINs do affect the indexing speed. However, an enormous RAM config can easily tackle this situation.

The more RAM you have, the faster Solr's indexing speed is.

  • ease of use and ease of integration with Django

I've never attempted to integrate Solr and Django; however, you can achieve that with Haystack. I found an interesting article on the same, and here's the GitHub for it.

  • resource requirements - site will be hosted on a VPS, so ideally the search engine wouldn't require a lot of RAM and CPU

Solr feeds on RAM, so if RAM is plentiful, you don't have to worry about Solr.

Solr's RAM usage shoots up on full indexing if you have a few billion records; you can smartly make use of delta imports to tackle this situation. As explained, Solr is only a near-real-time solution.

  • scalability

Solr is highly scalable. Have a look at SolrCloud. Some of its key features:

  • Shards (sharding is the concept of distributing the index among multiple machines, say if your index has grown too large)
  • Load Balancing (if Solrj is used with SolrCloud, it automatically takes care of load balancing using its round-robin mechanism)
  • Distributed Search
  • High Availability
  • extra features such as "did you mean?", related searches, etc

For the above scenario, you could use the SpellCheckComponent that is packed with Solr. There are a lot of other features; the SnowballPorterFilterFactory helps to retrieve records such that, if you typed books instead of book, you will be presented with results related to book.

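The stemming behaviour can be mimicked with a toy suffix-stripper. The real SnowballPorterFilterFactory implements the full Snowball/Porter rule set; this sketch only strips a plural 's':

```python
def toy_stem(word):
    # Normalise a token the way a (very naive) stemmer would.
    w = word.lower()
    if w.endswith("s") and len(w) > 3:
        return w[:-1]
    return w

# Both indexed tokens and query tokens are normalised before comparison,
# so a query for "books" matches documents containing "book".
index = {toy_stem(w) for w in ["book", "driving", "search"]}
print(toy_stem("books") in index)  # -> True
```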



This answer broadly focuses on Apache Solr & MySQL. Django is out of scope.

Assuming that you are in a Linux environment, you can proceed with this article. (Mine was Ubuntu 14.04.)

Detailed Installation

Getting Started

Download Apache Solr from here. That would be version 4.8.1. You could download newer versions; I found this one stable.

这里下载Apache Solr。那将是版本4.8.1。你可以下载新版本,我发现这很稳定。

After downloading the archive, extract it to a folder of your choice, say Downloads or whatever. It will look like Downloads/solr-4.8.1/

At your prompt, navigate inside the directory:

shankar@shankar-lenovo: cd Downloads/solr-4.8.1

So now you are here:

shankar@shankar-lenovo: ~/Downloads/solr-4.8.1$

Start the Jetty Application Server

Jetty is available inside the example folder of the solr-4.8.1 directory, so navigate into it and start the Jetty application server:

shankar@shankar-lenovo:~/Downloads/solr-4.8.1/example$ java -jar start.jar

Now, do not close the terminal; minimize it and leave it aside.

(TIP: Append & after start.jar to make the Jetty server run in the background.)

To check whether Apache Solr runs successfully, visit this URL in the browser: http://localhost:8983/solr

Running Jetty on custom Port

It runs on port 8983 by default. You can change the port either here or directly inside the jetty.xml file.

java -Djetty.port=9091 -jar start.jar

Download the JConnector

This JAR file acts as a bridge between MySQL and JDBC. Download the platform-independent version here.

After downloading it, extract the archive, copy mysql-connector-java-5.1.31-bin.jar, and paste it into the lib directory:

shankar@shankar-lenovo:~/Downloads/solr-4.8.1/contrib/dataimporthandler/lib

Creating the MySQL table to be linked to Apache Solr

To put Solr to use, you need to have some tables and data to search for. For that, we will use MySQL to create a table and push in some random names; then we can use Solr to connect to MySQL and index that table and its entries.

1. Table Structure

CREATE TABLE test_solr_mysql
 (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name VARCHAR(45) NULL,
  created TIMESTAMP NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id)
 );

2. Populate the above table

INSERT INTO `test_solr_mysql` (`name`) VALUES ('Jean');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Hyman');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Jason');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Vego');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Grunt');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Jasper');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Fred');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Jenna');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Rebecca');
INSERT INTO `test_solr_mysql` (`name`) VALUES ('Roland');

Getting inside the core and adding the lib directives

1. Navigate to

shankar@shankar-lenovo: ~/Downloads/solr-4.8.1/example/solr/collection1/conf

2. Modify the solrconfig.xml

Add these two directives to this file:

  <lib dir="../../../contrib/dataimporthandler/lib/" regex=".*\.jar" />
  <lib dir="../../../dist/" regex="solr-dataimporthandler-\d.*\.jar" />

Now add the DIH (Data Import Handler):

<requestHandler name="/dataimport" 
  class="org.apache.solr.handler.dataimport.DataImportHandler" >
    <lst name="defaults">
      <str name="config">db-data-config.xml</str>
    </lst>
</requestHandler>

3. Create the db-data-config.xml file

If the file already exists, skip creating it; otherwise, add these lines to that file. As you can see in the first line, you need to provide the credentials of your MySQL database: the database name, username, and password.

<dataConfig>
    <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/yourdbname" user="dbuser" password="dbpass"/>
    <document>
   <entity name="test_solr" query="select CONCAT('test_solr-',id) as rid,name from test_solr_mysql WHERE '${dataimporter.request.clean}' != 'false'
      OR `created` > '${dataimporter.last_index_time}'" >
    <field name="id" column="rid" />
    <field name="solr_name" column="name" />
    </entity>
   </document>
</dataConfig>

(TIP: You can have any number of entities, but watch out for the id field; if ids are the same, indexing will be skipped.)

4. Modify the schema.xml file

Add this to your schema.xml as shown:

<uniqueKey>id</uniqueKey>
<field name="solr_name" type="string" indexed="true" stored="true" />

Implementation

Indexing

This is where the real deal is. You need to index the data from MySQL into Solr in order to make use of Solr queries.

Step 1: Go to Solr Admin Panel

Open the URL http://localhost:8983/solr in your browser. The screen opens like this.

This is the main Apache Solr Administration Panel

As the marker indicates, go to Logging in order to check whether any of the above configuration has led to errors.

Step 2: Check your Logs

OK, so now you are here. As you can see, there are a lot of yellow messages (WARNINGS). Make sure you don't have error messages marked in red. Earlier, in our configuration, we added a select query in db-data-config.xml; if there were any errors in that query, they would have shown up here.

This is the logging section of your Apache Solr engine

Fine, no errors. We are good to go. Let's choose collection1 from the list as depicted and select Dataimport.

Step 3: DIH (Data Import Handler)

Using the DIH, you connect to MySQL from Solr through the configuration file db-data-config.xml via the Solr interface, and retrieve the 10 records from the database, which get indexed into Solr.

To do that, choose full-import, and check the options Clean and Commit. Now click Execute as shown.

Alternatively, you could use a direct full-import query like this too:

http://localhost:8983/solr/collection1/dataimport?command=full-import&commit=true

The Data Import Handler

After you click Execute, Solr begins to index the records. If there were any errors, it would say Indexing Failed and you would have to go back to the Logging section to see what went wrong.

Assuming there are no errors with this configuration and the indexing completes successfully, you will get this notification.

Indexing Success

Step 4: Running Solr Queries

Seems like everything went well. Now you can use Solr queries to query the data that was indexed. Click Query on the left and then press the Execute button at the bottom.

You will see the indexed records as shown.

The corresponding Solr query for listing all the records is:

http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true

The indexed data

Well, there go all 10 indexed records. Say we need only names starting with Ja; in this case, you need to target the column solr_name, hence your query goes like this:

http://localhost:8983/solr/collection1/select?q=solr_name:Ja*&wt=json&indent=true

The JSON data starting with Ja*

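What that wildcard query does can be mimicked over the sample rows inserted earlier, just to show the expected matches (plain Python, no Solr involved):

```python
# The names inserted into test_solr_mysql earlier; solr_name:Ja* should
# match exactly the ones beginning with "Ja".
names = ["Jean", "Hyman", "Jason", "Vego", "Grunt",
         "Jasper", "Fred", "Jenna", "Rebecca", "Roland"]

matches = [n for n in names if n.startswith("Ja")]
print(matches)  # -> ['Jason', 'Jasper']
```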

That's how you write Solr queries. To read more, check this beautiful article.

Answer by SearchTools-Avi

I'm looking at PostgreSQL full-text search right now, and it has all the right features of a modern search engine: really good extended-character and multilingual support, and nice, tight integration with text fields in the database.

But it doesn't have user-friendly search operators like + or AND (it uses & | !), and I'm not thrilled with how it works on their documentation site. While it bolds matched terms in the result snippets, the default algorithm for choosing which terms to show is not great. Also, if you want to index RTF, PDF, or MS Office files, you have to find and integrate a file-format converter.

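For reference, a minimal PostgreSQL full-text query looks like this (a hypothetical articles(title, body) table; note the & / | / ! operator syntax mentioned above):

```sql
-- to_tsvector normalises and stems the text; @@ matches it against a
-- tsquery. '&' is AND, '|' is OR, '!' is NOT -- there is no friendlier
-- '+'/'AND' syntax out of the box.
SELECT title,
       ts_rank(to_tsvector('english', body), query) AS rank
FROM articles,
     to_tsquery('english', 'search & engine') AS query
WHERE to_tsvector('english', body) @@ query
ORDER BY rank DESC
LIMIT 10;
```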

OTOH, it's way better than the MySQL text search, which doesn't even index words of three letters or fewer. It's the default for the MediaWiki search, and I really think it's no good for end-users: http://www.searchtools.com/analysis/mediawiki-search/

In all cases I've seen, Lucene/Solr and Sphinx are really great. They're solid code and have evolved with significant improvements in usability, so the tools are all there to make search that satisfies almost everyone.

For SHAILI: Solr includes the Lucene search code library and has the components to be a nice stand-alone search engine.

Answer by vooD

Just my two cents on this very old question. I would highly recommend taking a look at ElasticSearch.

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents. Elasticsearch is developed in Java and is released as open source under the terms of the Apache License.

The advantages over other FTS (full text search) Engines are:

  • RESTful interface
  • Better scalability
  • Large community
  • Built by Lucene developers
  • Extensive documentation
  • There are many open-source libraries available (including for Django)
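
As a taste of the RESTful interface, a query is just JSON sent over HTTP. A minimal sketch (the index and field names are invented; against a real cluster this body would be POSTed to /myindex/_search with an HTTP client):

```python
import json

# A minimal Elasticsearch match query expressed as plain JSON.
query = {
    "query": {
        "match": {"title": "django search"}
    },
    "size": 10,
}

print(json.dumps(query, indent=2))
```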

We are using this search engine in our project and are very happy with it.

Answer by BJ.

SearchTools-Avi said "MySQL text search, which doesn't even index words of three letters or fewer."

FYI, the MySQL fulltext minimum word length has been adjustable since at least MySQL 5.0. Google 'mysql fulltext min length' for simple instructions.

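For reference, the cutoff is controlled by a server variable. A typical my.cnf adjustment for MyISAM full-text looks like this (InnoDB full-text, added in MySQL 5.6, uses innodb_ft_min_token_size instead):

```ini
# my.cnf -- let 2-letter words into the full-text index.
# Requires a server restart, plus rebuilding existing fulltext indexes
# (e.g. REPAIR TABLE ... QUICK for MyISAM) to take effect.
[mysqld]
ft_min_word_len = 2
```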

That said, MySQL fulltext has limitations: for one, it gets slow to update once you reach a million records or so, ...

Answer by Fedir RYKHTIK

I would add mnoGoSearch to the list. It's an extremely performant and flexible solution, which works like Google: the indexer fetches data from multiple sites; you can use basic criteria or invent your own hooks for maximal search quality. It can also fetch the data directly from the database.

The solution is not so well known today, but it meets most needs. You can compile and install it on a standalone server, or even on your principal server; it doesn't need as many resources as Solr, as it's written in C and runs perfectly even on small servers.

In the beginning you need to compile it yourself, so it requires some knowledge. I made a tiny script for Debian, which could help. Any adjustments are welcome.

As you are using the Django framework, you could use a PHP client in the middle, or find a solution in Python; I saw some articles.

And, of course, mnoGoSearch is open source (GNU GPL).
