有什么理由不在 Heroku 上使用 PostgreSQL 的内置全文搜索？

Question

提问by Ethan

I'm preparing to deploy a Rails app on Heroku that requires full text search. Up to now I've been running it on a VPS using MySQL with Sphinx.

我正准备在 Heroku 上部署一个需要全文搜索的 Rails 应用程序。到目前为止，我一直在使用 MySQL 和 Sphinx 的 VPS 上运行它。

However, if I want to use Sphinx or Solr on Heroku, I'd need to pay for an add-on.

但是，如果我想在 Heroku 上使用 Sphinx 或 Solr，我需要为附加组件付费。

I notice that PostgreSQL (the DB used on Heroku) has built-in full text search capability.

我注意到 PostgreSQL（Heroku 上使用的数据库）具有内置的全文搜索功能。

Is there a reason I couldn't use Postgres's full-text search? Is it slower than Sphinx or is there some other major limitation?

我不能使用 Postgres 的全文搜索有什么原因吗？它比 Sphinx 慢还是有其他一些主要限制？

Answer 1

回答by Nick Zadrozny

Edit, 2016 —?Why not both?

编辑，2016 年——？为什么不是两者兼而有之？

If you're interested in Postgres vs. Lucene, why not both? Check out the ZomboDBextension for Postgres, which integrates Elasticsearch as a first-class index type. Still a fairly early project but it looks really promising to me.

如果您对 Postgres 与 Lucene 感兴趣，为什么不对两者都感兴趣？查看Postgres的ZomboDB扩展，它集成了 Elasticsearch 作为一流的索引类型。仍然是一个相当早期的项目，但对我来说看起来很有希望。

https://github.com/zombodb/zombodb

https://github.com/zombodb/zombodb

(Technically not available on Heroku, but still worth looking at.)

（技术上在 Heroku 上不可用，但仍然值得一看。）

Disclosure: I'm a cofounder of the Websolrand BonsaiHeroku add-ons, so my perspective is a bit biased toward Lucene.

披露：我是Websolr和BonsaiHeroku 附加组件的联合创始人，所以我的观点有点偏向于 Lucene。

My read on Postgres full-text search is that it is pretty solid for straightforward use cases, but there are a number of reasons why Lucene (and thus Solr and ElasticSearch) is superior both in terms of performance and functionality.

我对 Postgres 全文搜索的理解是，它对于简单的用例来说非常可靠，但是 Lucene（以及 Solr 和 ElasticSearch）在性能和功能方面都优越的原因有很多。

For starters, jpountzprovides a truly excellent technical answer to the question, Why is Solr so much faster than Postgres?It's worth a couple of reads through to really digest.

对于初学者来说，jpountz为这个问题提供了一个真正出色的技术答案，为什么 Solr 比 Postgres 快得多？值得多读几遍才能真正消化。

I also commented on a recent RailsCast episodecomparing relative advantages and disadvantages of Postgres full-text search versus Solr. Let me recap that here:

我还评论了最近的 RailsCast 插曲，比较了 Postgres 全文搜索与 Solr 的相对优势和劣势。让我在这里回顾一下：

Pragmatic advantages to Postgres

Postgres 的实用优势

Reuse an existing service that you're already running instead of setting up and maintaining (or paying for) something else.
Far superior to the fantastically slow SQL LIKEoperator.
Less hassle keeping data in sync since it's all in the same database —?no application-level integration with some external data service API.

重用您已经在运行的现有服务，而不是设置和维护（或支付）其他东西。
远优于极其缓慢的 SQLLIKE运算符。
由于数据都在同一个数据库中，因此保持数据同步的麻烦更少——没有与某些外部数据服务 API 的应用程序级集成。

Advantages to Solr (or ElasticSearch)

Solr（或 ElasticSearch）的优势

Off the top of my head, in no particular order…

在我的头顶上，没有特别的顺序......

Scale your indexing and search load separately from your regular database load.
More flexible term analysis for things like accent normalizing, linguistic stemming, N-grams, markup removal… Other cool features like spellcheck, "rich content" (e.g., PDF and Word) extraction…
Solr/Lucene can do everything on the Postgres full-text search TODO listjust fine.
Much better and faster term relevancy ranking, efficiently customizable at search time.
Probably faster search performance for common terms or complicated queries.
Probably more efficient indexing performance than Postgres.
Better tolerance for change in your data model by decoupling indexing from your primary data store

将索引和搜索负载与常规数据库负载分开扩展。
更灵活的术语分析，如重音规范化、语言词干、N-gram、标记删除……其他很酷的功能，如拼写检查、“丰富内容”（例如，PDF 和 Word）提取……
Solr/Lucene 可以做Postgres 全文搜索 TODO 列表上的所有事情就好了。
更好更快的术语相关性排名，可在搜索时高效定制。
常见术语或复杂查询的搜索性能可能会更快。
可能比 Postgres 更有效的索引性能。
通过将索引与主数据存储分离，更好地容忍数据模型的变化

Clearly I think a dedicated search engine based on Lucene is the better option here. Basically, you can think of Lucene as the de facto open source repository of search expertise.

显然，我认为基于 Lucene 的专用搜索引擎是更好的选择。基本上，您可以将 Lucene 视为搜索专业知识的事实上的开源存储库。

But if your only other option is the LIKEoperator, then Postgres full-text search is a definite win.

但是，如果您唯一的其他选择是LIKE运算符，那么 Postgres 全文搜索无疑是一个胜利。

Answer 2

回答by yekta

Since I just went through the effort of comparing elastic search (1.9) against postgres FTS, I figured I should share my results since they're somewhat more current than the ones @gustavodiazjaimes cites.

由于我刚刚尝试将弹性搜索 (1.9) 与 postgres FTS 进行比较，我想我应该分享我的结果，因为它们比@gustavodiazjaimes 引用的结果更新一些。

My main concern with postgres was that it did not have faceting built in, but that's trivial to build yourself, here's my example (in django):

我对 postgres 的主要担忧是它没有内置分面，但是构建自己很简单，这是我的示例（在 Django 中）：

results = YourModel.objects.filter(vector_search=query)
facets = (results
    .values('book')
    .annotate(total=Count('book'))
    .order_by('book'))

I'm using postgres 9.6 and elastic-search 1.9 (through haystack on django). Here's a comparison between elasticsearch and postgres across 16 various types of queries.

我正在使用 postgres 9.6 和 elastic-search 1.9（通过 django 上的 haystack）。下面是对 16 种不同类型查询的 elasticsearch 和 postgres 的比较。

    es_times  pg_times  es_times_faceted  pg_times_faceted
0   0.065972  0.000543          0.015538          0.037876
1   0.000292  0.000233          0.005865          0.007130
2   0.000257  0.000229          0.005203          0.002168
3   0.000247  0.000161          0.003052          0.001299
4   0.000276  0.000150          0.002647          0.001167
5   0.000245  0.000151          0.005098          0.001512
6   0.000251  0.000155          0.005317          0.002550
7   0.000331  0.000163          0.005635          0.002202
8   0.000268  0.000168          0.006469          0.002408
9   0.000290  0.000236          0.006167          0.002398
10  0.000364  0.000224          0.005755          0.001846
11  0.000264  0.000182          0.005153          0.001667
12  0.000287  0.000153          0.010218          0.001769
13  0.000264  0.000231          0.005309          0.001586
14  0.000257  0.000195          0.004813          0.001562
15  0.000248  0.000174          0.032146          0.002246
                  count      mean       std       min       25%       50%       75%       max
es_times           16.0  0.004382  0.016424  0.000245  0.000255  0.000266  0.000291  0.065972
pg_times           16.0  0.000209  0.000095  0.000150  0.000160  0.000178  0.000229  0.000543
es_times_faceted   16.0  0.007774  0.007150  0.002647  0.005139  0.005476  0.006242  0.032146
pg_times_faceted   16.0  0.004462  0.009015  0.001167  0.001580  0.002007  0.002400  0.037876

In order to get postgres to these speeds?for faceted searches I had to use an GIN index on the field with a SearchVectorField, which is django specific but I'm sure other frameworks have a similar vector type.

为了让 postgres 达到这些速度？对于分面搜索，我必须在带有 SearchVectorField 的字段上使用 GIN 索引，这是 django 特定的，但我确信其他框架具有类似的向量类型。

One other consideration is that pg 9.6 now supports phrase matching, which is huge.

另一个考虑因素是 pg 9.6 现在支持短语匹配，这是巨大的。

My take away is that postgres is for most cases going to be preferrable as it offers:

我的结论是 postgres 在大多数情况下会更受欢迎，因为它提供：

simpler stack
no search backend api wrapper dependencies to contend with (thinking-sphinx, django-sphinx, haystack etc.). These can be a drag since they might not support the features your search back-end does (e.g. haystack faceting/aggregates).
has similar performance and features (for my needs)

更简单的堆栈
没有搜索后端 api 包装器依赖（thinking-sphinx、django-sphinx、haystack 等）。这些可能是一个拖累，因为它们可能不支持您的搜索后端所做的功能（例如 haystack faceting/aggregates）。
具有相似的性能和功能（满足我的需求）

Answer 3

回答by gustavodiazjaimes

I found this amazing comparison and want to share it:

我发现了这个惊人的比较并想分享它：

Full Text Search In PostgreSQL

PostgreSQL 全文检索

Time to Build Index LIKE predicate -- none
PostgreSQL / GIN -- 40 min
Sphinx Search -- 6 min
Apache Lucene -- 9 min
Inverted index -- high

构建索引 LIKE 谓词的时间 -- 无
PostgreSQL / GIN -- 40 分钟
Sphinx 搜索 -- 6 分钟
Apache Lucene -- 9 分钟
倒排索引 -- 高

Index Storage LIKE predicate -- none
PostgreSQL / GIN -- 532 MB
Sphinx Search -- 533 MB
Apache Lucene -- 1071 MB
Inverted index -- 101 MB

索引存储 LIKE 谓词 -- 无
PostgreSQL / GIN -- 532 MB
Sphinx 搜索 -- 533 MB
Apache Lucene -- 1071 MB
倒排索引 -- 101 MB

Query Speed LIKE predicate -- 90+ seconds
PostgreSQL / GIN -- 20 ms
Sphinx Search -- 8 ms
Apache Lucene -- 80 ms
Inverted index -- 40 ms

查询速度 LIKE 谓词 -- 90+ 秒
PostgreSQL / GIN -- 20 ms
Sphinx 搜索 -- 8 ms
Apache Lucene -- 80 ms
倒排索引 -- 40 ms

Answer 4

回答by Devi

Postgres's full text search has amazing capabilities in the areas of stemming, ranking/boosting, synonym handling, fuzzy searches among others - but no support for faceted search.

Postgres 的全文搜索在词干提取、排名/提升、同义词处理、模糊搜索等领域具有惊人的能力——但不支持分面搜索。

So, if Postgres is already in your stack and you don't need faceting, better try it out to avail the HUGE benefit of ease of keeping indices in sync and maintaining sleek stack, before looking out for Lucene based solutions - at least if all your app is not based on search.

因此，如果 Postgres 已经在您的堆栈中并且您不需要分面，那么在寻找基于 Lucene 的解决方案之前，最好尝试一下它以利用易于保持索引同步和维护光滑堆栈的巨大好处 - 至少如果所有的话您的应用并非基于搜索。

Answer 5

回答by Scott Marlowe

Postgresql's FTS function is mature and fairly fast at lookups. It's worth a look for sure.

Postgresql 的 FTS 功能已经成熟并且在查找方面相当快。值得一看。

Answer 6

回答by Eugene Lycenok

Some fresher results for synthetic customer data (10 million records).

合成客户数据的一些新鲜结果（1000 万条记录）。

有什么理由不在 Heroku 上使用 PostgreSQL 的内置全文搜索？

提问by Ethan

回答by Nick Zadrozny

Edit, 2016 —?Why not both?

编辑，2016 年——？为什么不是两者兼而有之？

Pragmatic advantages to Postgres

Postgres 的实用优势

Advantages to Solr (or ElasticSearch)

Solr（或 ElasticSearch）的优势

回答by yekta

回答by gustavodiazjaimes

回答by Devi

回答by Scott Marlowe

回答by Eugene Lycenok

相关推荐

最近更新

标签

有什么理由不在 Heroku 上使用 PostgreSQL 的内置全文搜索？

提问by Ethan

回答by Nick Zadrozny

Edit, 2016 —?Why not both?

编辑，2016 年——？为什么不是两者兼而有之？

Pragmatic advantages to Postgres

Postgres 的实用优势

Advantages to Solr (or ElasticSearch)

Solr（或 ElasticSearch）的优势

回答by yekta

回答by gustavodiazjaimes

回答by Devi

回答by Scott Marlowe

回答by Eugene Lycenok

相关推荐

postgresql PL/pgSQL 函数中的 SELECT 或 PERFORM

关于 PLPGSQL 的 PostgreSQL 9.1 pg_restore 错误

postgresql 检查约束中的 SQL 子查询

带双引号的 postgresql COPY 和 CSV 数据

相关推荐

最近更新

标签