有什么理由不在 Heroku 上使用 PostgreSQL 的内置全文搜索?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/10875674/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Any reason not use PostgreSQL's built-in full text search on Heroku?
提问by Ethan
I'm preparing to deploy a Rails app on Heroku that requires full text search. Up to now I've been running it on a VPS using MySQL with Sphinx.
我正准备在 Heroku 上部署一个需要全文搜索的 Rails 应用程序。到目前为止,我一直在使用 MySQL 和 Sphinx 的 VPS 上运行它。
However, if I want to use Sphinx or Solr on Heroku, I'd need to pay for an add-on.
但是,如果我想在 Heroku 上使用 Sphinx 或 Solr,我需要为附加组件付费。
I notice that PostgreSQL (the DB used on Heroku) has built-in full text search capability.
我注意到 PostgreSQL(Heroku 上使用的数据库)具有内置的全文搜索功能。
Is there a reason I couldn't use Postgres's full-text search? Is it slower than Sphinx or is there some other major limitation?
我不能使用 Postgres 的全文搜索有什么原因吗?它比 Sphinx 慢还是有其他一些主要限制?
回答by Nick Zadrozny
Edit, 2016 —?Why not both?
编辑,2016 年——?为什么不是两者兼而有之?
If you're interested in Postgres vs. Lucene, why not both? Check out the ZomboDBextension for Postgres, which integrates Elasticsearch as a first-class index type. Still a fairly early project but it looks really promising to me.
如果您对 Postgres 与 Lucene 感兴趣,为什么不对两者都感兴趣?查看Postgres的ZomboDB扩展,它集成了 Elasticsearch 作为一流的索引类型。仍然是一个相当早期的项目,但对我来说看起来很有希望。
(Technically not available on Heroku, but still worth looking at.)
(技术上在 Heroku 上不可用,但仍然值得一看。)
Disclosure: I'm a cofounder of the Websolrand BonsaiHeroku add-ons, so my perspective is a bit biased toward Lucene.
披露:我是Websolr和BonsaiHeroku 附加组件的联合创始人,所以我的观点有点偏向于 Lucene。
My read on Postgres full-text search is that it is pretty solid for straightforward use cases, but there are a number of reasons why Lucene (and thus Solr and ElasticSearch) is superior both in terms of performance and functionality.
我对 Postgres 全文搜索的理解是,它对于简单的用例来说非常可靠,但是 Lucene(以及 Solr 和 ElasticSearch)在性能和功能方面都优越的原因有很多。
For starters, jpountzprovides a truly excellent technical answer to the question, Why is Solr so much faster than Postgres?It's worth a couple of reads through to really digest.
对于初学者来说,jpountz为这个问题提供了一个真正出色的技术答案,为什么 Solr 比 Postgres 快得多?值得多读几遍才能真正消化。
I also commented on a recent RailsCast episodecomparing relative advantages and disadvantages of Postgres full-text search versus Solr. Let me recap that here:
我还评论了最近的 RailsCast 插曲,比较了 Postgres 全文搜索与 Solr 的相对优势和劣势。让我在这里回顾一下:
Pragmatic advantages to Postgres
Postgres 的实用优势
- Reuse an existing service that you're already running instead of setting up and maintaining (or paying for) something else.
- Far superior to the fantastically slow SQL
LIKE
operator. - Less hassle keeping data in sync since it's all in the same database —?no application-level integration with some external data service API.
- 重用您已经在运行的现有服务,而不是设置和维护(或支付)其他东西。
- 远优于极其缓慢的 SQL
LIKE
运算符。 - 由于数据都在同一个数据库中,因此保持数据同步的麻烦更少——没有与某些外部数据服务 API 的应用程序级集成。
Advantages to Solr (or ElasticSearch)
Solr(或 ElasticSearch)的优势
Off the top of my head, in no particular order…
在我的头顶上,没有特别的顺序......
- Scale your indexing and search load separately from your regular database load.
- More flexible term analysis for things like accent normalizing, linguistic stemming, N-grams, markup removal… Other cool features like spellcheck, "rich content" (e.g., PDF and Word) extraction…
- Solr/Lucene can do everything on the Postgres full-text search TODO listjust fine.
- Much better and faster term relevancy ranking, efficiently customizable at search time.
- Probably faster search performance for common terms or complicated queries.
- Probably more efficient indexing performance than Postgres.
- Better tolerance for change in your data model by decoupling indexing from your primary data store
- 将索引和搜索负载与常规数据库负载分开扩展。
- 更灵活的术语分析,如重音规范化、语言词干、N-gram、标记删除……其他很酷的功能,如拼写检查、“丰富内容”(例如,PDF 和 Word)提取……
- Solr/Lucene 可以做Postgres 全文搜索 TODO 列表上的所有事情就好了。
- 更好更快的术语相关性排名,可在搜索时高效定制。
- 常见术语或复杂查询的搜索性能可能会更快。
- 可能比 Postgres 更有效的索引性能。
- 通过将索引与主数据存储分离,更好地容忍数据模型的变化
Clearly I think a dedicated search engine based on Lucene is the better option here. Basically, you can think of Lucene as the de facto open source repository of search expertise.
显然,我认为基于 Lucene 的专用搜索引擎是更好的选择。基本上,您可以将 Lucene 视为搜索专业知识的事实上的开源存储库。
But if your only other option is the LIKE
operator, then Postgres full-text search is a definite win.
但是,如果您唯一的其他选择是LIKE
运算符,那么 Postgres 全文搜索无疑是一个胜利。
回答by yekta
Since I just went through the effort of comparing elastic search (1.9) against postgres FTS, I figured I should share my results since they're somewhat more current than the ones @gustavodiazjaimes cites.
由于我刚刚尝试将弹性搜索 (1.9) 与 postgres FTS 进行比较,我想我应该分享我的结果,因为它们比@gustavodiazjaimes 引用的结果更新一些。
My main concern with postgres was that it did not have faceting built in, but that's trivial to build yourself, here's my example (in django):
我对 postgres 的主要担忧是它没有内置分面,但是构建自己很简单,这是我的示例(在 Django 中):
results = YourModel.objects.filter(vector_search=query)
facets = (results
.values('book')
.annotate(total=Count('book'))
.order_by('book'))
I'm using postgres 9.6 and elastic-search 1.9 (through haystack on django). Here's a comparison between elasticsearch and postgres across 16 various types of queries.
我正在使用 postgres 9.6 和 elastic-search 1.9(通过 django 上的 haystack)。下面是对 16 种不同类型查询的 elasticsearch 和 postgres 的比较。
es_times pg_times es_times_faceted pg_times_faceted
0 0.065972 0.000543 0.015538 0.037876
1 0.000292 0.000233 0.005865 0.007130
2 0.000257 0.000229 0.005203 0.002168
3 0.000247 0.000161 0.003052 0.001299
4 0.000276 0.000150 0.002647 0.001167
5 0.000245 0.000151 0.005098 0.001512
6 0.000251 0.000155 0.005317 0.002550
7 0.000331 0.000163 0.005635 0.002202
8 0.000268 0.000168 0.006469 0.002408
9 0.000290 0.000236 0.006167 0.002398
10 0.000364 0.000224 0.005755 0.001846
11 0.000264 0.000182 0.005153 0.001667
12 0.000287 0.000153 0.010218 0.001769
13 0.000264 0.000231 0.005309 0.001586
14 0.000257 0.000195 0.004813 0.001562
15 0.000248 0.000174 0.032146 0.002246
count mean std min 25% 50% 75% max
es_times 16.0 0.004382 0.016424 0.000245 0.000255 0.000266 0.000291 0.065972
pg_times 16.0 0.000209 0.000095 0.000150 0.000160 0.000178 0.000229 0.000543
es_times_faceted 16.0 0.007774 0.007150 0.002647 0.005139 0.005476 0.006242 0.032146
pg_times_faceted 16.0 0.004462 0.009015 0.001167 0.001580 0.002007 0.002400 0.037876
In order to get postgres to these speeds?for faceted searches I had to use an GIN index on the field with a SearchVectorField, which is django specific but I'm sure other frameworks have a similar vector type.
为了让 postgres 达到这些速度?对于分面搜索,我必须在带有 SearchVectorField 的字段上使用 GIN 索引,这是 django 特定的,但我确信其他框架具有类似的向量类型。
One other consideration is that pg 9.6 now supports phrase matching, which is huge.
另一个考虑因素是 pg 9.6 现在支持短语匹配,这是巨大的。
My take away is that postgres is for most cases going to be preferrable as it offers:
我的结论是 postgres 在大多数情况下会更受欢迎,因为它提供:
- simpler stack
- no search backend api wrapper dependencies to contend with (thinking-sphinx, django-sphinx, haystack etc.). These can be a drag since they might not support the features your search back-end does (e.g. haystack faceting/aggregates).
- has similar performance and features (for my needs)
- 更简单的堆栈
- 没有搜索后端 api 包装器依赖(thinking-sphinx、django-sphinx、haystack 等)。这些可能是一个拖累,因为它们可能不支持您的搜索后端所做的功能(例如 haystack faceting/aggregates)。
- 具有相似的性能和功能(满足我的需求)
回答by gustavodiazjaimes
I found this amazing comparison and want to share it:
我发现了这个惊人的比较并想分享它:
Full Text Search In PostgreSQL
Time to Build Index LIKE predicate -- none
PostgreSQL / GIN -- 40 min
Sphinx Search -- 6 min
Apache Lucene -- 9 min
Inverted index -- high
构建索引 LIKE 谓词的时间 -- 无
PostgreSQL / GIN -- 40 分钟
Sphinx 搜索 -- 6 分钟
Apache Lucene -- 9 分钟
倒排索引 -- 高
Index Storage LIKE predicate -- none
PostgreSQL / GIN -- 532 MB
Sphinx Search -- 533 MB
Apache Lucene -- 1071 MB
Inverted index -- 101 MB
索引存储 LIKE 谓词 -- 无
PostgreSQL / GIN -- 532 MB
Sphinx 搜索 -- 533 MB
Apache Lucene -- 1071 MB
倒排索引 -- 101 MB
Query Speed LIKE predicate -- 90+ seconds
PostgreSQL / GIN -- 20 ms
Sphinx Search -- 8 ms
Apache Lucene -- 80 ms
Inverted index -- 40 ms
查询速度 LIKE 谓词 -- 90+ 秒
PostgreSQL / GIN -- 20 ms
Sphinx 搜索 -- 8 ms
Apache Lucene -- 80 ms
倒排索引 -- 40 ms
回答by Devi
Postgres's full text search has amazing capabilities in the areas of stemming, ranking/boosting, synonym handling, fuzzy searches among others - but no support for faceted search.
Postgres 的全文搜索在词干提取、排名/提升、同义词处理、模糊搜索等领域具有惊人的能力——但不支持分面搜索。
So, if Postgres is already in your stack and you don't need faceting, better try it out to avail the HUGE benefit of ease of keeping indices in sync and maintaining sleek stack, before looking out for Lucene based solutions - at least if all your app is not based on search.
因此,如果 Postgres 已经在您的堆栈中并且您不需要分面,那么在寻找基于 Lucene 的解决方案之前,最好尝试一下它以利用易于保持索引同步和维护光滑堆栈的巨大好处 - 至少如果所有的话您的应用并非基于搜索。
回答by Scott Marlowe
Postgresql's FTS function is mature and fairly fast at lookups. It's worth a look for sure.
Postgresql 的 FTS 功能已经成熟并且在查找方面相当快。值得一看。