MySQL 使用 InnoDB 进行全文搜索
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1381186/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Fulltext Search with InnoDB
提问by brianreavis
I'm developing a high-volume web application, where part of it is a MySQL database of discussion posts that will need to grow to 20M+ rows, smoothly.
我正在开发一个大容量的 Web 应用程序,其中一部分是讨论帖子的 MySQL 数据库,需要顺利增长到 2000 万行以上。
I was originally planning on using MyISAM for the tables (for the built-in fulltext search capabilities), but the thought of the entire tablebeing locked due to a single write operation makes me shutter. Row-level locks make so much more sense (not to mention InnoDB's other speed advantages when dealing with huge tables). So, for this reason, I'm pretty determined to use InnoDB.
我最初计划将 MyISAM 用于表(用于内置全文搜索功能),但想到由于单个写入操作而锁定整个表,这让我感到沮丧。行级锁更有意义(更不用说 InnoDB 在处理大表时的其他速度优势)。因此,出于这个原因,我非常确定要使用 InnoDB。
The problem is... InnoDB doesn't have built-in fulltext search capabilities.
问题是... InnoDB 没有内置的全文搜索功能。
Should I go with a third-party search system? Like Lucene(c++)/ Sphinx? Do any of you database ninjas have any suggestions/guidance? LinkedIn's zoie(based off Lucene) looks like the best option at the moment... having been built around realtime capabilities (which is pretty critical for my application.) I'm a little hesitant to commit yet without some insight...
我应该使用第三方搜索系统吗?像Lucene(c++)/ Sphinx吗?你们中的任何一个数据库忍者有什么建议/指导吗?LinkedIn 的zoie(基于 Lucene)目前看起来是最好的选择...围绕实时功能构建(这对我的应用程序非常重要。)我有点犹豫要不要提交但没有一些洞察力...
(FYI: going to be on EC2 with high-memory rigs, using PHP to serve the frontend)
(仅供参考:将在 EC2 上使用高内存设备,使用 PHP 为前端提供服务)
采纳答案by Ian Wilkes
I can vouch for MyISAM fulltext being a bad option - even leaving aside the various problems with MyISAM tables in general, I've seen the fulltext stuff go off the rails and start corrupting itself and crashing MySQL regularly.
我可以保证 MyISAM 全文是一个糟糕的选择 - 即使不考虑 MyISAM 表的各种问题,我也看到全文内容偏离轨道并开始自我破坏并定期使 MySQL 崩溃。
A dedicated search engine is definitely going to be the most flexible option here - store the post data in MySQL/innodb, and then export the text to your search engine. You can set up a periodic full index build/publish pretty easily, and add real-time index updates if you feel the need and want to spend the time.
专用的搜索引擎绝对是这里最灵活的选择 - 将帖子数据存储在 MySQL/innodb 中,然后将文本导出到您的搜索引擎。您可以非常轻松地设置定期完整索引构建/发布,并在您觉得需要并愿意花时间时添加实时索引更新。
Lucene and Sphinx are good options, as is Xapian, which is nice and lightweight. If you go the Lucene route don't assume that Clucene will better, even if you'd prefer not to wrestle with Java, although I'm not really qualified to discuss the pros and cons of either.
Lucene 和 Sphinx 是不错的选择,Xapian也是不错的选择,它既美观又轻巧。如果您走 Lucene 路线,请不要假设 Clucene 会更好,即使您不想与 Java 搏斗,尽管我真的没有资格讨论两者的优缺点。
回答by Jeremy Smyth
Along with the general phasing out of MyISAM, InnoDB full-text search (FTS) is finally available in MySQL 5.6.4 release.
随着 MyISAM 的全面淘汰,InnoDB 全文搜索 (FTS) 终于在 MySQL 5.6.4 版本中可用。
Lots of juicy details at https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html.
https://dev.mysql.com/doc/refman/5.6/en/innodb-fulltext-index.html 上有很多有趣的细节。
While other engines have lots of different features, this one is InnoDB, so it's native (which means there's an upgrade path), and that makes it a worthwhile option.
虽然其他引擎有很多不同的特性,但这个是 InnoDB,所以它是原生的(这意味着有一个升级路径),这使它成为一个值得的选择。
回答by Bill Karwin
You should spend an hour and go through installation and test-drive of Sphinx and Lucene. See if either meets your needs, with respect to data updates.
您应该花一个小时完成 Sphinx 和 Lucene 的安装和试驾。看看两者是否满足您的数据更新需求。
One of the things that disappointed me about Sphinx is that it doesn't support incremental inserts very well. That is, it's very expensive to reindex after an insert, so expensive that their recommended solution is to split your data into older, unchanging rows and newer, volatile rows. So every search your app does would have to search twice: once on the larger index for old rows and also on the smaller index for recent rows. If that doesn't integrate with your usage patterns, this Sphinx is not a good solution (at least not in its current implementation).
Sphinx 让我失望的一件事是它不能很好地支持增量插入。也就是说,在插入后重新索引非常昂贵,以至于他们推荐的解决方案是将数据拆分为旧的、不变的行和新的、易变的行。因此,您的应用程序执行的每次搜索都必须搜索两次:一次在较大索引上搜索旧行,另一次在较小索引上搜索最近行。如果这不符合您的使用模式,则此 Sphinx 不是一个好的解决方案(至少在其当前实现中不是)。
I'd like to point out another possible solution you could consider: Google Custom Search. If you can apply some SEO to your web application, then outsource the indexing and search function to Google, and embed a Google search textfield into your site. It could be the most economical and scalable way to make your site searchable.
我想指出您可以考虑的另一种可能的解决方案: Google 自定义搜索。如果您可以对您的 Web 应用程序应用一些 SEO,那么将索引和搜索功能外包给 Google,并将 Google 搜索文本字段嵌入您的网站。这可能是使您的网站可搜索的最经济和可扩展的方式。
回答by bobobobo
Perhaps you shouldn't dismiss MySQL's FT so quickly. Craigslist used to use it.
也许你不应该这么快就放弃 MySQL 的 FT。 Craigslist 曾经使用过它。
MySQL's speed and Full Text Search has enabled craigslist to serve their users .. craigslist uses MySQL to serve approximately 50 million searches per month at a rate of up to 60 searches per second."
MySQL 的速度和全文搜索使 craigslist 能够为其用户提供服务.. craigslist 使用 MySQL 以高达每秒 60 次搜索的速度提供每月大约 5000 万次搜索服务。”
edit
编辑
As commented below, Craigslist seems to have switched to Sphinxsome time in early 2009.
正如下面评论的那样,Craigslist 似乎在 2009 年初的某个时候转向了 Sphinx。
回答by Gregg Lind
Sphinx, as you point out, is quite nice for this stuff. All the work is in the configuration file. Make sure whatever your table is with the strings has some unique integer id key, and you should be fine.
正如您所指出的,Sphinx 非常适合这些东西。所有的工作都在配置文件中。确保无论你的表格是什么,字符串都有一些唯一的整数 id 键,你应该没问题。
回答by Muhammad
You should take a look at Sphinx. It is worth a try. It's indexing is super fast and it is distributed. You should take a look at this (http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown) webminar. It talks about searching and has some neat benchmarks. You may find it helpful.
你应该看看Sphinx。值得一试。它的索引速度非常快,而且是分布式的。您应该看看这个 (http://www.percona.com/webinars/2012-08-22-full-text-search-throwdown) 网络研讨会。它谈论搜索并有一些简洁的基准。您可能会发现它很有帮助。
回答by user1612250
If everything else fails, there's always soundex_match, which sadly isn't really fast an accurate
如果其他一切都失败了,那么总是有soundex_match,遗憾的是它不是真的快速准确
回答by Rakesh Ojha
try this
尝试这个
ROUND((LENGTH(text) - LENGTH(REPLACE(text, 'serchtext', ''))) / LENGTH('serchtext'),0)!=0