SQL: When should you use full-text indexing?

Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/57918/

Time: 2020-08-31 23:20:27  Source: igfitidea

When should you use full-text indexing?

Tags: sql, sql-server, tsql, indexing, full-text-search

Asked by Esteban Araya

We have a whole bunch of queries that "search" for clients, customers, etc. You can search by first name, email, etc. We're using LIKE statements in the following manner:

SELECT * 
FROM customer 
WHERE fname LIKE '%someName%'

Does full-text indexing help in the scenario? We're using SQL Server 2005.

Accepted answer by Prestaul

It will depend on your DBMS. I believe that most systems will not take advantage of the full-text index unless you use the full-text functions (e.g. MATCH/AGAINST in MySQL or FREETEXT/CONTAINS in MS SQL).

Here is a good article on when, why, and how to use full-text indexing in SQL Server: Understanding SQL Server Full-Text Indexing
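
As a rough illustration of the point above, the query from the question could be rewritten with SQL Server's full-text predicates along these lines. This is only a sketch: it assumes a full-text index already exists on customer.fname, which the question does not state.

```sql
-- Original query: a LIKE predicate is not answered from a full-text index.
SELECT *
FROM customer
WHERE fname LIKE '%someName%';

-- Sketch, assuming a full-text index exists on customer.fname.
-- CONTAINS looks for the word someName.
SELECT *
FROM customer
WHERE CONTAINS(fname, 'someName');

-- FREETEXT matches on meaning rather than on the exact wording.
SELECT *
FROM customer
WHERE FREETEXT(fname, 'someName');
```

Note that these predicates match whole words (or word prefixes), not arbitrary substrings, so the results can differ from those of LIKE '%someName%'.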

Answer by Josef

FTS can help in this scenario; the question is whether it is worth it.

To begin with, let's look at why LIKE may not be the most effective search. When you use LIKE, especially when you are searching with a % at the beginning of your comparison, SQL Server needs to perform both a table scan of every single row and a byte-by-byte check of the column you are searching.

FTS has better algorithms for matching data, as well as better statistics on variations of names. Therefore FTS can provide better performance for matching Smith, Smythe, Smithers, etc. when you look for Smith.

It is, however, a bit more complex to use FTS, as you'll need to master CONTAINS vs FREETEXT and the arcane format of the search condition. On the other hand, if you want to do a search where either FName or LName matches, you can do that with one statement instead of an OR.
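
The single-statement search over two columns mentioned above might look like the following sketch. It assumes both fname and lname (column names taken loosely from the answer's FName and LName) are included in the table's full-text index.

```sql
-- LIKE version: one predicate per column, joined with OR.
SELECT *
FROM customer
WHERE fname LIKE '%Smith%'
   OR lname LIKE '%Smith%';

-- Full-text version: a column list inside a single CONTAINS predicate.
SELECT *
FROM customer
WHERE CONTAINS((fname, lname), 'Smith');

-- Prefix term: matches words that start with Smith, such as Smithers.
-- The double quotes inside the single-quoted string are required.
SELECT *
FROM customer
WHERE CONTAINS((fname, lname), '"Smith*"');
```

The quoting of the prefix term is a small example of the "arcane format" the answer refers to.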

To determine if FTS is going to be effective, determine how much data you have. I use FTS on a database of several hundred million rows and that's a real benefit over searching with LIKE, but I don't use it on every table.

If your table size is more reasonable, less than a few million rows, you can get similar speed by creating an index on each column that you're going to search; SQL Server should then perform an index scan rather than a table scan.
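
A minimal sketch of the per-column indexes described above. The index names are hypothetical, and the email column is an assumption based on the question's mention of searching by email.

```sql
-- One nonclustered index per searched column (hypothetical names).
CREATE NONCLUSTERED INDEX IX_customer_fname ON customer (fname);
CREATE NONCLUSTERED INDEX IX_customer_email ON customer (email);

-- With a leading wildcard the index cannot be used for a seek, but it is
-- narrower than the table, so scanning it reads less data.
SELECT fname
FROM customer
WHERE fname LIKE '%someName%';
```

Whether the optimizer actually chooses the index depends on the columns selected and on the table's statistics, so it is worth checking the execution plan.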

Answer by Strinder

According to my test scenario:

  • SQL Server 2008
  • 10,000,000 rows, each with a string like "wordA wordB wordC..." (between 1 and 30 words)
  • selecting count(*) with CONTAINS(column, 'wordB')
  • result size of several hundred thousand rows
  • catalog size approx 1.8GB

The full-text query took about 2 s, whereas LIKE '% wordB %' took 1-2 minutes.
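
The two queries compared in this test would look roughly like the sketch below. The table dbo.documents and its column body are hypothetical stand-ins, since the answer does not name them, and a full-text index on body is assumed.

```sql
-- Report elapsed and CPU time for each statement.
SET STATISTICS TIME ON;

-- Full-text variant: answered from the full-text index.
SELECT COUNT(*)
FROM dbo.documents
WHERE CONTAINS(body, 'wordB');

-- LIKE variant: every row's string has to be scanned.
SELECT COUNT(*)
FROM dbo.documents
WHERE body LIKE '% wordB %';

SET STATISTICS TIME OFF;
```

The two predicates are not strictly equivalent: the LIKE pattern with surrounding spaces misses wordB when it is the first or last word of the string, so the counts may differ slightly.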

But this holds only if you don't use any additional selection criteria! For example, when I additionally used a LIKE 'prefix%' predicate on a primary key column, the full-text query performed worse, since going into the full-text index costs more than doing a string search on a few fields (as long as there are not too many of them).
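
The combined case could be sketched as follows, again using the hypothetical table dbo.documents with a string primary key id and a full-text indexed column body.

```sql
-- Variant A: selective prefix predicate on the key plus a plain string search.
SELECT COUNT(*)
FROM dbo.documents
WHERE id LIKE 'prefix%'
  AND body LIKE '% wordB %';

-- Variant B: the same prefix predicate combined with a full-text lookup.
SELECT COUNT(*)
FROM dbo.documents
WHERE id LIKE 'prefix%'
  AND CONTAINS(body, 'wordB');
```

When the prefix predicate leaves only a few rows, variant A has little text left to scan. Which variant is faster depends on the data, so comparing the execution plans on your own tables is the safer guide.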

So I would recommend a full-text index only in cases where you have to do a "free string search" or use some of its special features...

Answer by Brannon

To answer the question specifically for MSSQL, full-text indexing will NOT help in your scenario.

In order to improve that query you could do one of the following:

  1. Configure a full-text catalog on the column and use the CONTAINS() function.
  2. If you were primarily searching with a prefix (i.e. matching from the start of the name), you could change the predicate to the following and create an index over the column.

    where fname like 'prefix%'

(1) is probably overkill for this, unless the performance of the query is a big problem.
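
The two options above can be sketched in T-SQL roughly as follows. The catalog name customer_catalog and the index names PK_customer and IX_customer_fname are hypothetical. The KEY INDEX clause must name an existing unique, single-column, non-nullable index on the table.

```sql
-- Option 1 (sketch): full-text catalog and index, then CONTAINS.
CREATE FULLTEXT CATALOG customer_catalog;

CREATE FULLTEXT INDEX ON customer (fname)
    KEY INDEX PK_customer          -- hypothetical primary key index name
    ON customer_catalog
    WITH CHANGE_TRACKING AUTO;     -- keep the index updated as rows change

SELECT *
FROM customer
WHERE CONTAINS(fname, 'someName');

-- Option 2 (sketch): ordinary index plus a prefix-only LIKE,
-- which lets SQL Server seek into the index.
CREATE NONCLUSTERED INDEX IX_customer_fname ON customer (fname);

SELECT *
FROM customer
WHERE fname LIKE 'prefix%';
```

Option 2 only helps when users search from the start of the name. A pattern with a leading % still forces a scan.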
