Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverFlow original: http://stackoverflow.com/questions/803783/

Time: 2020-09-01 01:53:31  Source: igfitidea

SQL Server Index - Any improvement for LIKE queries?

Tags: sql, sql-server, select

Asked by schooner

We have a query that runs against a fairly large table and unfortunately needs to use LIKE '%ABC%' on a couple of varchar fields so the user can search on partial names, etc. We are on SQL Server 2005.

Would adding an index on these varchar fields improve SELECT performance when using LIKE, or does SQL Server basically ignore the indexes and do a full scan in those cases?

Any other possible ways to improve performance when using LIKE?

Accepted answer by Lasse V. Karlsen

Only if you add full-text searching to those columns, and use the full-text query capabilities of SQL Server.

Otherwise, no, an index will not help.
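
For illustration, a minimal sketch of what the full-text route could look like. The customer table, the pk_customer key and the ft_demo catalog are made-up names, not from the question. Note that full-text search matches words and word prefixes rather than arbitrary substrings, so it is not a drop-in replacement for LIKE '%ABC%'.

```sql
-- Hypothetical table; a full-text index needs a unique, single-column, non-nullable key index
create table customer (
    id int not null constraint pk_customer primary key,
    name varchar(200) not null
)
go
-- Hypothetical catalog name
create fulltext catalog ft_demo as default
go
create fulltext index on customer (name) key index pk_customer on ft_demo
go
-- Matches rows where ABC appears as a whole word
select id, name from customer where contains(name, 'ABC')
-- Matches rows containing a word that starts with ABC (e.g. 'ABCDEF Ltd'), but not 'XABCX'
select id, name from customer where contains(name, '"ABC*"')
```

The index is populated in the background, so queries issued right after creating it may return nothing until population finishes.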

Answer by ahains

You can potentially see performance improvements by adding index(es); it depends a lot on the specifics :)

How much of the total size of the row are your predicated columns? How many rows do you expect to match? Do you need to return all rows that match the predicate, or just top 1 or top n rows?

If you are searching for values with high selectivity/uniqueness (so few rows to return), and the predicated columns are a smallish portion of the entire row size, an index could be quite useful. It will still be a scan, but your index will fit more rows per page than the source table.

Here is an example where the total row size is much greater than the column size to search across:

create table t1 (v1 varchar(100), b1 varbinary(8000))
go
--add 10k rows of filler
insert t1 values ('abc123def', cast(replicate('a', 8000) as varbinary(8000)))
go 10000
--add 1 row to find
insert t1 values ('abc456def', cast(replicate('a', 8000) as varbinary(8000)))
go

set statistics io on 
go
select * from t1 where v1 like '%456%'
--shows 10001 logical reads

--create index that only contains the column(s) to search across
create index t1i1 on t1(v1)
go
select * from t1 where v1 like '%456%'
--or can force the index with a hint: select * from t1 with (index(t1i1)) where v1 like '%456%'
--shows 37 logical reads

If you look at the actual execution plan, you can see the engine scanned the index and did a bookmark lookup on the matching row. Or you can tell the optimizer directly to use the index, if it hasn't decided to use this plan on its own: select * from t1 with (index(t1i1)) where v1 like '%456%'

If you have a bunch of columns to search across, but only a few that are highly selective, you could create multiple indexes and use a reduction approach. E.g. first determine a set of IDs (or whatever your PK is) from your highly selective index, then search your less selective columns with a filter against that small set of PKs.
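
A rough sketch of that two-step reduction, assuming a hypothetical part table where serial_no is the highly selective column and notes is not (all names here are invented for illustration):

```sql
-- Hypothetical table: id is the PK, serial_no is highly selective, notes is not
create table part (
    id int identity primary key,
    serial_no varchar(50) not null,
    notes varchar(4000) null
)
create index part_i1 on part(serial_no)
go
-- Step 1: collect the few matching keys; the narrow index on serial_no can answer this alone
select id into #hits from part where serial_no like '%456%'
-- Step 2: apply the less selective filter only to that small set of rows
select p.*
from part p
join #hits h on h.id = p.id
where p.notes like '%ABC%'
drop table #hits
```

Whether splitting the query like this beats a single statement depends on the data; comparing logical reads with set statistics io on, as in the example above, is one way to check.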

If you always need to return a large set of rows you would almost certainly be better off with a table scan.

So the possible optimizations depend a lot on the specifics of your table definition and the selectivity of your data.

HTH! -Adrian

Answer by marc_s

The only other way (other than using full-text indexing) you could improve performance is to use LIKE 'ABC%' - don't add the wildcard on both ends of your search term - in that case, an index could work.
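
To make the difference concrete, here is a small sketch that assumes the t1 table and the t1i1 index on v1 from the earlier answer by ahains:

```sql
-- No leading wildcard: the optimizer can seek into t1i1 on the 'abc456' prefix
select v1 from t1 where v1 like 'abc456%'
-- Leading wildcard: there is no prefix to seek on, so every index row has to be checked
select v1 from t1 where v1 like '%456%'
```

Comparing the two actual execution plans should show a seek for the first query and a scan for the second.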

If your requirements are such that you have to have wildcards on both ends of your search term, you're out of luck...

Marc

Answer by Cruachan

Like '%ABC%' will always perform a full table scan. There is no way around that.

You do have a couple of alternative approaches. First, full-text searching: it's really designed for this sort of problem, so I'd look at that first.

Alternatively, in some circumstances it might be appropriate to denormalize the data and pre-process the target fields into appropriate tokens, then add these possible search terms into a separate one-to-many search table. For example, if my data always consisted of a field containing the pattern 'AAA/BBB/CCC' and my users were searching on BBB, then I'd tokenize that out at insert/update (and remove on delete). This would also be one of those cases where using triggers, rather than application code, would be much preferred.

I must emphasize that this is not really an optimal technique and should only be used if the data is a good match for the approach and for some reason you do not want to use full-text search (and the database performance on the LIKE scan really is unacceptable). It's also likely to produce maintenance headaches further down the line.
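
As a sketch only, the token table and trigger could look something like the following. The item and item_token tables and the trigger name are hypothetical, and the split relies on PARSENAME, which only works here under the assumption that the field always has exactly three '/'-separated segments and never contains '.' or square brackets.

```sql
-- Hypothetical source table and one-to-many search table
create table item (id int identity primary key, code varchar(100) not null)
create table item_token (item_id int not null, token varchar(100) not null)
create index item_token_i1 on item_token(token, item_id)
go
create trigger item_tokenize on item after insert, update, delete as
begin
    set nocount on
    -- Drop the tokens of any row that was updated or deleted
    delete t from item_token t join deleted d on d.id = t.item_id
    -- Re-create the tokens of any row that was inserted or updated
    insert item_token (item_id, token)
    select i.id, parsename(replace(i.code, '/', '.'), n.part)
    from inserted i
    cross join (select 1 as part union all select 2 union all select 3) n
    where parsename(replace(i.code, '/', '.'), n.part) is not null
end
go
-- The search becomes an equality match that can seek on item_token_i1
select i.id, i.code
from item i
where exists (select 1 from item_token t where t.item_id = i.id and t.token = 'BBB')
```

Rows that already exist when the trigger is created would need a one-off backfill of item_token.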

Answer by Mladen Prajdic

Create statistics on that column. SQL Server 2005 has optimized the in-string search, so you might benefit from that.
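
A minimal sketch, assuming the t1 table from the earlier answer by ahains; st_t1_v1 is a made-up statistics name. Statistics help the optimizer estimate how many rows a LIKE predicate will match; they do not by themselves remove the scan.

```sql
-- Build column statistics from every row rather than a sample
create statistics st_t1_v1 on t1(v1) with fullscan
go
-- The header's "String Index" column reports whether a string summary was built
dbcc show_statistics ('t1', st_t1_v1)
```

If the column is already the leading key of an index, statistics for it exist already, so this step mainly matters for unindexed columns.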
