Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/26506879/

Time: 2020-09-01 02:51:46  Source: igfitidea

Is there an alternative to the LIKE statement in T-SQL?

Tags: sql, sql-server, tsql, contains, sql-like

Asked by Sachin Singh

I have a scenario where I need to perform the following operation:

SELECT *
FROM
[dbo].[MyTable]
WHERE
[Url] LIKE '%<some url>%';

I have to use two % wildcard characters, one at the beginning and one at the end of the Url pattern ('%<some url>%'), because the user should be able to find the complete URL even if he types only part of it. For example, if the URL is http://www.google.co.in and the user types "goo", then the URL must appear in the search results. The LIKE operator is causing performance issues. I need an alternative so that I can get rid of this statement and the wildcards. In other words, I don't want to use the LIKE statement in this scenario. I tried using T-SQL CONTAINS, but it does not solve my problem. Is there any other alternative that can perform pattern matching and give me results quickly?

Answered by paparazzo

Starting a LIKE pattern with % is going to cause a scan. There is no getting around it: it has to evaluate every value.

If you index the column it should be an index (rather than table) scan.

You don't have an alternative that will not cause a scan.
CHARINDEX and PATINDEX are alternatives, but they will still scan and will not fix the performance issue.

Could you break the components out into a separate table?
www
google
co
in

And then search with LIKE 'goo%'?
That would use an index as it does not start with %.

Better yet you could search on 'google' and get an index seek.

You would also want each string to be unique in that table, joined back to the URLs through a separate INT primary key, so that it does not hold multiple copies of www, for instance.
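
A rough sketch of what that could look like. Every name here (UrlToken, MyTableUrlToken, and an INT key column Id on MyTable) is my own assumption and not something from the question, and populating the token tables is left out:

```sql
-- Hypothetical schema: all table, column, and constraint names are illustrative.
CREATE TABLE dbo.UrlToken (
    TokenId INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    Token   NVARCHAR(255) NOT NULL,
    CONSTRAINT UQ_UrlToken_Token UNIQUE (Token)   -- each component stored once
);

CREATE TABLE dbo.MyTableUrlToken (
    MyTableId INT NOT NULL,   -- assumed INT key of dbo.MyTable (column Id)
    TokenId   INT NOT NULL REFERENCES dbo.UrlToken (TokenId),
    CONSTRAINT PK_MyTableUrlToken PRIMARY KEY (MyTableId, TokenId)
);

CREATE INDEX IX_MyTableUrlToken_TokenId ON dbo.MyTableUrlToken (TokenId);

DECLARE @searchTerm NVARCHAR(128) = N'goo';

-- The pattern has no leading %, so the unique index on Token can be used.
SELECT t.*
FROM dbo.MyTable AS t
WHERE EXISTS (
    SELECT 1
    FROM dbo.MyTableUrlToken AS m
    JOIN dbo.UrlToken AS k
      ON k.TokenId = m.TokenId
    WHERE m.MyTableId = t.Id
      AND k.Token LIKE @searchTerm + N'%'
);
```

Note that this only finds components that start with the search term. Typing "oogl" would not find google, which is the price of avoiding the leading wildcard.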

I suspect full-text CONTAINS was not faster because full-text indexing kept the URL as one word.

Answered by DavidG

You could create a FULLTEXT index.

First create your catalog:

CREATE FULLTEXT CATALOG ft AS DEFAULT;

Now, assuming your table is called MyTable, the column is TextColumn, and it has a unique index on it called UX_MyTable_TextColumn:

CREATE FULLTEXT INDEX ON [dbo].[MyTable](TextColumn) 
    KEY INDEX UX_MyTable_TextColumn

Now you can search the table using CONTAINS:

SELECT *
FROM MyTable
WHERE CONTAINS(TextColumn, 'searchterm')
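
Since the question is about partial input such as "goo", a prefix term may be closer to what is needed than a plain word search. This is only a sketch: the @searchTerm and @ftQuery variables are my own additions, and whether it finds www.google.co.in depends on how the full-text word breaker splits the URL, which I have not verified.

```sql
DECLARE @searchTerm NVARCHAR(128) = N'goo';

-- Build a prefix term of the form "goo*". Double quotes are stripped from the
-- input so it cannot break out of the quoted term.
DECLARE @ftQuery NVARCHAR(200) =
    N'"' + REPLACE(@searchTerm, N'"', N'') + N'*"';

SELECT *
FROM MyTable
WHERE CONTAINS(TextColumn, @ftQuery);
```

A prefix term only matches words that begin with the given text, so this is not a drop-in replacement for LIKE '%goo%'.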

Answered by JohnLBevan

To my knowledge there's no alternative to LIKE or CONTAINS (the full-text search feature) which would give better performance. What you can do is try to improve performance by optimising your query. To do that, you need to know a bit about your users and how they'll use your system. I suspect most people will enter a URL from the start of the address (i.e. without the protocol), so you could do something like this:

-- The search term is used as a value here, not concatenated into dynamic SQL,
-- so single quotes in it do not need to be doubled.
declare @searchTerm nvarchar(128) = 'goo';
set @searchTerm = coalesce(@searchTerm, ''); -- treat NULL as an empty term

SELECT *
FROM [dbo].[MyTable]
WHERE [Url] LIKE 'http://' + @searchTerm + '%'
or [Url] LIKE 'https://' + @searchTerm + '%'
or [Url] LIKE 'http://www.' + @searchTerm + '%'
or [Url] LIKE 'https://www.' + @searchTerm + '%'
or [Url] LIKE '%' + @searchTerm + '%'
option (fast 1); -- get back the first result asap

That then gives you some optimisation: if the URL is http://www.google.com, the index on the Url column can be used, since http://www.goo is at the start of the string. The option (fast 1) piece at the end is there to ensure this benefit is seen; since the last predicate, LIKE '%searchTerm%', can't make use of indexes, we'd rather return responses as soon as we can than wait for that slow part to complete. Have a think about other common usage patterns and ways around those.
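
To see the index-friendly part on its own, you could run only the prefix patterns and look at the execution plan. This is a sketch under two assumptions of mine: that there is an index on [Url], and that [Url] is an NVARCHAR column (with a VARCHAR column the implicit conversion may change the plan).

```sql
DECLARE @searchTerm NVARCHAR(128) = N'goo';

-- None of these patterns starts with %, so each one describes a range
-- of values that an index on [Url] can typically seek into.
SELECT [Url]
FROM [dbo].[MyTable]
WHERE [Url] LIKE N'http://'      + @searchTerm + N'%'
   OR [Url] LIKE N'https://'     + @searchTerm + N'%'
   OR [Url] LIKE N'http://www.'  + @searchTerm + N'%'
   OR [Url] LIKE N'https://www.' + @searchTerm + N'%';
```

If this returns nothing, the application could then fall back to the slower '%' + @searchTerm + '%' search as a second step.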

Answered by Thorsten Kettner

Your query is a very simple one and I see no reason for it to be slow. The DBMS will read record by record and compare strings. Usually it can even do this in parallel threads.

What do you think can be the reason for your statement being so slow? Are there billions of records in your table? Do your records contain so much data?

Your best bet is not to care about the query, but about the database and your system. Others have already suggested an index on the url column, so rather than scanning the table, the index can be scanned. Is max degree of parallelism mistakenly set? Is your table fragmented? Is your hardware appropriate? These are the things to consider here.
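
Two of these checks can be made from T-SQL. A minimal sketch, assuming the table is dbo.MyTable as in the question and that you have permission to read these system views:

```sql
-- Server-wide max degree of parallelism (0 means use all available processors).
SELECT name, value_in_use
FROM sys.configurations
WHERE name = N'max degree of parallelism';

-- Fragmentation of each index (or the heap) of the table in the current database.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.MyTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```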

However: charindex('oogl', url) > 0 does the same as url like '%oogl%', but internally they somehow work differently. For some people the LIKE expression turned out faster, for others the CHARINDEX method. Maybe it depends on the query, the number of processors, the operating system, whatever. It may be worth a try.
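
A simple way to try both on your own data is to run them back to back with timing output switched on. This is just a sketch against the table from the question; COUNT(*) is used so that the time to send rows to the client does not distort the comparison.

```sql
SET STATISTICS TIME ON;

-- Variant 1: LIKE with leading and trailing wildcard.
SELECT COUNT(*)
FROM [dbo].[MyTable]
WHERE [Url] LIKE '%oogl%';

-- Variant 2: CHARINDEX returns 0 when the substring is not found.
SELECT COUNT(*)
FROM [dbo].[MyTable]
WHERE CHARINDEX('oogl', [Url]) > 0;

SET STATISTICS TIME OFF;
```

Run each variant several times and compare the CPU and elapsed times in the Messages output, since the first run may include the cost of reading pages from disk.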

Answered by Jeroen Mostert

As written, your query cannot be further optimized, and there is no way of getting around the LIKE to do your searching. The only thing you can do to improve performance is reduce the SELECT to return only the columns you need, if you don't need all of them, and create an index on Url with those columns included. The LIKE will not be able to use the index for seeking, but the reduced data size for scanning can help. If you have a SQL Server edition that supports compression, that will help as well.

For instance, if you really need only column A, write

SELECT A FROM [dbo].[MyTable] WHERE [Url] LIKE '%<some url>%';

And create the index as

CREATE INDEX IX_MyTable_URL 
ON MyTable([Url]) 
INCLUDE (A) WITH (DATA_COMPRESSION = PAGE);

If A is already included in your primary key, the INCLUDE is unnecessary.
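
Before rebuilding anything, you can ask SQL Server to estimate what page compression would save. A sketch, assuming the table from the question and an edition and version where this procedure and page compression are available:

```sql
-- Estimates current and compressed sizes for every index and partition
-- of dbo.MyTable (NULL means "all" for index and partition).
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'MyTable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';
```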