SQL 'like' vs '=' performance
Original question: http://stackoverflow.com/questions/6142235/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by UnconditionallyReinstateMonica
This question skirts around what I'm wondering, but the answers don't exactly address it.
It would seem that in general '=' is faster than 'like' when using wildcards. This appears to be the conventional wisdom. However, let's suppose I have a column containing a limited number of different fixed, hardcoded varchar identifiers, and I want to select all rows matching one of them:
select * from table where value like 'abc%'
and
select * from table where value = 'abcdefghijklmn'
'Like' should only need to test the first three chars to find a match, whereas '=' must compare the entire string. In this case it would seem to me that 'like' would have an advantage, all other things being equal.
This is intended as a general, academic question, so it should not matter which DB is used, but it arose using SQL Server 2005.
Accepted answer by BonyT
Quote from there:
The rules for index usage with LIKE are loosely like this:
1. If your filter criteria use equals (=) and the field is indexed, then most likely it will use an INDEX/CLUSTERED INDEX SEEK.
2. If your filter criteria use LIKE with no wildcards (for example, a parameter in a web report that COULD contain a % but where you pass the full string instead), it is about as likely as #1 to use the index. The increased cost is almost nothing.
3. If your filter criteria use LIKE with a wildcard at the beginning (as in Name0 LIKE '%UTER'), it's much less likely to use the index, but it may still at least perform an INDEX SCAN on a full or partial range of the index.
4. HOWEVER, if your filter criteria use LIKE but start with a STRING FIRST and have wildcards somewhere AFTER that (as in Name0 LIKE 'COMP%ER'), then SQL may just use an INDEX SEEK to quickly find rows that have the same starting characters, and then look through those rows for an exact match.
(Also keep in mind that the SQL engine still might not use an index the way you're expecting, depending on what else is going on in your query and what tables you're joining to. The SQL engine reserves the right to rewrite your query a little to get the data in the way it thinks is most efficient, and that may include an INDEX SCAN instead of an INDEX SEEK.)
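The rules quoted above can be checked against a real query planner. The following is a minimal sketch that uses SQLite (through Python's standard sqlite3 module) as a stand-in engine, not SQL Server. The table t, the column name0 and the index ix_name0 are hypothetical names chosen for illustration. The column is declared COLLATE NOCASE because SQLite only considers an index for its default case-insensitive LIKE under that collation. Plan wording differs between engines and versions, so the output is illustrative only.

```python
import sqlite3

# SQLite stands in for SQL Server here; all names below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name0 TEXT COLLATE NOCASE)")
conn.execute("CREATE INDEX ix_name0 ON t(name0)")
conn.executemany(
    "INSERT INTO t (name0) VALUES (?)",
    [("COMPUTER",), ("COMPILER",), ("ROUTER",)],
)


def plan(where_clause):
    """Return SQLite's query-plan description for a filter on table t."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where_clause
    ).fetchall()
    # The last column of each row holds the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


filters = {
    "equals": "name0 = 'COMPUTER'",
    "like, no wildcard": "name0 LIKE 'COMPUTER'",
    "like, leading wildcard": "name0 LIKE '%UTER'",
    "like, constant prefix": "name0 LIKE 'COMP%ER'",
}
plans = {label: plan(where) for label, where in filters.items()}

for label, text in plans.items():
    # In SQLite's wording, SEARCH is a seek or range seek; SCAN visits every entry.
    print(f"{label:24} -> {text}")
```

In SQLite's plan output, a step that begins with SEARCH corresponds roughly to a seek, and a step that begins with SCAN corresponds to reading the whole table or index. SQL Server shows the same distinction in its own terms (Index Seek versus Index Scan) in the execution plan.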
Answer by JNK
It's a measurable difference.
Run the following:
-- Temp table holding a small set of repeated varchar values
CREATE TABLE #TempTester (id int, col1 varchar(20), value varchar(20))
GO
-- One row carries the value that both test queries look for
INSERT INTO #TempTester (id, col1, value)
VALUES
(1, 'this is #1', 'abcdefghij')
GO
-- Nine filler rows; GO 10000 below repeats this batch 10000 times
INSERT INTO #TempTester (id, col1, value)
VALUES
(2, 'this is #2', 'foob'),
(3, 'this is #3', 'abdefghic'),
(4, 'this is #4', 'other'),
(5, 'this is #5', 'zyx'),
(6, 'this is #6', 'zyx'),
(7, 'this is #7', 'zyx'),
(8, 'this is #8', 'klm'),
(9, 'this is #9', 'klm'),
(10, 'this is #10', 'zyx')
GO 10000
-- Clustered index on id, nonclustered index on the column being filtered
CREATE CLUSTERED INDEX ixId ON #TempTester(id)
CREATE NONCLUSTERED INDEX ixTesting ON #TempTester(value)
Then:
SET SHOWPLAN_XML ON
Then:
SELECT * FROM #TempTester WHERE value LIKE 'abc%'
SELECT * FROM #TempTester WHERE value = 'abcdefghij'
The resulting execution plan shows you that the cost of the first operation, the LIKE comparison, is about 10 times more expensive than the = comparison.
If you can use an = comparison, please do so.
Answer by Blindy
You should also keep in mind that when using like, some SQL flavors will ignore indexes, and that will kill performance. This is especially true if you don't use the "starts with" pattern like your example.
You should really look at the execution plan for the query and see what it's doing; guess as little as possible.
This being said, the "starts with" pattern can be and is optimized in SQL Server. It will use the table index. EF 4.0 switched to like for StartsWith for this very reason.
Answer by Will A
If value is unindexed, both result in a table scan. The performance difference in this scenario will be negligible.
If value is indexed, as Daniel points out in his comment, the = will result in an index lookup, which is O(log N) performance. The LIKE will (most likely, depending on how selective it is) result in a partial scan of the index >= 'abc' and < 'abd', which will require more effort than the =.
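The range rewrite described here can be sketched outside a database. The following Python snippet models an index as a sorted list and assumes plain code-point ordering (roughly a binary collation), which is a simplification of how a real engine compares strings. The helper names prefix_upper_bound, equality_seek and prefix_range_scan are hypothetical.

```python
from bisect import bisect_left, bisect_right


def prefix_upper_bound(prefix):
    """Smallest string that sorts after every string starting with prefix."""
    if not prefix:
        raise ValueError("an empty prefix matches everything; there is no bound")
    # 'abc' -> 'abd': bump the last character by one code point.
    return prefix[:-1] + chr(ord(prefix[-1]) + 1)


def equality_seek(sorted_values, target):
    """Model of value = target: binary search to the matching entries only."""
    lo = bisect_left(sorted_values, target)
    hi = bisect_right(sorted_values, target)
    return sorted_values[lo:hi]


def prefix_range_scan(sorted_values, prefix):
    """Model of value LIKE 'prefix%': seek to the prefix, then read the range."""
    lo = bisect_left(sorted_values, prefix)
    hi = bisect_left(sorted_values, prefix_upper_bound(prefix))
    return sorted_values[lo:hi]


# A toy "index": the sorted values of the indexed column.
index = sorted(
    ["abc", "abcdefghijklmn", "abcz", "abd", "abdefghic", "foob", "klm", "zyx"]
)

like_rows = prefix_range_scan(index, "abc")
equal_rows = equality_seek(index, "abcdefghijklmn")
print(like_rows)
print(equal_rows)
```

Both lookups start with an O(log N) search. The prefix version then has to read every entry between 'abc' and 'abd', so its extra cost grows with the number of values that share the prefix.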
Note that I'm talking about SQL Server here; not all DBMSs will be nice with LIKE.
Answer by Remus Rusanu
You are asking the wrong question. In databases it is not the operator performance that matters; it is always the SARGability of the expression and the coverability of the overall query. Performance of the operator itself is largely irrelevant.
So, how do LIKE and = compare in terms of SARGability? LIKE, when used with an expression that does not start with a constant (e.g. when used as LIKE '%something'), is by definition non-SARGable. But does that make = or LIKE 'something%' SARGable? No. As with any question about SQL performance, the answer does not lie with the text of the query, but with the schema deployed. These expressions may be SARGable if an index exists to satisfy them.
So, truth be told, there are small differences between = and LIKE. But asking whether one operator or another is 'faster' in SQL is like asking 'What goes faster, a red car or a blue car?'. You should be asking questions about the engine size and vehicle weight, not about the color... To approach questions about optimizing relational tables, the place to look is your indexes and your expressions in the WHERE clause (and other clauses, but it usually starts with the WHERE).
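The point that the schema decides the outcome, not the operator, can be sketched with a small experiment. The following Python script again uses SQLite as a stand-in engine (an assumption; SQL Server plans will look different), with a hypothetical table t and index ix_value. It asks for the plan of the same two filters before and after an index exists.

```python
import sqlite3


def first_plan_step(conn, where_clause):
    """Return the first step of SQLite's plan for a filter on table t."""
    row = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM t WHERE " + where_clause
    ).fetchone()
    return row[-1]  # last column is the human-readable description


# Hypothetical schema; NOCASE lets SQLite's default LIKE use an index on value.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO t (value) VALUES (?)",
    [("abcdefghij",), ("foob",), ("zyx",)],
)

filters = ["value = 'abcdefghij'", "value LIKE 'abc%'"]

# Same query text, two different schemas.
before = [first_plan_step(conn, f) for f in filters]
conn.execute("CREATE INDEX ix_value ON t(value)")
after = [first_plan_step(conn, f) for f in filters]

for f, b, a in zip(filters, before, after):
    print(f"{f}\n  without index: {b}\n  with index:    {a}")
```

Neither query text changes between the two runs. Only the presence of the index moves both filters from a full scan to a search.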
Answer by Aris
A personal example using MySQL 5.5: I had an inner join between 2 tables, one of 3 million rows and one of 10 thousand rows.
When using a like on an index as below (no wildcards), it took about 30 seconds:
where login like '12345678'
Using 'explain' I get:
When using an '=' on the same query, it took about 0.1 seconds:
where login ='600009'
Using 'explain' I get:
As you can see, the like completely cancelled the index seek, so the query took 300 times longer.
Answer by Aris
Maybe you are looking for Full Text Search.
In contrast to full-text search, the LIKE Transact-SQL predicate works on character patterns only. Also, you cannot use the LIKE predicate to query formatted binary data. Furthermore, a LIKE query against a large amount of unstructured text data is much slower than an equivalent full-text query against the same data. A LIKE query against millions of rows of text data can take minutes to return; whereas a full-text query can take only seconds or less against the same data, depending on the number of rows that are returned.
Answer by user5190021
First things first: they are not always equal.
select 'Hello' from dual where 'Hello ' like 'Hello';
select 'Hello' from dual where 'Hello ' = 'Hello';
When things are not always equal, talking about their performance isn't that relevant.
If you are working only with strings and char variables, then you can talk about performance. But don't treat like and "=" as generally interchangeable.
As you will have seen in many posts (above and in other questions), in cases where they are equal, like is slower owing to pattern matching (collation).