SQL 等于 (=) 与 LIKE

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/543580/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 01:07:18  来源:igfitidea点击:

Equals(=) vs. LIKE

sqlperformanceequalssql-like

提问by Travis

When using SQL, are there any benefits of using =in a WHEREclause instead of LIKE?

使用 SQL 时,=WHERE子句中使用而不是使用有什么好处LIKE吗?

Without any special operators, LIKEand =are the same, right?

没有任何特殊的运营商,LIKE并且=是相同的,对不对?

采纳答案by Mark E. Haase

Different Operators

不同的运营商

LIKEand =are different operators. Most answers here focus on the wildcard support, which is not the only difference between these operators!

LIKE并且=是不同的运营商。这里的大多数答案都集中在通配符支持上,这并不是这些运算符之间的唯一区别!

=is a comparison operator that operates on numbers and strings. When comparing strings, the comparison operator compares whole strings.

=是对数字和字符串进行操作的比较运算符。比较字符串时,比较运算符比较整个字符串

LIKEis a string operator that compares character by character.

LIKE是一个字符串运算符,用于逐个字符进行比较。

To complicate matters, both operators use a collationwhich can have important effects on the result of the comparison.

更复杂的是,两个运算符都使用了一种排序规则,这会对比较结果产生重要影响。

Motivating Example

励志例子

Let's first identify an example where these operators produce obviously different results. Allow me to quote from the MySQL manual:

让我们首先确定一个示例,其中这些运算符产生明显不同的结果。请允许我引用 MySQL 手册中的内容:

Per the SQL standard, LIKE performs matching on a per-character basis, thus it can produce results different from the = comparison operator:

根据 SQL 标准,LIKE 在每个字符的基础上执行匹配,因此它可以产生与 = 比较运算符不同的结果:

mysql> SELECT '?' LIKE 'ae' COLLATE latin1_german2_ci;
+-----------------------------------------+
| '?' LIKE 'ae' COLLATE latin1_german2_ci |
+-----------------------------------------+
|                                       0 |
+-----------------------------------------+
mysql> SELECT '?' = 'ae' COLLATE latin1_german2_ci;
+--------------------------------------+
| '?' = 'ae' COLLATE latin1_german2_ci |
+--------------------------------------+
|                                    1 |
+--------------------------------------+

Please note that this page of the MySQL manual is called String Comparison Functions, and =is not discussed, which implies that =is not strictly a string comparison function.

请注意,MySQL 手册的这一页称为字符串比较函数=未进行讨论,这意味着它=不是严格意义上的字符串比较函数。

How Does =Work?

如何=工作?

The SQL Standard § 8.2describes how =compares strings:

SQL标准§8.2描述了如何=比较字符串:

The comparison of two character strings is determined as follows:

a) If the length in characters of X is not equal to the length in characters of Y, then the shorter string is effectively replaced, for the purposes of comparison, with a copy of itself that has been extended to the length of the longer string by concatenation on the right of one or more pad characters, where the pad character is chosen based on CS. If CS has the NO PAD attribute, then the pad character is an implementation-dependent character different from any character in the character set of X and Y that collates less than any string under CS. Otherwise, the pad character is a .

b) The result of the comparison of X and Y is given by the collating sequence CS.

c) Depending on the collating sequence, two strings may compare as equal even if they are of different lengths or contain different sequences of characters. When the operations MAX, MIN, DISTINCT, references to a grouping column, and the UNION, EXCEPT, and INTERSECT operators refer to character strings, the specific value selected by these operations from a set of such equal values is implementation-dependent.

两个字符串的比较确定如下:

a) 如果 X 的字符长度不等于 Y 的字符长度,那么为了比较的目的,较短的字符串被有效地替换为已扩展到较长字符串长度的自身副本通过在一个或多个填充字符的右侧串联,其中填充字符是根据 CS 选择的。如果 CS 具有 NO PAD 属性,则填充字符是一个与实现相关的字符,不同于 X 和 Y 字符集中的任何字符,这些字符的排序比 CS 下的任何字符串都少。否则,填充字符是 .

b) X 和 Y 的比较结果由整理序列 CS 给出。

c) 根据整理顺序,即使两个字符串的长度不同或包含不同的字符序列,它们也可能相等。当操作 MAX、MIN、DISTINCT、对分组列的引用以及 UNION、EXCEPT 和 INTERSECT 运算符引用字符串时,这些操作从一组此类等值中选择的特定值取决于实现。

(Emphasis added.)

(加了重点。)

What does this mean? It means that when comparing strings, the =operator is just a thin wrapper around the current collation. A collation is a library that has various rules for comparing strings. Here's an example of a binary collation from MySQL:

这是什么意思?这意味着在比较字符串时,=运算符只是当前排序规则的一个薄包装器。排序规则是一个库,它具有用于比较字符串的各种规则。这是来自 MySQL 的二进制排序规则的示例:

static int my_strnncoll_binary(const CHARSET_INFO *cs __attribute__((unused)),
                               const uchar *s, size_t slen,
                               const uchar *t, size_t tlen,
                               my_bool t_is_prefix)
{
  size_t len= MY_MIN(slen,tlen);
  int cmp= memcmp(s,t,len);
  return cmp ? cmp : (int)((t_is_prefix ? len : slen) - tlen);
}

This particular collation happens to compare byte-by-byte (which is why it's called "binary" — it doesn't give any special meaning to strings). Other collations may provide more advanced comparisons.

这个特定的排序规则恰好是逐字节比较的(这就是为什么它被称为“二进制”——它没有给字符串任何特殊的含义)。其他排序规则可能会提供更高级的比较。

For example, here is a UTF-8 collationthat supports case-insensitive comparisons. The code is too long to paste here, but go to that link and read the body of my_strnncollsp_utf8mb4(). This collation can process multiple bytes at a time and it can apply various transforms (such as case insensitive comparison). The =operator is completely abstracted from the vagaries of the collation.

例如,这里有一个支持不区分大小写比较的UTF-8 归类。代码太长,无法粘贴在这里,但请转到该链接并阅读my_strnncollsp_utf8mb4(). 此整理可以一次处理多个字节,并且可以应用各种转换(例如不区分大小写的比较)。该=运营商完全从整理变幻莫测抽象。

How Does LIKEWork?

如何LIKE工作?

The SQL Standard § 8.5describes how LIKEcompares strings:

SQL标准§8.5描述了如何LIKE比较字符串:

The <predicate>

M LIKE P

is true if there exists a partitioning of M into substrings such that:

i) A substring of M is a sequence of 0 or more contiguous <character representation>s of M and each <character representation> of M is part of exactly one substring.

ii) If the i-th substring specifier of P is an arbitrary character specifier, the i-th substring of M is any single <character representation>.

iii) If the i-th substring specifier of P is an arbitrary string specifier, then the i-th substring of M is any sequence of 0 or more <character representation>s.

iv) If the i-th substring specifier of P is neither an arbitrary character specifier nor an arbitrary string specifier, then the i-th substring of M is equal to that substring specifier according to the collating sequence of the <like predicate>, without the appending of <space> characters to M, and has the same length as that substring specifier.

v) The number of substrings of M is equal to the number of substring specifiers of P.

<谓词>

M LIKE P

如果存在将 M 划分为子串使得:

i) M 的子串是 M 的 0 个或多个连续 <字符表示> 的序列,并且 M 的每个 <字符表示> 恰好是一个子串的一部分。

ii) 如果 P 的第 i 个子串说明符是任意字符说明符,则 M 的第 i 个子串是任何单个<字符表示>。

iii) 如果 P 的第 i 个子串说明符是任意字符串说明符,则 M 的第 i 个子串是 0 个或多个 <字符表示> s 的任何序列。

iv) 如果 P 的第 i 个子串说明符既不是任意字符说明符也不是任意字符串说明符,则 M 的第 i 个子串根据 <like predicate> 的整理顺序等于该子串说明符,不将 <space> 字符附加到 M,并具有与该子字符串说明符相同的长度。

v) M 的子串数等于 P 的子串说明符数。

(Emphasis added.)

(加了重点。)

This is pretty wordy, so let's break it down. Items ii and iii refer to the wildcards _and %, respectively. If Pdoes not contain any wildcards, then only item iv applies. This is the case of interest posed by the OP.

这很罗嗦,让我们分解一下。项 ii 和 iii 分别指通配符_%。如果P不包含任何通配符,则仅适用第 iv 项。这是 OP 引起的兴趣的情况。

In this case, it compares each "substring" (individual characters) in Magainst each substring in Pusing the current collation.

在这种情况下,它使用当前排序规则将每个“子字符串”(单个字符)M与每个子字符串P进行比较。

Conclusions

结论

The bottom line is that when comparing strings, =compares the entire string while LIKEcompares one character at a time. Both comparisons use the current collation. This difference leads to different results in some cases, as evidenced in the first example in this post.

底线是比较字符串时,=比较整个字符串,而一次LIKE比较一个字符。两种比较都使用当前的排序规则。这种差异在某些情况下会导致不同的结果,如本文第一个示例所示。

Which one should you use? Nobody can tell you that — you need to use the one that's correct for your use case. Don't prematurely optimize by switching comparison operators.

你应该使用哪一个?没有人可以告诉你——你需要使用适合你用例的那个。不要通过切换比较运算符来过早地优化。

回答by achinda99

The equals (=) operator is a "comparison operator compares two values for equality." In other words, in an SQL statement, it won't return true unless both sides of the equation are equal. For example:

等于 (=) 运算符是“比较运算符比较两个值是否相等”。换句话说,在 SQL 语句中,除非等式两边相等,否则它不会返回 true。例如:

SELECT * FROM Store WHERE Quantity = 200;

The LIKE operator "implements a pattern match comparison" that attempts to match "a string value against a pattern string containing wild-card characters." For example:

LIKE 运算符“实现模式匹配比较”,尝试将“字符串值与包含通配符的模式字符串”进行匹配。例如:

SELECT * FROM Employees WHERE Name LIKE 'Chris%';

LIKE is generally used only with strings and equals (I believe) is faster. The equals operator treats wild-card characters as literal characters. The difference in results returned are as follows:

LIKE 通常只与字符串一起使用,并且等于(我相信)更快。等于运算符将通配符视为文字字符。返回结果的差异如下:

SELECT * FROM Employees WHERE Name = 'Chris';

And

SELECT * FROM Employees WHERE Name LIKE 'Chris';

Would return the same result, though using LIKE would generally take longer as its a pattern match. However,

将返回相同的结果,但使用 LIKE 通常需要更长的时间作为模式匹配。然而,

SELECT * FROM Employees WHERE Name = 'Chris%';

And

SELECT * FROM Employees WHERE Name LIKE 'Chris%';

Would return different results, where using "=" results in only results with "Chris%" being returned and the LIKE operator will return anything starting with "Chris".

将返回不同的结果,其中使用“=”只会返回带有“Chris%”的结果,而 LIKE 运算符将返回以“Chris”开头的任何内容。

Hope that helps. Some good info can be found here.

希望有帮助。一些好的信息可以在这里找到。

回答by Aris

This is a copy/paste of another answer of mine for question SQL 'like' vs '=' performance:

这是我对SQL 'like' vs '=' performance问题的另一个答案的复制/粘贴:

A personal example using mysql 5.5: I had an inner join between 2 tables, one of 3 million rows and one of 10 thousand rows.

使用 mysql 5.5 的个人示例:我在 2 个表之间进行了内部联接,其中一个是 300 万行,另一个是 1 万行。

When using a like on an index as below(no wildcards), it took about 30 seconds:

在索引上使用类似如下(无通配符)时,大约需要 30 秒:

where login like '12345678'

using 'explain' I get:

使用“解释”我得到:

enter image description here

在此处输入图片说明

When using an '=' on the same query, it took about 0.1 seconds:

在同一个查询上使用 '=' 时,大约需要 0.1 秒:

where login ='12345678'

Using 'explain' I get:

使用“解释”我得到:

enter image description here

在此处输入图片说明

As you can see, the likecompletely cancelled the index seek, so query took 300 times more time.

如您所见,like完全取消了索引查找,因此查询花费了 300 倍以上的时间。

回答by WalterJ89

LIKEand =are different. LIKEis what you would use in a search query. It also allows wildcards like _(simple character wildcard) and %(multi-character wildcard).

LIKE并且=是不同的。LIKE是您将在搜索查询中使用的内容。它还允许使用通配符,如_(简单字符通配符)和%(多字符通配符)。

=should be used if you want exact matches and it will be faster.

=如果您想要精确匹配,应该使用它,它会更快。

This site explains LIKE

这个网站解释 LIKE

回答by ISW

One difference - apart from the possibility to use wildcards with LIKE - is in trailing spaces: The = operator ignores trailing space, but LIKE does not.

一个区别 - 除了可以在 LIKE 中使用通配符之外 - 是在尾随空格:= 运算符忽略尾随空格,但 LIKE 不会。

回答by ???u

Depends on the database system.

取决于数据库系统。

Generally with no special characters, yes, = and LIKE are the same.

一般没有特殊字符,是的,= 和 LIKE 是一样的。

Some database systems, however, may treat collation settings differently with the different operators.

但是,某些数据库系统可能会根据不同的运算符对整理设置进行不同的处理。

For instance, in MySQL comparisons with = on strings is always case-insensitive by default, so LIKE without special characters is the same. On some other RDBMS's LIKE is case-insensitive while = is not.

例如,在 MySQL 中,在字符串上与 = 的比较在默认情况下总是不区分大小写的,因此没有特殊字符的 LIKE 是相同的。在其他一些 RDBMS 上,LIKE 不区分大小写,而 = 不区分大小写。

回答by Arnab

For this example we take it for granted that varcharcol doesn't contain ''and have no empty cell against this column

对于这个例子,我们理所当然地认为 varcharcol 不包含''并且没有针对此列的空单元格

select * from some_table where varcharCol = ''
select * from some_table where varcharCol like ''

The first one results in 0 row output while the second one shows the whole list. = is strictly-match case while like acts like a filter. if filter has no criteria, every data is valid.

第一个导致 0 行输出,而第二个显示整个列表。= 是严格匹配大小写,而 like 就像过滤器一样。如果过滤器没有条件,则每个数据都是有效的。

like - by the virtue of its purpose works a little slower and is intended for use with varchar and similar data.

like - 由于其目的,工作速度稍慢,旨在用于 varchar 和类似数据。

回答by Laramie

To address the original question regarding performance, it comes down to index utilization. When a simple table scan occurs, "LIKE" and "=" are identical. When indexes are involved, it dependson how the LIKE clause is formed. More specifically, what is the location of the wildcard(s)?

为了解决有关性能的原始问题,归结为索引利用率。当发生简单的表扫描时,“LIKE”和“=”是相同的。当涉及索引时,这取决于LIKE 子句的形成方式。更具体地说,通配符的位置是什么?



Consider the following:

考虑以下:

CREATE TABLE test(
    txt_col  varchar(10) NOT NULL
)
go

insert test (txt_col)
select CONVERT(varchar(10), row_number() over (order by (select 1))) r
  from master..spt_values a, master..spt_values b
go

CREATE INDEX IX_test_data 
    ON test (txt_col);
go 

--Turn on Show Execution Plan
set statistics io on

--A LIKE Clause with a wildcard at the beginning
DBCC DROPCLEANBUFFERS
SELECT txt_Col from test where txt_col like '%10000'
--Results in
--Table 'test'. Scan count 3, logical reads 15404, physical reads 2, read-ahead reads 15416, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Index SCAN is 85% of Query Cost

--A LIKE Clause with a wildcard in the middle
DBCC DROPCLEANBUFFERS
SELECT txt_Col from test where txt_col like '1%99'
--Results in
--Table 'test'. Scan count 1, logical reads 3023, physical reads 3, read-ahead reads 3018, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Index Seek is 100% of Query Cost for test data, but it may result in a Table Scan depending on table size/structure

--A LIKE Clause with no wildcards
DBCC DROPCLEANBUFFERS
SELECT txt_Col from test where txt_col like '10000'
--Results in
--Table 'test'. Scan count 1, logical reads 3, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Index Seek is 100% of Query Cost
GO

--an "=" clause = does Index Seek same as above
DBCC DROPCLEANBUFFERS
SELECT txt_Col from test where txt_col = '10000'
--Results in
--Table 'test'. Scan count 1, logical reads 3, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
--Index Seek is 100% of Query Cost
GO


DROP TABLE test

There may be also negligible difference in the creation of the query plan when using "=" vs "LIKE".

使用“=”与“LIKE”时,查询计划的创建也可能存在微不足道的差异。

回答by marc_s

If you search for an exact match, you can use both, = and LIKE.

如果搜索完全匹配,则可以同时使用 = 和 LIKE。

Using "=" is a tiny bit faster in this case (searching for an exact match) - you can check this yourself by having the same query twice in SQL Server Management Studio, once using "=", once using "LIKE", and then using the "Query" / "Include actual execution plan".

在这种情况下,使用“=”会稍微快一点(搜索完全匹配) - 您可以通过在 SQL Server Management Studio 中两次使用相同的查询来自己检查这一点,一次使用“=”,一次使用“LIKE”,然后然后使用“查询”/“包括实际执行计划”。

Execute the two queries and you should see your results twice, plus the two actual execution plans. In my case, they were split 50% vs. 50%, but the "=" execution plan has a smaller "estimated subtree cost" (displayed when you hover over the left-most "SELECT" box) - but again, it's really not a huge difference.

执行这两个查询,您应该会看到两次结果,以及两个实际的执行计划。在我的例子中,它们被分割为 50% 和 50%,但是“=”执行计划的“估计子树成本”较小(当您将鼠标悬停在最左侧的“选择”框上时显示) - 但同样,它确实是差别不大。

But when you start searching with wildcards in your LIKE expression, search performance will dimish. Search "LIKE Mill%" can still be quite fast - SQL Server can use an index on that column, if there is one. Searching "LIKE %expression%" is horribly slow, since the only way SQL Server can satisfy this search is by doing a full table scan. So be careful with your LIKE's !

但是,当您开始在 LIKE 表达式中使用通配符进行搜索时,搜索性能会下降。搜索“LIKE Mill%”仍然可以很快——SQL Server 可以在该列上使用索引(如果有)。搜索“LIKE %expression%”非常慢,因为 SQL Server 可以满足此搜索的唯一方法是进行全表扫描。所以要小心你的 LIKE !

Marc

马克

回答by Coincoin

Using = avoids wildcards and special characters conflicts in the string when you build the query at run time.

在运行时构建查询时,使用 = 可避免字符串中的通配符和特殊字符冲突。

This makes the programmer's life easier by not having to escape all special wildcard characters that might slip in the LIKE clause and not producing the intended result. After all, = is the 99% use case scenario, it would be a pain to have to escape them every time.

这使程序员的生活变得更轻松,因为不必转义所有可能滑入 LIKE 子句的特殊通配符,也不会产生预期的结果。毕竟,= 是 99% 的用例场景,每次都必须逃避它们会很痛苦。

rolls eyes at '90s

90年代翻白眼

I also suspect it's a little bit slower, but I doubt it's significant if there are no wildcards in the pattern.

我也怀疑它有点慢,但我怀疑如果模式中没有通配符,它​​是否重要。