MySQL SQL Performance: UNION vs. OR
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13750475/
SQL Performance UNION vs OR
Asked by Jason McCreary
I just read part of an optimization article and segfaulted on the following statement:
When using SQL replace statements using OR with a UNION:
select username from users where company = 'bbc' or company = 'itv';
to:
select username from users where company = 'bbc' union select username from users where company = 'itv';
From a quick EXPLAIN:
Using OR: [EXPLAIN output not preserved in this copy]
Using UNION: [EXPLAIN output not preserved in this copy]
Doesn't this mean UNION does double the work?
While I appreciate that UNION may be more performant for certain RDBMSes and certain table schemas, this is not categorically true as the author suggests.
Question
Am I wrong?
Answered by Bill Karwin
Either the article you read used a bad example, or you misinterpreted their point.
select username from users where company = 'bbc' or company = 'itv';
This is equivalent to:
select username from users where company IN ('bbc', 'itv');
MySQL can use an index on company for this query just fine. There's no need to do any UNION.
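One way to check this claim is to run EXPLAIN on both forms. The sketch below assumes a hypothetical users table (the column types and the index name idx_company are made up for illustration) and should be run in a scratch database. On an empty or very small table the optimizer may choose a plan that is not representative, so load realistic data first.

```sql
-- Hypothetical schema for illustration only; not taken from the original post.
CREATE TABLE users (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL,
  company  VARCHAR(64) NOT NULL,
  city     VARCHAR(64) NOT NULL,
  INDEX idx_company (company)
);

-- Both forms filter on the same indexed column, so compare their plans:
-- look at the "key" column to see whether idx_company is chosen.
EXPLAIN SELECT username FROM users WHERE company = 'bbc' OR company = 'itv';
EXPLAIN SELECT username FROM users WHERE company IN ('bbc', 'itv');
```

If both plans list the same key and access type, the rewrite to UNION buys nothing for this single-column case.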
The trickier case is where you have an OR condition that involves two different columns.
select username from users where company = 'bbc' or city = 'London';
Suppose there's an index on company and a separate index on city. Given that MySQL usually uses only one index per table in a given query, which index should it use? If it uses the index on company, it would still have to do a table-scan to find rows where city is London. If it uses the index on city, it would have to do a table-scan for rows where company is bbc.
The UNION solution is for this type of case.
select username from users where company = 'bbc'
union
select username from users where city = 'London';
Now each sub-query can use the index for its search, and the results of the sub-queries are combined by the UNION.
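The two-column case can be inspected the same way. This is a minimal sketch assuming a hypothetical users table with one single-column index per searched column (the index names are invented). The plan you actually get depends on the MySQL version and the table statistics; in some cases the optimizer can combine both indexes for the OR form on its own through its index merge optimization, so it is worth looking at the EXPLAIN output rather than assuming either result.

```sql
-- Hypothetical schema for illustration only; run in a scratch database.
CREATE TABLE users (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  username VARCHAR(64) NOT NULL,
  company  VARCHAR(64) NOT NULL,
  city     VARCHAR(64) NOT NULL,
  INDEX idx_company (company),
  INDEX idx_city (city)
);

-- OR across two different columns: check the "type", "key" and "Extra" columns.
EXPLAIN
SELECT username FROM users WHERE company = 'bbc' OR city = 'London';

-- UNION form: each branch gets its own row in the plan and can use its own index.
EXPLAIN
SELECT username FROM users WHERE company = 'bbc'
UNION
SELECT username FROM users WHERE city = 'London';
```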
An anonymous user proposed an edit to my answer above, but a moderator rejected the edit. It should have been a comment, not an edit. The claim of the proposed edit was that UNION has to sort the result set to eliminate duplicate rows. This makes the query run slower, and the index optimization is therefore a wash.
My response is that the indexes help to reduce the result set to a small number of rows before the UNION happens. UNION does in fact eliminate duplicates, but to do that it only has to sort the small result set. There might be cases where the WHERE clauses match a significant portion of the table, and sorting during UNION is as expensive as simply doing the table-scan. But it's more common for the result set to be reduced by the indexed searches, so the sorting is much less costly than the table-scan.
The difference depends on the data in the table, and the terms being searched. The only way to determine the best solution for a given query is to try both methods in the MySQL query profiler and compare their performance.
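As a rough sketch of such a comparison, the session-level profiler can time both forms back to back. This assumes a users table that already exists and holds representative data. Note that SHOW PROFILES is deprecated in recent MySQL versions, although it is still available; treat this as one possible way to measure, not the only one.

```sql
-- Assumes an existing, populated users table with indexes on company and city.
SET profiling = 1;

-- Candidate 1: single statement with OR.
SELECT username FROM users WHERE company = 'bbc' OR city = 'London';

-- Candidate 2: UNION of two indexed searches.
SELECT username FROM users WHERE company = 'bbc'
UNION
SELECT username FROM users WHERE city = 'London';

-- Lists the statements run in this session with their durations.
SHOW PROFILES;

SET profiling = 0;
```

Run each candidate several times so that caching effects do not decide the comparison.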
Answered by Darren Kopp
Those are not the same query.
I don't have much experience with MySQL, so I am not sure what the query optimizer does or does not do, but here are my thoughts from my general background (primarily MS SQL Server).
Typically, the query analyzer can take the above two queries and make the exact same plan out of them (if they were the same), so it wouldn't matter. I would suspect that there is no performance difference between these queries (which are equivalent):
select distinct username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union
select username from users where company = 'itv';
Now, the question is whether there would be a difference between the following queries. I actually don't know, but I would suspect that the optimizer would make it more like the first query:
select username from users where company = 'bbc' or company = 'itv';
and
select username from users where company = 'bbc'
union all
select username from users where company = 'itv';
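The difference between these forms shows up on a tiny data set. The sketch below uses a hypothetical scratch table, users_demo, with made-up rows in which one username appears at both companies. Row order is not guaranteed, so only the row counts are noted.

```sql
-- Hypothetical scratch table; names and rows are made up for illustration.
CREATE TABLE users_demo (
  username VARCHAR(64) NOT NULL,
  company  VARCHAR(64) NOT NULL
);

INSERT INTO users_demo (username, company) VALUES
  ('alice', 'bbc'),
  ('alice', 'itv'),   -- same username at both companies
  ('bob',   'bbc');

-- Plain OR keeps duplicates: 3 rows (alice twice, bob once).
SELECT username FROM users_demo WHERE company = 'bbc' OR company = 'itv';

-- UNION removes duplicates: 2 rows (alice, bob), like SELECT DISTINCT with OR.
SELECT username FROM users_demo WHERE company = 'bbc'
UNION
SELECT username FROM users_demo WHERE company = 'itv';

-- UNION ALL keeps duplicates: 3 rows, matching the plain OR query here
-- because no single row can satisfy both branches.
SELECT username FROM users_demo WHERE company = 'bbc'
UNION ALL
SELECT username FROM users_demo WHERE company = 'itv';
```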
Answered by David J
It depends on what the optimizer ends up doing based on the size of the data, indexes, software version, etc.
I would guess that using OR would give the optimizer a better chance at finding some efficiencies, since everything is in a single logical statement.
Also, UNION has some overhead, since it creates a result set with no duplicates. Each statement in the UNION should execute pretty quickly if company is indexed... not sure it'd really be doing double the work.
Bottom line
Unless you really have a burning need to squeeze every bit of speed out of your query, it's probably better to just go with the form that best communicates your intention... the OR.
Update
I also meant to mention IN. I believe the following query will give better performance than the OR (it's also the form I prefer):
select username from users where company in ('bbc', 'itv');
Answered by Gordon Linoff
In almost all cases, the union or union all version is going to do two full table scans of the users table.
The or version is much better in practice, since it will only scan the table once. It will also use an index only once, if available.
The original statement just seems wrong, for just about any database and any situation.
Answered by Çağatay Gürtürk
Bill Karwin's answer is pretty much right. When both parts of the OR condition have their own index, it's better to use a union, because once you have a small subset of results, it's easier to sort them and eliminate duplicates. The total cost is almost always less than using only one index (for one of the columns) plus a table scan for the other column (because MySQL typically uses only one index per table).
It generally depends on the table's structure and your needs, but on large tables union gave me better results.