SQL: UNION with a WHERE clause

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/5437507/

Time: 2020-09-01 09:52:29  Source: igfitidea

UNION with WHERE clause

Tags: sql, oracle, union

Asked by MNIK

I'm doing a UNION of two queries on an Oracle database. Both of them have a WHERE clause. Is there a difference in performance if I apply the WHERE after UNIONing the queries, compared to performing the UNION after the WHERE clauses?

For example:

SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colA, colB FROM tableB WHERE colA > 1

compared to:

SELECT * 
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colA, colB FROM tableB) 
 WHERE colA > 1

I believe that in the second case it performs a full table scan on both tables, which hurts performance. Is that correct?

Answered by Ronnis

In my experience, Oracle is very good at pushing simple predicates around. The following test was made on Oracle 11.2. I'm fairly certain it produces the same execution plan on all releases of 10g as well.

(Please, people, feel free to leave a comment if you run an earlier version and have tried the following.)

create table table1(a number, b number);
create table table2(a number, b number);

explain plan for
select *
  from (select a,b from table1
        union 
        select a,b from table2
       )
 where a > 1;

select * 
  from table(dbms_xplan.display(format=>'basic +predicate'));

PLAN_TABLE_OUTPUT
---------------------------------------
| Id  | Operation            | Name   |
---------------------------------------
|   0 | SELECT STATEMENT     |        |
|   1 |  VIEW                |        |
|   2 |   SORT UNIQUE        |        |
|   3 |    UNION-ALL         |        |
|*  4 |     TABLE ACCESS FULL| TABLE1 |
|*  5 |     TABLE ACCESS FULL| TABLE2 |
---------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------    
   4 - filter("A">1)
   5 - filter("A">1)

As you can see at steps (4,5), the predicate is pushed down and applied before the sort (union).
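For comparison, the hand-filtered form of the query can be explained in the same way. This is only a sketch that reuses table1 and table2 from the test above. It was not run here, so no plan output is shown, and the plan you get may depend on your release and statistics.

```sql
-- WHERE written inside each branch of the UNION (same tables as the test above)
explain plan for
select a, b from table1 where a > 1
union
select a, b from table2 where a > 1;

-- Show the plan together with its predicate section, as in the test above
select *
  from table(dbms_xplan.display(format=>'basic +predicate'));
```

If the two forms are treated alike, the filter("A">1) lines should appear on the table access steps in both plans.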

I couldn't get the optimizer to push down an entire subquery such as

 where a = (select max(a) from empty_table)

or a join. With proper PK/FK constraints in place it might be possible, but clearly there are limitations :)
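When the optimizer will not push such a predicate down, one option is to move it into each branch by hand. A minimal sketch, assuming table1 and table2 from the test above. The definition of empty_table is an assumption, since the answer does not show it.

```sql
-- Assumed definition: the original answer does not show empty_table
create table empty_table(a number);

-- The subquery predicate is repeated in each branch, so each table
-- is filtered before the union sorts and deduplicates the rows
select a, b from table1 where a = (select max(a) from empty_table)
union
select a, b from table2 where a = (select max(a) from empty_table);
```

Running explain plan on this form, as in the test above, would show whether the filter now sits on the table access steps.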

Answered by Gary Myers

Just a caution.

If you tried

SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB WHERE colA > 1

compared to:

SELECT * 
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colX, colA FROM tableB) 
 WHERE colA > 1

Then in the second query, the colA in the WHERE clause will, for the rows coming from tableB, actually hold colX, making it a very different query. If columns are being aliased in this way, it can get confusing.
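To make the difference concrete: the result columns of a UNION take their names from the first branch, so the outer WHERE tests the first column of every row. Under that reading, the nested query above should behave like the following sketch, which uses the same hypothetical tables and columns as the example.

```sql
-- What the nested form effectively filters on:
-- colA for rows from tableA, but colX for rows from tableB
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colX, colA FROM tableB WHERE colX > 1
```

Compare this with the first query in this answer, where the tableB branch tests colA instead.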

Answered by btilly

NOTE: While my advice was true many years ago, Oracle's optimizer has improved so that the location of the WHERE definitely no longer matters here. However, the advice to prefer UNION ALL over UNION will always hold, and portable SQL should avoid depending on optimizations that may not be in all databases.

Short answer: you want the WHERE before the UNION, and you want to use UNION ALL if at all possible. If you are using UNION ALL, then check the EXPLAIN output; Oracle might be smart enough to optimize the WHERE condition if it is left after.

The reason is the following. The definition of a UNION says that if there are duplicates in the two data sets, they have to be removed. Therefore there is an implicit GROUP BY in that operation, which tends to be slow. Worse yet, Oracle's optimizer (at least as of 3 years ago, and I don't think it has changed) doesn't try to push conditions through a GROUP BY (implicit or explicit). Therefore Oracle has to construct larger data sets than necessary, group them, and only then gets to filter. Thus prefiltering wherever possible is officially a Good Idea. (This is, incidentally, why it is important to put conditions in the WHERE whenever possible instead of leaving them in a HAVING clause.)
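The WHERE versus HAVING point can be illustrated with a small sketch on the question's tableA. The COUNT(*) is an assumption added only so that the GROUP BY has something to compute.

```sql
-- Condition left in HAVING: written as a filter on the finished groups
SELECT colA, COUNT(*)
  FROM tableA
 GROUP BY colA
HAVING colA > 1;

-- Same result with the condition in WHERE:
-- rows are discarded before they are grouped
SELECT colA, COUNT(*)
  FROM tableA
 WHERE colA > 1
 GROUP BY colA;
```

This rewrite only applies to conditions on the grouping columns. A condition on an aggregate, such as COUNT(*) > 1, has to stay in HAVING.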

Furthermore, if you happen to know that there won't be duplicates between the two data sets, then use UNION ALL. That is like UNION in that it concatenates datasets, but it doesn't try to deduplicate data. This saves an expensive grouping operation. In my experience it is quite common to be able to take advantage of this operation.
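One way to picture the relationship is that UNION returns what a DISTINCT over UNION ALL would return. A sketch using the question's hypothetical tableA and tableB:

```sql
-- UNION: concatenates both sets and removes duplicate rows
SELECT colA, colB FROM tableA
UNION
SELECT colA, colB FROM tableB;

-- Same rows, with the deduplication step spelled out
SELECT DISTINCT colA, colB
  FROM (SELECT colA, colB FROM tableA
        UNION ALL
        SELECT colA, colB FROM tableB);

-- UNION ALL alone: concatenation only, no deduplication
SELECT colA, colB FROM tableA
UNION ALL
SELECT colA, colB FROM tableB;
```

Note that UNION also removes duplicates that occur within a single table, so the plain UNION ALL form returns the same rows only when each set is itself free of duplicates.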

Since UNION ALL does not have an implicit GROUP BY in it, it is possible that Oracle's optimizer knows how to push conditions through it. I don't have Oracle sitting around to test, so you will need to test that yourself.
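A sketch of how that test could be run, assuming the question's tableA and tableB exist. It was not run here, so no output is shown.

```sql
-- Outer WHERE left after a UNION ALL
explain plan for
select *
  from (select colA, colB from tableA
        union all
        select colA, colB from tableB)
 where colA > 1;

-- If the condition was pushed through, the Predicate Information section
-- should list the filter against the table access steps
select *
  from table(dbms_xplan.display(format=>'basic +predicate'));
```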

Answered by EvilTeach

You need to look at the explain plans, but unless there is an INDEX or PARTITION on COL_A, you are looking at a FULL TABLE SCAN on both tables.

With that in mind, your first example is throwing out some of the data as it does the FULL TABLE SCAN. That result is being sorted by the UNION, then duplicate data is dropped. This gives you your result set.

In the second example, you are pulling the full contents of both tables. That result is likely to be larger. So the UNION is sorting more data, then dropping the duplicate stuff. Then the filter is being applied to give you the result set you are after.

As a general rule, the earlier you filter away data, the smaller the data set, and the faster you will get your results. As always, your mileage may vary.

Answered by rayman86

I would make sure you have an index on ColA, and then run both of them and time them. That would give you the best answer.
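A rough sketch of that approach in SQL*Plus, assuming the question's tableA and tableB. The index names are hypothetical.

```sql
-- Hypothetical index names on the filtered column
CREATE INDEX tablea_cola_ix ON tableA (colA);
CREATE INDEX tableb_cola_ix ON tableB (colA);

-- SQL*Plus command: report elapsed time after each statement
SET TIMING ON

-- Variant 1: WHERE inside each branch
SELECT colA, colB FROM tableA WHERE colA > 1
UNION
SELECT colA, colB FROM tableB WHERE colA > 1;

-- Variant 2: WHERE after the UNION
SELECT *
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colA, colB FROM tableB)
 WHERE colA > 1;
```

Timings from a single run can be skewed by caching, so it is probably worth running each variant several times on realistic data volumes.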

Answered by Randy

I think it will depend on many things. Run EXPLAIN PLAN on each one to see what your optimizer selects. Otherwise, as @rayman suggests, run them both and time them.
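One way to keep the two plans apart is to label each with a statement id. A sketch, assuming the question's tableA and tableB. The ids 'where_inside' and 'where_outside' are arbitrary labels chosen for illustration.

```sql
-- Plan for the form with WHERE inside each branch
explain plan set statement_id = 'where_inside' for
select colA, colB from tableA where colA > 1
union
select colA, colB from tableB where colA > 1;

-- Plan for the form with WHERE after the UNION
explain plan set statement_id = 'where_outside' for
select *
  from (select colA, colB from tableA
        union
        select colA, colB from tableB)
 where colA > 1;

-- Display each plan by its label
select *
  from table(dbms_xplan.display(statement_id => 'where_inside',
                                format       => 'basic +predicate'));

select *
  from table(dbms_xplan.display(statement_id => 'where_outside',
                                format       => 'basic +predicate'));
```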

Answered by Anbarasi Selvaraj

SELECT *
  FROM (SELECT colA, colB FROM tableA
        UNION
        SELECT colA, colB FROM tableB) tableC
 WHERE tableC.colA > 1

If we're using a union that contains the same field name in two tables, we should give the subquery a name, such as tableC in the query above. Some databases require this; Oracle does not, and Oracle rejects the AS keyword before a table alias, so it is omitted here. The WHERE condition then becomes WHERE tableC.colA > 1.

Answered by Michael Andrews

SELECT colA, colB FROM tableA  WHERE colA > 1
UNION
SELECT colX, colA FROM tableB

Answered by nandhini

SELECT *
  FROM (SELECT * FROM can
        UNION
        SELECT * FROM employee) e
 WHERE e.id = 1;