Oracle: select distinct ... inner join vs. select ... where id in (...)
Note: this page is a reproduction of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/2638989/
Select distinct ... inner join vs. select ... where id in (...)
Asked by Tonio
I'm trying to create a subset of a table (as a materialized view), defined as those records which have a matching record in another materialized view.
For example, let's say I have a Users table with user_id and name columns, and a Log table, with entry_id, user_id, activity, and timestamp columns.
First I create a materialized view of the Log table, selecting only those rows with timestamp > some_date. Now I want a materialized view of the Users referenced in my snapshot of the Log table. I can either create it as
select * from Users where user_id in (select user_id from Log_mview)
or I can do
select distinct u.* from Users u inner join Log_mview l on u.user_id = l.user_id
(The distinct is needed to avoid multiple hits from users with multiple log entries.)
The former seems cleaner and more elegant, but takes much longer. Am I missing something? Is there a better way to do this?
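To make the two candidate definitions concrete, here is a minimal sketch that runs both against a tiny in-memory SQLite database. SQLite only stands in for Oracle, the sample rows are invented, and nothing here says anything about which form is faster on real data; it only shows that the two forms return the same users and why the join needs the distinct.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for Oracle; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (entry_id INTEGER PRIMARY KEY, user_id INTEGER, activity TEXT);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO Log_mview VALUES (10, 1, 'login'), (11, 1, 'logout'), (12, 3, 'login');
""")

# Form 1: semi-join written with IN.
in_rows = conn.execute(
    "SELECT * FROM Users WHERE user_id IN (SELECT user_id FROM Log_mview) "
    "ORDER BY user_id"
).fetchall()

# Form 2: inner join, de-duplicated with DISTINCT.
join_distinct_rows = conn.execute(
    "SELECT DISTINCT u.* FROM Users u "
    "INNER JOIN Log_mview l ON u.user_id = l.user_id ORDER BY u.user_id"
).fetchall()

# Same join without DISTINCT: user 1 appears once per log entry.
join_plain_rows = conn.execute(
    "SELECT u.* FROM Users u "
    "INNER JOIN Log_mview l ON u.user_id = l.user_id ORDER BY u.user_id"
).fetchall()

print(in_rows)
print(join_distinct_rows)
print(join_plain_rows)
```

With these sample rows the first two queries return users 1 and 3 once each, while the join without distinct returns user 1 twice.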
Edit: The where exists clause helped a lot, except in the case where the condition uses an OR. For example, let's say the Log table above also had a user_name column, and the correct way to match a Log entry to a Users record is when either of the columns (user id or user name) matches. I'm finding that
select distinct u.* from Users u
inner join Log_mview l
on u.user_id = l.user_id or u.name = l.user_name
is much faster than
select * from Users u where exists
(select id from Log_mview l
where l.user_id = u.user_id or l.user_name = u.name)
Any help?
(Regarding the explain plan... Lemme work on sanitizing it, or them, rather... I'll post them in a while.)
Edit: explain plans: For the query with inner join:
Plan hash value: 436698422

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name            | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                 |  4539K|   606M|       |   637K  (3)| 02:07:25 |
|   1 |  HASH UNIQUE                    |                 |  4539K|   606M|  3201M|   637K  (3)| 02:07:25 |
|   2 |   CONCATENATION                 |                 |       |       |       |            |          |
|*  3 |    HASH JOIN                    |                 |  4206K|   561M|    33M|   181K  (4)| 00:36:14 |
|   4 |     BITMAP CONVERSION TO ROWIDS |                 |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   5 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 |       |       |       |            |          |
|*  6 |     TABLE ACCESS FULL           | USERS           |    15M|  1630M|       | 86638   (6)| 00:17:20 |
|*  7 |    HASH JOIN                    |                 |  7646K|  1020M|    33M|   231K  (4)| 00:46:13 |
|   8 |     BITMAP CONVERSION TO ROWIDS |                 |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   9 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 |       |       |       |            |          |
|  10 |     TABLE ACCESS FULL           | USERS           |    23M|  2515M|       | 87546   (7)| 00:17:31 |
-----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("U"."NAME"="L"."USER_NAME")
   6 - filter("U"."NAME" IS NOT NULL)
   7 - access("U"."USER_ID"=TO_NUMBER("L"."USER_ID"))
       filter(LNNVL("U"."NAME"="L"."USER_NAME") OR LNNVL("U"."NAME" IS NOT NULL))

Note
-----
   - dynamic sampling used for this statement
For the one using where exists:
Plan hash value: 2786958565

-------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |   114 |    21M  (1)| 70:12:13 |
|*  1 |  FILTER                       |                 |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | USERS           |    23M|  2515M| 87681   (7)| 00:17:33 |
|   3 |   BITMAP CONVERSION TO ROWIDS |                 |  7062 |   179K|     1   (0)| 00:00:01 |
|*  4 |    BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4 |       |       |            |          |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "MYSCHEMA"."LOG_MVIEW" "LOG_MVIEW" WHERE
              ("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2) AND ("USER_NAME"=:B3 OR
              TO_NUMBER("USER_ID")=:B4) AND ("USER_NAME"=:B5 OR TO_NUMBER("USER_ID")=:B6)))
   4 - filter("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2)

Note
-----
   - dynamic sampling used for this statement
DB object names changed to protect the innocent. :p
Accepted answer by APC
Try this
select * from Users u
where exists
( select user_id
from Log_mview l
where l.user_id = u.user_id )
/
If the sub-query returns a large number of rows, WHERE EXISTS can be substantially faster than WHERE ... IN.
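As a quick sanity check that the WHERE EXISTS form selects the same users as the WHERE ... IN form, the sketch below runs both on a small in-memory SQLite database. SQLite is an assumption standing in for Oracle and the rows are invented, so this demonstrates only equivalence of results, not the performance difference described above.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for Oracle; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (entry_id INTEGER PRIMARY KEY, user_id INTEGER, activity TEXT);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO Log_mview VALUES (10, 1, 'login'), (11, 1, 'logout'), (12, 3, 'login');
""")

# Correlated EXISTS: a user qualifies as soon as one matching log row is found.
exists_rows = conn.execute(
    "SELECT * FROM Users u WHERE EXISTS "
    "(SELECT l.user_id FROM Log_mview l WHERE l.user_id = u.user_id) "
    "ORDER BY u.user_id"
).fetchall()

# Uncorrelated IN over the same sub-query.
in_rows = conn.execute(
    "SELECT * FROM Users WHERE user_id IN (SELECT user_id FROM Log_mview) "
    "ORDER BY user_id"
).fetchall()

print(exists_rows)
print(in_rows)
```

Neither form needs a distinct, because each user row is tested once and either kept or dropped.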
Answer by Peter Lang
This will depend on the data you have, but using Distinct within the join could improve your performance:
Select u.*
From Users u
Join ( Select Distinct user_id
From log_mview ) l On u.user_id = l.user_id
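A small sketch of why this shape needs no outer distinct: the derived table already holds one row per user_id, so the join cannot multiply user rows. It uses in-memory SQLite as a stand-in for Oracle with invented rows, and it does not measure performance.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for Oracle; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (entry_id INTEGER PRIMARY KEY, user_id INTEGER, activity TEXT);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO Log_mview VALUES (10, 1, 'login'), (11, 1, 'logout'), (12, 3, 'login');
""")

# The inner query collapses the log to its distinct user ids first.
distinct_ids = conn.execute(
    "SELECT DISTINCT user_id FROM Log_mview ORDER BY user_id"
).fetchall()

# Joining against that de-duplicated set yields each matching user exactly once.
rows = conn.execute(
    "SELECT u.* FROM Users u "
    "JOIN (SELECT DISTINCT user_id FROM Log_mview) l ON u.user_id = l.user_id "
    "ORDER BY u.user_id"
).fetchall()

print(distinct_ids)
print(rows)
```

The de-duplication here runs over the narrow user_id column only, rather than over the full u.* rows as in the original join.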
Answer by Klinger
The second query is probably working the hard drive more than the first query (join+distinct).
The first query will probably translate to something like:
For each row in table Log, find the corresponding row in table User (in memory).
The database is probably smart enough to create an in-memory structure for table User, which is probably much smaller than table Log. I believe query one (join+distinct) will require only one pass over table Log. The distinct is probably executed in memory.
The second query probably forces the database to do multiple full reads of table Log.
So in the second query you probably get:
For each row in table User, read all the rows in table Log (from disk) in order to match the condition.
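The two access patterns described in this answer can be sketched as a toy model in plain Python. This is not what Oracle actually executes; the function names, the sample rows, and the read counter are all invented for illustration, and the second function stops at the first match the way an existence test would.

```python
def join_then_distinct(users, log):
    """One pass over log, probing an in-memory lookup built from users."""
    users_by_id = {u["user_id"]: u for u in users}  # built once, held in memory
    log_rows_read = 0
    matched = {}  # keyed by user_id, so repeated matches collapse (the "distinct")
    for entry in log:
        log_rows_read += 1
        user = users_by_id.get(entry["user_id"])
        if user is not None:
            matched[user["user_id"]] = user
    return list(matched.values()), log_rows_read


def filter_per_user(users, log):
    """For every user, scan log until a match is found (a full scan if none)."""
    log_rows_read = 0
    matched = []
    for user in users:
        for entry in log:
            log_rows_read += 1
            if entry["user_id"] == user["user_id"]:
                matched.append(user)
                break
    return matched, log_rows_read


# Hypothetical sample data: five users, six log entries touching users 1 and 3.
users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 6)]
log = [{"entry_id": 100 + i, "user_id": uid} for i, uid in enumerate([1, 1, 3, 3, 3, 1])]

joined, join_reads = join_then_distinct(users, log)
filtered, filter_reads = filter_per_user(users, log)
print(sorted(u["user_id"] for u in joined), join_reads)
print(sorted(u["user_id"] for u in filtered), filter_reads)
```

On this sample both functions return users 1 and 3, but the first reads 6 log rows and the second reads 22, since every user without a log entry costs a full scan of the log.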
You also have to consider that some queries may experience a dramatic difference in speed due to changes in memory availability, load and table growth.