Oracle: select distinct ... inner join vs. select ... where id in (...)

Question

Asked by Tonio

I'm trying to create a subset of a table (as a materialized view), defined as those records which have a matching record in another materialized view.

For example, let's say I have a Users table with user_id and name columns, and a Log table, with entry_id, user_id, activity, and timestamp columns.

First I create a materialized view of the Log table, selecting only those rows with timestamp > some_date. Now I want a materialized view of the Users referenced in my snapshot of the Log table. I can either create it as

select * from Users where user_id in (select user_id from Log_mview)

or I can do

select distinct u.* from Users u inner join Log_mview l on u.user_id = l.user_id

(need the distinct to avoid multiple hits from users with multiple log entries).
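
As a rough illustration of the point about distinct, the sketch below runs both forms against a tiny in-memory database. It assumes SQLite (via Python's sqlite3 module) as a stand-in for Oracle, a plain table in place of the materialized view, and made-up sample rows. It only shows that the two queries return the same users; it says nothing about how Oracle would execute either one.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle, and Log_mview is an ordinary table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (entry_id INTEGER PRIMARY KEY, user_id INTEGER, activity TEXT);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO Log_mview VALUES (10, 1, 'login'), (11, 1, 'logout'), (12, 3, 'login');
""")

# Form 1: IN with a subquery.
in_rows = conn.execute(
    "SELECT * FROM Users WHERE user_id IN (SELECT user_id FROM Log_mview) "
    "ORDER BY user_id"
).fetchall()

# Form 2: inner join plus DISTINCT.
join_rows = conn.execute(
    "SELECT DISTINCT u.* FROM Users u "
    "INNER JOIN Log_mview l ON u.user_id = l.user_id ORDER BY u.user_id"
).fetchall()

# Same join without DISTINCT: user 1 has two log entries, so it appears twice.
plain_join_rows = conn.execute(
    "SELECT u.* FROM Users u "
    "INNER JOIN Log_mview l ON u.user_id = l.user_id ORDER BY u.user_id"
).fetchall()

print(in_rows)
print(join_rows)
print(plain_join_rows)
```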

The former seems cleaner and more elegant, but takes much longer. Am I missing something? Is there a better way to do this?

Edit: The where exists clause helped a lot, except in the case where the condition uses an OR. For example, let's say the Log table above also had a user_name column, and the correct way to match a Log entry to a Users record is when either of the columns (user id or user name) match. I'm finding that

select distinct u.* from Users u
    inner join Log_mview l
        on u.user_id = l.user_id or u.name = l.user_name

is much faster than

select * from Users u where exists
    (select id from Log_mview l 
        where l.user_id = u.user_id or l.user_name = u.name)
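
For reference, here is a small sketch of the OR case, again assuming SQLite as a stand-in for Oracle and invented sample rows (the NULLs and the duplicate entry for user 1 are there only to exercise both branches of the OR). It checks that the two forms select the same users; the speed difference described here is specific to the Oracle plans and is not reproduced.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid'), (4, 'dee');
    -- user 1 matches by id and by name, user 2 by name only, user 3 by id only
    INSERT INTO Log_mview VALUES
        (10, 1, 'ann'), (11, NULL, 'bob'), (12, 3, NULL), (13, 1, NULL);
""")

# Join form: DISTINCT collapses users that match more than one log row.
join_or = conn.execute("""
    SELECT DISTINCT u.* FROM Users u
    INNER JOIN Log_mview l
        ON u.user_id = l.user_id OR u.name = l.user_name
    ORDER BY u.user_id
""").fetchall()

# EXISTS form: each user is tested once, so no DISTINCT is needed.
exists_or = conn.execute("""
    SELECT * FROM Users u WHERE EXISTS
        (SELECT id FROM Log_mview l
         WHERE l.user_id = u.user_id OR l.user_name = u.name)
    ORDER BY u.user_id
""").fetchall()

print(join_or)
print(exists_or)
```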

Any help?

(Regarding the explain plan... Lemme work on sanitizing it, or them, rather... I'll post them in a while.)

Edit: explain plans: For the query with inner join:

Plan hash value: 436698422

---------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                     |  4539K|   606M|       |   637K  (3)| 02:07:25 |
|   1 |  HASH UNIQUE                    |                     |  4539K|   606M|  3201M|   637K  (3)| 02:07:25 |
|   2 |   CONCATENATION                 |                     |       |       |       |            |          |
|*  3 |    HASH JOIN                    |                     |  4206K|   561M|    33M|   181K  (4)| 00:36:14 |
|   4 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   5 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|*  6 |     TABLE ACCESS FULL           | USERS               |    15M|  1630M|       | 86638   (6)| 00:17:20 |
|*  7 |    HASH JOIN                    |                     |  7646K|  1020M|    33M|   231K  (4)| 00:46:13 |
|   8 |     BITMAP CONVERSION TO ROWIDS |                     |   926K|    22M|       |  2279   (1)| 00:00:28 |
|   9 |      BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |       |            |          |
|  10 |     TABLE ACCESS FULL           | USERS               |    23M|  2515M|       | 87546   (7)| 00:17:31 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("U"."NAME"="L"."USER_NAME")
   6 - filter("U"."NAME" IS NOT NULL)
   7 - access("U"."USER_ID"=TO_NUMBER("L"."USER_ID"))
       filter(LNNVL("U"."NAME"="L"."USER_NAME") OR LNNVL("U"."NAME" IS NOT NULL))

Note
-----
   - dynamic sampling used for this statement

For the one using where exists:

Plan hash value: 2786958565

-----------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                     |     1 |   114 |    21M  (1)| 70:12:13 |
|*  1 |  FILTER                       |                     |       |       |            |          |
|   2 |   TABLE ACCESS FULL           | USERS               |    23M|  2515M| 87681   (7)| 00:17:33 |
|   3 |   BITMAP CONVERSION TO ROWIDS |                     |  7062 |   179K|     1   (0)| 00:00:01 |
|*  4 |    BITMAP INDEX FAST FULL SCAN| I_M_LOG_MVIEW_4     |       |       |            |          |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT /*+ */ 0 FROM "MYSCHEMA"."LOG_MVIEW" 
              "LOG_MVIEW" WHERE ("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2) AND 
              ("USER_NAME"=:B3 OR TO_NUMBER("USER_ID")=:B4) AND ("USER_NAME"=:B5 OR 
              TO_NUMBER("USER_ID")=:B6)))
   4 - filter("USER_NAME"=:B1 OR TO_NUMBER("USER_ID")=:B2)

Note
-----
   - dynamic sampling used for this statement

DB object names changed to protect the innocent. :p

Answer 1

Accepted answer by APC

Try this

select * from Users u
where exists 
   ( select user_id 
     from Log_mview l
     where l.user_id = u.user_id )
/

If the sub-query returns a large number of rows, WHERE EXISTS can be substantially faster than WHERE ... IN.

Answer 2

Answered by Peter Lang

This will depend on the data you have, but using Distinct within the join could improve your performance:

Select u.*
From Users u
Join ( Select Distinct user_id
       From log_mview ) l On u.user_id = l.user_id
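
A minimal sketch of why this can help, assuming SQLite in place of Oracle and invented sample rows: the derived table shrinks the log to one row per user_id before the join, so the join sees fewer rows and the outer query needs no Distinct. Whether that is actually faster depends on the data and on the optimizer, which this sketch does not measure.

```python
import sqlite3

# Assumption: SQLite stands in for Oracle; the sample data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Log_mview (entry_id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob'), (3, 'cid');
    INSERT INTO Log_mview VALUES (10, 1), (11, 1), (12, 1), (13, 3);
""")

# Rows the join has to process with and without the inner Distinct.
log_rows = conn.execute("SELECT COUNT(*) FROM Log_mview").fetchone()[0]
distinct_ids = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT user_id FROM Log_mview)"
).fetchone()[0]

# Deduplicate the log first, then join (the form suggested in this answer).
dedup_before_join = conn.execute("""
    SELECT u.* FROM Users u
    JOIN (SELECT DISTINCT user_id FROM Log_mview) l ON u.user_id = l.user_id
    ORDER BY u.user_id
""").fetchall()

# Join first, then deduplicate the wide result (the form from the question).
dedup_after_join = conn.execute("""
    SELECT DISTINCT u.* FROM Users u
    JOIN Log_mview l ON u.user_id = l.user_id
    ORDER BY u.user_id
""").fetchall()

print(log_rows, distinct_ids)
print(dedup_before_join == dedup_after_join)
```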

Answer 3

Answered by Klinger

The second query is probably working the hard drive more than the first query (join + distinct).

The first query will probably translate to something like:

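For each row in table Log, find the corresponding row in table User (in memory).

The database is probably smart enough to build an in-memory structure for table User, which is probably much smaller than table Log.

I believe query one (join + distinct) needs only a single pass over table Log.

The distinct is probably performed in memory.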

The second query probably forces the database to do multiple full reads of table Log.

So in the second query you probably get:

For each row in table User, read all the rows in table Log (from disk) in order to match the condition.
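
The two strategies described in this answer can be modeled with a toy Python sketch. This is only an illustration of the reasoning, not what Oracle actually does: the function names, the dict-based lookup, the row counter, and the early exit on the first match are all assumptions made for the example.

```python
def join_then_distinct(users, log):
    """One pass over the log, probing an in-memory lookup of users."""
    users_by_id = {u["user_id"]: u for u in users}  # in-memory structure
    matched = {}  # keyed by user_id, so duplicates collapse (the "distinct")
    log_rows_read = 0
    for entry in log:
        log_rows_read += 1
        user = users_by_id.get(entry["user_id"])
        if user is not None:
            matched[user["user_id"]] = user
    return sorted(matched), log_rows_read


def filter_per_user(users, log):
    """For each user, re-read the log until a matching entry turns up."""
    matched = []
    log_rows_read = 0
    for user in users:
        for entry in log:
            log_rows_read += 1
            if entry["user_id"] == user["user_id"]:
                matched.append(user["user_id"])
                break  # assumption: stop at the first match
    return sorted(matched), log_rows_read


# Hypothetical data: five users, five log entries touching users 1 and 3.
users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 6)]
log = [{"entry_id": n, "user_id": uid} for n, uid in enumerate([1, 1, 3, 3, 3])]

ids_a, reads_a = join_then_distinct(users, log)
ids_b, reads_b = filter_per_user(users, log)
print(ids_a == ids_b, reads_a, reads_b)
```

Both functions find the same users, but the per-user filter reads log rows many more times, and users with no log entries are the worst case because the whole log is scanned for each of them.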

You also have to consider that some queries may show a dramatic difference in speed due to changes in memory availability, load, and table growth.
