Oracle SQL: Optimizing the BETWEEN clause

Disclaimer: this page is an English/Chinese translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1947693/

Date: 2020-09-18 19:43:28  Source: igfitidea

Sql: Optimizing BETWEEN clause

Tags: sql, oracle, oracle10g

Asked by user235693

I wrote a statement that takes almost an hour to run, so I am asking for help to make it faster. Here we go:

I am making an inner join of two tables:

I have many time intervals, represented by rows in intervals, and I want to get measure data from measures only within those intervals.

intervals: has two columns, one the starting time and the other the ending time of the interval (number of rows = 1295)

measures: has two columns, one with the measure and the other with the time the measure was taken (number of rows = one million)

The result I want is a table with the measure in the first column, then the time the measure was taken, then the begin/end times of the containing interval (the interval columns are repeated for every row whose time falls within that interval).

Here is my code:

select measures.measure as measure, measures.time as time, intervals.entry_time as entry_time, intervals.exit_time as exit_time
    from
    intervals
    inner join
    measures
    on intervals.entry_time <= measures.time and measures.time <= intervals.exit_time
    order by time asc

Thanks

Answered by Quassnoi

This is quite a common problem.

Plain B-Tree indexes are not good for queries like this:

SELECT  measures.measure as measure,
        measures.time as time,
        intervals.entry_time as entry_time,
        intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        time ASC

An index is good for searching the values within given bounds, like this:
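The example was stripped from this copy of the answer, but the index-friendly shape can be sketched as a plain range predicate over the question's schema (the date literals here are placeholders):

```sql
-- B-Tree friendly: the search key is bounded on both sides,
-- so a single index range scan on measures(time) answers it
SELECT  *
FROM    measures
WHERE   time BETWEEN DATE '2009-12-22' AND DATE '2009-12-23'
```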

, but not for searching the bounds containing a given value, like this:
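The stripped counter-example is presumably the interval-containment shape, where the bounds are columns and the constant is the probe value (again with a placeholder date):

```sql
-- B-Tree unfriendly: "which rows' bounds contain this point?"
-- An index on entry_time alone cannot bound exit_time, so the scan
-- degenerates towards reading most of the index or the table
SELECT  *
FROM    intervals
WHERE   DATE '2009-12-23' BETWEEN entry_time AND exit_time
```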

This article in my blog explains the problem in more detail:

(the nested sets model deals with a similar type of predicate).

You can create the index on time; this way intervals will lead in the join and the ranged time lookup will happen inside the nested loops. This will require sorting on time.

You can create a spatial index on intervals (available in MySQL using MyISAM storage) that would include start and end in one geometry column. This way, measures can lead in the join and no sorting will be needed.
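A hedged sketch of what that could look like in MySQL/MyISAM; the `span` column, the timestamp encoding, and the probe predicate are illustrative assumptions, not code from the answer:

```sql
-- Illustrative sketch (MySQL / MyISAM): pack each interval into one geometry
ALTER TABLE intervals ADD COLUMN span LINESTRING NOT NULL;

UPDATE intervals
SET    span = LineString(
           Point(UNIX_TIMESTAMP(entry_time), -1),
           Point(UNIX_TIMESTAMP(exit_time),   1)
       );

CREATE SPATIAL INDEX sx_intervals_span ON intervals (span);

-- A point-in-interval probe then becomes a spatial containment test, e.g.
-- MBRContains(span, Point(UNIX_TIMESTAMP(m.time), 0))
```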

Spatial indexes, however, are slower, so this will only be efficient if you have few measures but many intervals.

Since you have few intervals but many measures, just make sure you have an index on measures.time:

CREATE INDEX ix_measures_time ON measures (time)

Update:

Here's a sample script to test:

BEGIN
        DBMS_RANDOM.seed(20091223);
END;
/

CREATE TABLE intervals (
        entry_time NOT NULL,
        exit_time NOT NULL
)
AS
SELECT  TO_DATE('23.12.2009', 'dd.mm.yyyy') - level,
        TO_DATE('23.12.2009', 'dd.mm.yyyy') - level + DBMS_RANDOM.value
FROM    dual
CONNECT BY
        level <= 1500
/

CREATE UNIQUE INDEX ux_intervals_entry ON intervals (entry_time)
/

CREATE TABLE measures (
        time NOT NULL,
        measure NOT NULL
)
AS
SELECT  TO_DATE('23.12.2009', 'dd.mm.yyyy') - level / 720,
        CAST(DBMS_RANDOM.value * 10000 AS NUMBER(18, 2))
FROM    dual
CONNECT BY
        level <= 1080000
/

ALTER TABLE measures ADD CONSTRAINT pk_measures_time PRIMARY KEY (time)
/

CREATE INDEX ix_measures_time_measure ON measures (time, measure)
/

This query:

SELECT  SUM(measure), AVG(time - TO_DATE('23.12.2009', 'dd.mm.yyyy'))
FROM    (
        SELECT  *
        FROM    (
                SELECT  /*+ ORDERED USE_NL(intervals measures) */
                        *
                FROM    intervals
                JOIN    measures
                ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
                ORDER BY
                        time
                )
        WHERE   rownum <= 500000
        )

uses NESTED LOOPS and returns in 1.7 seconds.

This query:

SELECT  SUM(measure), AVG(time - TO_DATE('23.12.2009', 'dd.mm.yyyy'))
FROM    (
        SELECT  *
        FROM    (
                SELECT  /*+ ORDERED USE_MERGE(intervals measures) */
                        *
                FROM    intervals
                JOIN    measures
                ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
                ORDER BY
                        time
                )
        WHERE   rownum <= 500000
        )

uses MERGE JOIN, and I had to stop it after 5 minutes.

Update 2:

You will most probably need to force the engine to use the correct table order in the join, using a hint like this:

SELECT  /*+ LEADING (intervals) USE_NL(intervals, measures) */
        measures.measure as measure,
        measures.time as time,
        intervals.entry_time as entry_time,
        intervals.exit_time as exit_time
FROM    intervals
JOIN    measures
ON      measures.time BETWEEN intervals.entry_time AND intervals.exit_time
ORDER BY
        time ASC

Oracle's optimizer is not smart enough to see that the intervals do not intersect. That's why it will most probably use measures as the leading table (which would be a wise decision if the intervals did intersect).

Update 3:

WITH    splits AS
        (
        SELECT  /*+ MATERIALIZE */
                entry_range, exit_range,
                exit_range - entry_range + 1 AS range_span,
                entry_time, exit_time
        FROM    (
                SELECT  TRUNC((entry_time - TO_DATE(1, 'J')) * 2) AS entry_range,
                        TRUNC((exit_time - TO_DATE(1, 'J')) * 2) AS exit_range,
                        entry_time,
                        exit_time
                FROM    intervals
                )
        ),
        upper AS
        (
        SELECT  /*+ MATERIALIZE */
                MAX(range_span) AS max_range
        FROM    splits
        ),
        ranges AS
        (
        SELECT  /*+ MATERIALIZE */
                level AS chunk
        FROM    upper
        CONNECT BY
                level <= max_range
        ),
        tiles AS
        (
        SELECT  /*+ MATERIALIZE USE_MERGE (r s) */
                entry_range + chunk - 1 AS tile,
                entry_time,
                exit_time
        FROM    ranges r
        JOIN    splits s
        ON      chunk <= range_span
        )
SELECT  /*+ LEADING(t) USE_HASH(m t) */
        SUM(LENGTH(stuffing))
FROM    tiles t
JOIN    measures m
ON      TRUNC((m.time - TO_DATE(1, 'J')) * 2) = tile
        AND m.time BETWEEN t.entry_time AND t.exit_time

This query splits the time axis into ranges and uses a HASH JOIN to join the measures and timestamps on the range values, with fine filtering afterwards.

See this article in my blog for a more detailed explanation of how it works:

Answered by APC

To summarise: your query is running against the full set of MEASURES. It matches the time of each MEASURES record to an INTERVALS record. If the window of times spanned by INTERVALS is roughly similar to the window spanned by MEASURES then your query is also running against the full set of INTERVALS, otherwise it is running against a subset.

Why that matters is that it reduces your scope for tuning, as a full table scan is likely to be the fastest way of getting all the rows. So, unless your real MEASURES or INTERVALS tables have a lot more columns than you have shown us, it is unlikely that any indexes will give much advantage.

The possible strategies are:

  • no indexes at all
  • index on MEASURES (TIME,MEASURE)
  • index on MEASURES (TIME)
  • no index on MEASURES
  • index on INTERVALS (ENTRY_TIME, EXIT_TIME)
  • index on INTERVALS (ENTRY_TIME)
  • no index on INTERVALS
  • parallel query

I'm not going to present test cases for all the permutations, because the results are pretty much as we would expect.

Here is the test data. As you can see, I'm using slightly larger data sets. The INTERVALS window is bigger than the MEASURES window, but not by much. The intervals are 10000 seconds wide, and measures are taken every 15 seconds.

SQL> select min(entry_time), max(exit_time), count(*) from intervals;

MIN(ENTRY MAX(EXIT_   COUNT(*)
--------- --------- ----------
01-JAN-09 20-AUG-09       2001

SQL> select min(ts), max(ts), count(*) from measures;

MIN(TS)   MAX(TS)     COUNT(*)
--------- --------- ----------
02-JAN-09 17-JUN-09    1200001

SQL>

NB: In my test data I have presumed that INTERVALS records do not overlap. This has an important corollary: a MEASURES record joins to only one INTERVALS record.

Benchmark

Here is the benchmark with no indexes.

SQL> exec dbms_stats.gather_table_stats(user, 'MEASURES', cascade=>true)

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user, 'INTERVALS', cascade=>true)

PL/SQL procedure successfully completed.

SQL> set timing on
SQL> 
SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:37.03

SQL>

MEASURES tests

Now let's build a unique index on INTERVALS (ENTRY_TIME, EXIT_TIME) and try out the various indexing strategies for MEASURES. First up, an index on the MEASURES TIME column only.

SQL> create index meas_idx on measures (ts)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'MEASURES', cascade=>true)

PL/SQL procedure successfully completed.

SQL> 
SQL> set autotrace traceonly exp
SQL> 
SQL> set timing on
SQL> 
SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:20.21

SQL>

Now, let us index the MEASURES TIME and MEASURE columns

SQL> drop  index meas_idx
  2  /

Index dropped.

SQL> create index meas_idx on measures (ts, measure)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'MEASURES', cascade=>true)

PL/SQL procedure successfully completed.


SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:28.54

SQL>

Now with no index on MEASURES (but still an index on INTERVALS)

SQL> drop  index meas_idx
  2  /

Index dropped.

SQL> exec dbms_stats.gather_table_stats(user, 'MEASURES', cascade=>true)

PL/SQL procedure successfully completed.

SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:24.81

SQL> 

So what difference does parallel query make?

SQL> select /*+ parallel (4) */
  2         m.measure
  3         , m.ts as "TIME"
  4         , i.entry_time
  5         , i.exit_time
  6  from
  7      intervals i
  8  inner join
  9      measures m
 10      on ( m.ts between  i.entry_time and i.exit_time )
 11  order by m.ts asc
 12  /

1200001 rows selected.

Elapsed: 00:02:33.82


SQL>

MEASURES Conclusion

Not much difference in the elapsed time for the different indexes. I was slightly surprised that building an index on MEASURES (TS, MEASURE) resulted in a full table scan and a somewhat slower execution time. On the other hand, it is no surprise that running with parallel query is much faster. So if you have Enterprise Edition and you have the CPUs to spare, using PQ will definitely reduce the elapsed time, although it won't change the resource costs much (and it actually does a lot more sorting).

INTERVALS tests

So what difference might the various indexes on INTERVALS make? In the following tests we will retain an index on MEASURES (TS). First of all we will drop the primary key on both INTERVALS columns and replace it with a constraint on INTERVALS (ENTRY_TIME) only.

SQL> alter table intervals drop  constraint ivl_pk drop  index
  2  /

Table altered.

SQL> alter table intervals add  constraint ivl_pk primary key (entry_time) using  index
  2  /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user, 'INTERVALS', cascade=>true)

PL/SQL procedure successfully completed.


SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:38.39

SQL> 

Lastly, with no index on INTERVALS at all

SQL> alter table intervals drop  constraint ivl_pk drop  index
  2  /

Table altered.

SQL> exec dbms_stats.gather_table_stats(user, 'INTERVALS', cascade=>true)

PL/SQL procedure successfully completed.

SQL> select m.measure
  2         , m.ts as "TIME"
  3         , i.entry_time
  4         , i.exit_time
  5  from
  6      intervals i
  7  inner join
  8      measures m
  9      on ( m.ts between  i.entry_time and i.exit_time )
 10  order by m.ts asc
 11  /

1200001 rows selected.

Elapsed: 00:05:29.15

SQL> 

INTERVALS conclusion

The index on INTERVALS makes a slight difference. That is, indexing (ENTRY_TIME, EXIT_TIME) results in faster execution. This is because it permits a fast full index scan rather than a full table scan. This would be more significant if the time window delineated by INTERVALS were considerably wider than that of MEASURES.

Overall Conclusions

Because we are doing full-table queries, none of the indexes substantially changed the execution time. So if you have Enterprise Edition and multiple CPUs, Parallel Query will give you the best results. Otherwise the best indexes would be INTERVALS (ENTRY_TIME, EXIT_TIME) and MEASURES (TS). The Nested Loops solution is definitely faster than Parallel Query - see Edit 4 below.

If you were running against a subset of MEASURES (say a week's worth) then the presence of indexes would have a bigger impact. It is likely that the two I recommended in the previous paragraph would remain the most effective.

Last observation: I ran this on a bog-standard dual-core laptop with an SGA of just 512M. Yet all of my queries took less than six minutes. If your query really takes an hour then your database has some serious problems, although the long running time could be an artefact of overlapping INTERVALS, which could result in a Cartesian product.

**Edit**

Originally I included the output from

SQL> set autotrace traceonly stat exp

But alas SO severely truncated my post. So I have rewritten it, but without execution plans or stats. Those who wish to validate my findings will have to run the queries themselves.

Edit 4 (previous edits removed for reasons of space)

At the third attempt I have been able to reproduce the performance improvement of Quassnoi's solution.

SQL> set autotrace traceonly stat exp
SQL>
SQL> set timing on
SQL>
SQL> select
  2          /*+ LEADING (i) USE_NL(i, m) */
  3              m.measure
  4             , m.ts as "TIME"
  5             , i.entry_time
  6             , i.exit_time
  7  from
  8      intervals i
  9  inner join
 10      measures m
 11      on ( m.ts between  i.entry_time and i.exit_time )
 12  order by m.ts asc
 13  /

1200001 rows selected.

Elapsed: 00:00:18.39

Execution Plan
----------------------------------------------------------
Plan hash value: 974071908

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |  6003K|   257M|       |   973K  (1)| 03:14:46 |
|   1 |  SORT ORDER BY                |           |  6003K|   257M|   646M|   973K  (1)| 03:14:46 |
|   2 |   NESTED LOOPS                |           |       |       |       |            |          |
|   3 |    NESTED LOOPS               |           |  6003K|   257M|       |   905K  (1)| 03:01:06 |
|   4 |     TABLE ACCESS FULL         | INTERVALS |  2001 | 32016 |       |  2739   (1)| 00:00:33 |
|*  5 |     INDEX RANGE SCAN          | MEAS_IDX  | 60000 |       |       |   161   (1)| 00:00:02 |
|   6 |    TABLE ACCESS BY INDEX ROWID| MEASURES  |  3000 | 87000 |       |   451   (1)| 00:00:06 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("M"."TS">="I"."ENTRY_TIME" AND "M"."TS"<="I"."EXIT_TIME")


Statistics
----------------------------------------------------------
         66  recursive calls
          2  db block gets
      21743  consistent gets
      18175  physical reads
          0  redo size
   52171689  bytes sent via SQL*Net to client
     880416  bytes received via SQL*Net from client
      80002  SQL*Net roundtrips to/from client
          0  sorts (memory)
          1  sorts (disk)
    1200001  rows processed

SQL> 

So Nested Loops are definitely the way to go.

Useful lessons from the exercise

  1. Running diagnostic tests is far more valuable than guessing and theorising
  2. Understanding the data is crucial
  3. Even with 11g we still sometimes need to use hints to prod the optimizer in certain cases

Answered by SqlRyan

The first thing I'd do is have your database tool generate an execution plan that you can view (this is Ctrl-L in MSSQL, but I'm not sure how to do it in Oracle) - that will try to point out the slow parts and, depending on your server/editor, it may even recommend some basic indexes. Once you have an execution plan, you can look for any table scans or inner loop joins, both of which are really slow - indexes can help with table scans, and you can add additional join predicates to help alleviate loop joins.
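For Oracle specifically (the question's database), one way to view the plan is with EXPLAIN PLAN and DBMS_XPLAN; a sketch using the question's query:

```sql
-- Capture the optimizer's plan without running the query
EXPLAIN PLAN FOR
SELECT m.measure, m.time, i.entry_time, i.exit_time
FROM   intervals i
JOIN   measures m
ON     m.time BETWEEN i.entry_time AND i.exit_time;

-- Display the plan just captured from the plan table
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```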

My guess would be that MEASURES needs an index on the TIME column, and you can include the MEASURE column as well to speed lookups. Try this:

-- (INCLUDE is SQL Server syntax; in Oracle a composite index covers the query)
CREATE INDEX idxMeasures_Time ON measures (time, measure)

Also, though this won't change your execution plan or speed up your query, it may make your join clause a bit easier to read:

ON measures.time BETWEEN intervals.entry_time AND intervals.exit_time

This just combines your two <= and >= predicates into a single expression.

Answered by marc_s

You can't really optimize your statement - it's pretty simple as it is.

What you could do is investigate whether some indexes would help you.

You're selecting on intervals.entry_time, intervals.exit_time, and measures.time - are those columns indexed?

Answered by redcayuga

Try a parallel query:

  alter session enable parallel query;

  select /*+ parallel */ ... same as before;

You could also create a materialized view, perhaps with the parallel hint above. It may take a long time to create the MV, but once created it can be queried repeatedly.
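A sketch of what that materialized view might look like (the view name is an assumption; refresh options are omitted):

```sql
-- Precompute the interval join once; subsequent reads hit only the MV
CREATE MATERIALIZED VIEW mv_measures_by_interval
BUILD IMMEDIATE
AS
SELECT /*+ parallel */
       m.measure, m.time, i.entry_time, i.exit_time
FROM   intervals i
JOIN   measures  m
ON     m.time BETWEEN i.entry_time AND i.exit_time;
```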

Answered by Charles Bretana

Your SQL is equivalent to:

select m.measure, m.time,
     i.entry_time, i.exit_time
from intervals i
    join measures m
        on m.time between i.entry_time and i.exit_time
order by m.time asc

The only thing I might suggest is making sure there's an index on m.time. Then if that doesn't improve performance enough, try adding indexes on i.entry_time and i.exit_time as well.

Answered by David Aldridge

There may be a very efficient way of writing this query if the intervals are deterministic, because the query could be converted to an equi-join that would be amenable to more efficient hash joining.

For example, if the intervals are all hourly:

ENTRY_TIME          EXIT_TIME
2000-01-15 09:00:00 2000-01-15 09:59:59
2000-01-15 10:00:00 2000-01-15 10:59:59
2000-01-15 11:00:00 2000-01-15 11:59:59
2000-01-15 12:00:00 2000-01-15 12:59:59
....

Then the join can be written as:

intervals.entry_time=trunc(measures.time,'HH')

This would reduce the cost of everything up to and including the join to pretty much a full scan of each of the tables.
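Assuming hourly intervals as above, the whole query could then be sketched as an equi-join:

```sql
SELECT  m.measure, m.time, i.entry_time, i.exit_time
FROM    intervals i
JOIN    measures  m
ON      i.entry_time = TRUNC(m.time, 'HH')   -- equi-join, hash-join friendly
ORDER BY m.time
```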

However, since you have the ORDER BY operation in there, I think that a sort-merge might still beat it as the query is written right now, because the optimiser will sort a smaller data set for the sort-merge than it would for the hash join (in the latter case it would have to sort more columns of data). You could get round this by structuring the query as:

select
  measures.measure     as measure,
  measures.time        as time,
  intervals.entry_time as entry_time,
  intervals.exit_time  as exit_time
from
  intervals inner join  
  (select time, measure from measures order by time) measures
  on  intervals.entry_time=trunc(measures.time,'HH')  
/

This gives a lower cost estimate than a sort-merge on my 10.2.0.4 test instance, but I'd regard it as a little risky.

So, I'd look for a sort-merge, or rewrite it to allow the use of a hash join if possible.

Answered by Jeffrey Kemp

You're pretty much going to get most of the rows from both tables in this case, plus you've got a sort.

The question is, does the calling process really need all the rows, or just the first few? This would change how I'd go about optimising the query.

I'll assume your calling process wants ALL the rows. Since the join predicate is not on an equality, I'd say a MERGE JOIN may be the best approach to aim for. A merge join requires its data sources to be sorted, so if we can avoid a sort the query should run as fast as it possibly can (barring more interesting approaches such as specialised indexes or materialized views).

To avoid the SORT operations on intervals and measures, you could add indexes on (measures.time, measures.measure) and (intervals.entry_time, intervals.exit_time). The database can use the indexes to avoid a sort, and it'll be faster because it doesn't have to visit any table blocks.
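As a sketch, the two covering indexes suggested above would be created like this (the index names are assumptions):

```sql
-- Covering indexes: the join columns and selected columns all come from
-- the index, so no sort and no table block visits are needed
CREATE INDEX ix_measures_time_measure ON measures  (time, measure);
CREATE INDEX ix_intervals_entry_exit  ON intervals (entry_time, exit_time);
```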

Alternatively, if you only have an index on measures.time, the query may still run OK without adding another big index - it'll run slower though, because it'll probably have to read many table blocks to get measures.measure for the SELECT clause.

Answered by Jeremy S

Not knowing the database system and version, I'd say that (lack of) indexing and the join clause could be causing the problem.

For every record in the measures table, you can have multiple records in the intervals table (intervals.entry_time <= measures.time), and for every record in the intervals table, you can have multiple records in measures (measures.time <= intervals.exit_time). The resulting one-to-many and many-to-one relationships caused by the join mean multiple table scans for each record. I doubt that Cartesian product is the correct term, but it's pretty close.

Indexing would definitely help, but it would help even more if you could find a better key to join the two tables. Having the one-to-many relationship go in one direction only would definitely speed up the processing, as it wouldn't have to scan each table/index twice for each record.