Does Oracle always use a HASH JOIN even when both tables are huge?

Disclaimer: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/8188093/

Date: 2020-09-10 03:43:02  Source: igfitidea

Oracle always uses HASH JOIN even when both tables are huge?

Tags: oracle, join

Asked by teddy teddy

My understanding is that a HASH JOIN only makes sense when one of the two tables is small enough to fit into memory as a hash table.

But when I gave Oracle a query where both tables had several hundred million rows, Oracle still came up with a hash join explain plan. Even when I tried to trick it with OPT_ESTIMATE(rows = ....) hints, it always decided to use a HASH JOIN instead of a merge sort join.
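
For illustration, hints of this kind look roughly as follows (the table and alias names here are invented, and OPT_ESTIMATE is an undocumented hint, so treat this as a sketch whose exact syntax and behavior may vary by version):

--Nudge the optimizer's cardinality estimate for one table...
select /*+ opt_estimate(table, t2, rows=1000) */ count(*)
from big_table1 t1
join big_table2 t2 on t2.id = t1.id;

--...or simply ask for a sort merge join instead of a hash join.
select /*+ use_merge(t1 t2) */ count(*)
from big_table1 t1
join big_table2 t2 on t2.id = t1.id;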

So I wonder: how is a HASH JOIN possible when both tables are very large?

thanks Yang

Answered by Jon Heller

Hash joins obviously work best when everything can fit in memory. But that does not mean they stop being the best join method when the table can't fit in memory. I think the only other realistic join method is a merge sort join.

If the hash table can't fit in memory, then the sort for a merge sort join won't fit in memory either. And the merge join needs to sort both tables. In my experience, hashing is always faster than sorting, for joining and for grouping.

But there are some exceptions. From the Oracle Database Performance Tuning Guide, The Query Optimizer:

Hash joins generally perform better than sort merge joins. However, sort merge joins can perform better than hash joins if both of the following conditions exist:

  The row sources are sorted already.
  A sort operation does not have to be done.


Test

Instead of creating hundreds of millions of rows, it's easier to force Oracle to only use a very small amount of memory.

This chart shows that hash joins outperform merge joins, even when the tables are too large to fit in (artificially limited) memory:

[Chart: Hash vs Merge join times as the tables grow]



Notes

For performance tuning it's usually better to think in bytes than in the number of rows. But the "real" size of the table is a difficult thing to measure, which is why the chart displays rows. The sizes go approximately from 0.375 MB up to 14 MB. To double-check that these queries are really writing to disk, you can run them with /*+ gather_plan_statistics */ and then query v$sql_plan_statistics_all.
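
A minimal sketch of that double-check, assuming the test tables from the script below (the exact column set of v$sql_plan_statistics_all varies by Oracle version):

select /*+ gather_plan_statistics use_hash(test1 test2) */ count(*)
from test1 join test2 on test1.a = test2.a;

--Look at the last execution's work areas; a non-zero LAST_TEMPSEG_SIZE
--means the hash or sort area spilled to temp (disk).
select operation, options, last_memory_used, last_tempseg_size
from v$sql_plan_statistics_all
where sql_id = (select prev_sql_id from v$session
                where sid = sys_context('userenv', 'sid'))
order by id;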

I only tested hash joins vs merge sort joins. I didn't fully test nested loops because that join method is always incredibly slow with large amounts of data. As a sanity check, I did compare it once with the last data size, and it took at least several minutes before I killed it.

I also tested with different _area_sizes, ordered and unordered data, and different distinctness of the join column (more matches is more CPU-bound, fewer matches is more IO-bound), and got relatively similar results.

However, the results were different when the amount of memory was ridiculously small. With only a 32K sort|hash_area_size, the merge sort join was significantly faster. But if you have so little memory you probably have more significant problems to worry about.
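
For reference, the extreme case mentioned above would be set up like this (values are in bytes):

--Simulate a ridiculously small amount of memory: 32K work areas.
alter session set workarea_size_policy = manual;
alter session set hash_area_size = 32768;
alter session set sort_area_size = 32768;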

There are still many other variables to consider, such as parallelism, hardware, bloom filters, etc. People have probably written books on this subject; I haven't tested even a small fraction of the possibilities. But hopefully this is enough to confirm the general consensus that hash joins are best for large data.



Code

Below are the scripts I used:

--Drop objects if they already exist
drop table test_10k_rows purge;
drop table test1 purge;
drop table test2 purge;

--Create a small table to hold rows to be added.
--("connect by" would run out of memory later when _area_sizes are small.)
--VARIABLE: More or less distinct values can change results.  Changing
--"level" to something like "mod(level,100)" will result in more joins, which
--seems to favor hash joins even more.
create table test_10k_rows(a number, b number, c number, d number, e number);
insert /*+ append */ into test_10k_rows
    select level a, 12345 b, 12345 c, 12345 d, 12345 e
    from dual connect by level <= 10000;
commit;

--Restrict memory size to simulate running out of memory.
alter session set workarea_size_policy=manual;

--1 MB for hashing and sorting
--VARIABLE: Changing this may change the results.  Setting it very low,
--such as 32K, will make merge sort joins faster.
alter session set hash_area_size = 1048576;
alter session set sort_area_size = 1048576;

--Tables to be joined
create table test1(a number, b number, c number, d number, e number);
create table test2(a number, b number, c number, d number, e number);

--Type to hold results
create or replace type number_table is table of number;

set serveroutput on;

--
--Compare hash and merge joins for different data sizes.
--
declare
    v_hash_seconds number_table := number_table();
    v_average_hash_seconds number;
    v_merge_seconds number_table := number_table();
    v_average_merge_seconds number;

    v_size_in_mb number;
    v_rows number;
    v_begin_time number;
    v_throwaway number;

    --Increase the size of the table this many times
    c_number_of_steps number := 40;
    --Join the tables this many times
    c_number_of_tests number := 5;

begin
    --Clear existing data
    execute immediate 'truncate table test1';
    execute immediate 'truncate table test2';

    --Print headings.  Use tabs for easy import into spreadsheet.
    dbms_output.put_line('Rows'||chr(9)||'Size in MB'
        ||chr(9)||'Hash'||chr(9)||'Merge');

    --Run the test for many different steps
    for i in 1 .. c_number_of_steps loop
        v_hash_seconds.delete;
        v_merge_seconds.delete;
        --Add about 0.375 MB of data (roughly - depends on lots of factors)
        --The order by will store the data randomly.
        insert /*+ append */ into test1
        select * from test_10k_rows order by dbms_random.value;

        insert /*+ append */ into test2
        select * from test_10k_rows order by dbms_random.value;

        commit;

        --Get the new size
        --(Sizes may not increment uniformly)
        select bytes/1024/1024 into v_size_in_mb
        from user_segments where segment_name = 'TEST1';

        --Get the rows.  (select from both tables so they are equally cached)
        select count(*) into v_rows from test1;
        select count(*) into v_rows from test2; 

        --Perform the joins several times
        for i in 1 .. c_number_of_tests loop
            --Hash join
            v_begin_time := dbms_utility.get_time;
            select /*+ use_hash(test1 test2) */ count(*) into v_throwaway
            from test1 join test2 on test1.a = test2.a;
            v_hash_seconds.extend;
            v_hash_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;

            --Merge join
            v_begin_time := dbms_utility.get_time;
            select /*+ use_merge(test1 test2) */ count(*) into v_throwaway
            from test1 join test2 on test1.a = test2.a;
            v_merge_seconds.extend;
            v_merge_seconds(i) := (dbms_utility.get_time - v_begin_time) / 100;
        end loop;

        --Get average times.  Throw out the highest and lowest result.
        select ( sum(column_value) - max(column_value) - min(column_value) ) 
            / (count(*) - 2)
        into v_average_hash_seconds
        from table(v_hash_seconds);

        select ( sum(column_value) - max(column_value) - min(column_value) ) 
            / (count(*) - 2)
        into v_average_merge_seconds
        from table(v_merge_seconds);

        --Display size and times
        dbms_output.put_line(v_rows||chr(9)||v_size_in_mb||chr(9)
            ||v_average_hash_seconds||chr(9)||v_average_merge_seconds);

    end loop;
end;
/

Answered by Quassnoi

So I wonder how is HASH JOIN possible in the case of both tables being very large?

It would be done in multiple passes: the driven table is read and hashed in chunks, and the leading table is scanned several times.
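
One way to watch this happening is to check the work area statistics; a sketch, assuming you have access to v$sysstat (a growing "multipass" count means a hash or sort area had to make several passes over its input):

select name, value
from v$sysstat
where name in ('workarea executions - optimal',
               'workarea executions - onepass',
               'workarea executions - multipass');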

This means that with limited memory a hash join scales at O(N^2) while a merge join scales at O(N) (with no sorting needed, of course), and on really large tables merge outperforms hash joins. However, the tables should be really large, so that the benefit of a single read outweighs the drawbacks of non-sequential access, and you would need all the data from them (usually aggregated).

Given the RAM sizes of modern servers, we are talking about really large reports on really large databases which take hours to build, not something you would really see in everyday life.

MERGE JOIN may also be useful when the output recordset is limited with rownum < N. But this means that the joined inputs should already be sorted, which means they should both be indexed, which means NESTED LOOPS is available too, and that's what is usually chosen by the optimizer, since it is more efficient when the join condition is selective.
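
A hypothetical example of such a rownum-limited query (table and column names invented; with selective, indexed join columns the optimizer will usually pick NESTED LOOPS here on its own, the hint just makes it explicit):

select *
from (select /*+ use_nl(o c) */ o.order_id, o.order_date, c.customer_name
      from orders o
      join customers c on c.customer_id = o.customer_id
      order by o.order_id)
where rownum < 100;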

With their current implementations, MERGE JOIN always scans and NESTED LOOPS always seeks, while a smarter combination of both methods (backed up by statistics) would be preferable.

You may want to read this article in my blog:

Answered by Chris

A hash join does not have to fit the whole table into memory, but only the rows which match the where conditions of that table (or even only a hash + the rowid - I'm not sure about that).

So when Oracle decides that the selectivity of the part of the where conditions affecting one of the tables is good enough (i.e. few rows will have to be hashed), it might prefer a hash join even for very large tables.
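
A sketch of that situation (all names invented): only the rows of t1 that survive the filter need to be hashed, so the in-memory hash table stays small even though the table itself is huge:

select /*+ use_hash(t1 t2) */ count(*)
from big_table1 t1
join big_table2 t2 on t2.t1_id = t1.id
where t1.status = 'ACTIVE'                  --highly selective filter on t1
and t1.created_date >= date '2020-01-01';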