哪个 SQL 查询更快?筛选加入条件或 Where 子句?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2509987/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 05:47:39  来源:igfitidea点击:

Which SQL query is faster? Filter on Join criteria or Where clause?

sqlsql-servertsqlsql-server-2008

提问by Jon Erickson

Compare these 2 queries. Is it faster to put the filter on the join criteria or in the were clause. I have always felt that it is faster on the join criteria because it reduces the result set at the soonest possible moment, but I don't know for sure.

比较这两个查询。将过滤器放在连接条件上或放在 was 子句中是否更快。我一直觉得它在加入标准上更快,因为它会在尽可能快的时刻减少结果集,但我不确定。

I'm going to build some tests to see, but I also wanted to get opinions on which would is clearer to read as well.

我将建立一些测试来查看,但我也想就哪些更易于阅读获得意见。

Query 1

查询 1

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
INNER JOIN  TableB b
        ON  x.TableBID = b.ID
WHERE       a.ID = 1            /* <-- Filter here? */

Query 2

查询 2

SELECT      *
FROM        TableA a
INNER JOIN  TableXRef x
        ON  a.ID = x.TableAID
        AND a.ID = 1            /* <-- Or filter here? */
INNER JOIN  TableB b
        ON  x.TableBID = b.ID

EDIT

编辑

I ran some tests and the results show that it is actually very close, but the WHEREclause is actually slightly faster! =)

我运行了一些测试,结果表明它实际上非常接近,但是WHERE子句实际上稍微快了一点!=)

I absolutely agree that it makes more sense to apply the filter on the WHEREclause, I was just curious as to the performance implications.

我完全同意在WHERE子句上应用过滤器更有意义,我只是对性能影响感到好奇。

ELAPSED TIME WHERE CRITERIA:143016 ms
ELAPSED TIME JOIN CRITERIA:143256 ms

已用时间标准:143016 毫秒
已用时间加入标准:143256 毫秒

TEST

测试

SET NOCOUNT ON;

DECLARE @num    INT,
        @iter   INT

SELECT  @num    = 1000, -- Number of records in TableA and TableB, the cross table is populated with a CROSS JOIN from A to B
        @iter   = 1000  -- Number of select iterations to perform

DECLARE @a TABLE (
        id INT
)

DECLARE @b TABLE (
        id INT
)

DECLARE @x TABLE (
        aid INT,
        bid INT
)

DECLARE @num_curr INT
SELECT  @num_curr = 1

WHILE (@num_curr <= @num)
BEGIN
    INSERT @a (id) SELECT @num_curr
    INSERT @b (id) SELECT @num_curr

    SELECT @num_curr = @num_curr + 1
END

INSERT      @x (aid, bid)
SELECT      a.id,
            b.id
FROM        @a a
CROSS JOIN  @b b

/*
    TEST
*/
DECLARE @begin_where    DATETIME,
        @end_where      DATETIME,
        @count_where    INT,
        @begin_join     DATETIME,
        @end_join       DATETIME,
        @count_join     INT,
        @curr           INT,
        @aid            INT

DECLARE @temp TABLE (
        curr    INT,
        aid     INT,
        bid     INT
)

DELETE FROM @temp

SELECT  @curr   = 0,
        @aid    = 50

SELECT  @begin_where = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
    INNER JOIN  @b b
            ON  x.bid = b.id
    WHERE       a.id = @aid

    SELECT @curr = @curr + 1
END
SELECT  @end_where = CURRENT_TIMESTAMP

SELECT  @count_where = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @curr = 0
SELECT  @begin_join = CURRENT_TIMESTAMP
WHILE (@curr < @iter)
BEGIN
    INSERT      @temp (curr, aid, bid)
    SELECT      @curr,
                aid,
                bid
    FROM        @a a
    INNER JOIN  @x x
            ON  a.id = x.aid
            AND a.id = @aid
    INNER JOIN  @b b
            ON  x.bid = b.id

    SELECT @curr = @curr + 1
END
SELECT  @end_join = CURRENT_TIMESTAMP

SELECT  @count_join = COUNT(1) FROM @temp
DELETE FROM @temp

SELECT  @count_where AS count_where,
        @count_join AS count_join,
        DATEDIFF(millisecond, @begin_where, @end_where) AS elapsed_where,
        DATEDIFF(millisecond, @begin_join, @end_join) AS elapsed_join

采纳答案by Quassnoi

Performance-wise, they are the same (and produce the same plans)

性能方面,它们是相同的(并产生相同的计划)

Logically, you should make the operation that still has sense if you replace INNER JOINwith a LEFT JOIN.

从逻辑上讲,如果您替换INNER JOINLEFT JOIN.

In your very case this will look like this:

在您的情况下,这将如下所示:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
        AND a.ID = 1
LEFT JOIN
        TableB b
ON      x.TableBID = b.ID

or this:

或这个:

SELECT  *
FROM    TableA a
LEFT JOIN
        TableXRef x
ON      x.TableAID = a.ID
LEFT JOIN
        TableB b
ON      b.id = x.TableBID
WHERE   a.id = 1

The former query will not return any actual matches for a.idother than 1, so the latter syntax (with WHERE) is logically more consistent.

前者查询不会返回任何实际比赛为a.id比其他1,因此后者的语法(有WHERE)在逻辑上是比较一致的。

回答by Remus Rusanu

For inner joins it doesn't matter where you put your criteria. The SQL compiler will transform both into an execution plan in which the filtering occurs below the join (ie. as if the filter expressions appears is in the join condition).

对于内部联接,将标准放在哪里并不重要。SQL 编译器会将两者都转换为一个执行计划,其中过滤发生在连接下方(即,过滤表达式出现在连接条件中)。

Outer joins are a different matter, since the place of the filter changes the semantics of the query.

外连接是另一回事,因为过滤器的位置会改变查询的语义。

回答by Robin Day

As far as the two methods go.

就这两种方法而言。

  • JOIN/ON is for joining tables
  • WHERE is for filtering results
  • JOIN/ON 用于连接表
  • WHERE 用于过滤结果

Whilst you can use them differently it always seems like a smell to me.

虽然你可以用不同的方式使用它们,但对我来说似乎总是一种气味。

Deal with performance when it is a problem. Then you can look into such "optimisations".

遇到问题时处理性能问题。然后你可以研究这样的“优化”。

回答by TomTom

With any query optimizer worh a cent.... they are identical.

对于任何值一分钱的查询优化器......它们是相同的。

回答by Peter Graham

In postgresql they are the same. We know this because if you do explain analyzeon each of the queries, the plan comes out to be the same. Take this example:

在 postgresql 中,它们是相同的。我们知道这一点,因为如果您explain analyze对每个查询都进行了操作,那么计划结果是相同的。拿这个例子:

# explain analyze select e.* from event e join result r on e.id = r.event_id and r.team_2_score=24;

                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.045..0.047 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
   ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.009..0.010 rows=2 loops=1)
   ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.017..0.017 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.008 rows=1 loops=1)
               Filter: (team_2_score = 24)
               Rows Removed by Filter: 1
 Planning time: 0.182 ms
 Execution time: 0.101 ms
(10 rows)

# explain analyze select e.* from event e join result r on e.id = r.event_id where r.team_2_score=24;
                                                  QUERY PLAN                                                   
---------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=27.09..38.22 rows=7 width=899) (actual time=0.027..0.029 rows=1 loops=1)
   Hash Cond: (e.id = r.event_id)
   ->  Seq Scan on event e  (cost=0.00..10.80 rows=80 width=899) (actual time=0.010..0.011 rows=2 loops=1)
   ->  Hash  (cost=27.00..27.00 rows=7 width=8) (actual time=0.010..0.010 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Seq Scan on result r  (cost=0.00..27.00 rows=7 width=8) (actual time=0.006..0.007 rows=1 loops=1)
               Filter: (team_2_score = 24)
               Rows Removed by Filter: 1
 Planning time: 0.140 ms
 Execution time: 0.058 ms
(10 rows)

They both have the same min and max cost as well as the same query plan. Also, notice that even in the top query the team_score_2 gets applied as a 'Filter'.

它们都具有相同的最小和最大成本以及相同的查询计划。另外,请注意,即使在顶部查询中,team_score_2 也被用作“过滤器”。

回答by eKek0

I guess that the first, because it makes a more specific filter over the data. But you should see the execution plan, as with any optimization, because it can be very different deppending on size of data, server hardware, etc.

我猜是第一个,因为它对数据进行了更具体的过滤。但是您应该看到执行计划,就像任何优化一样,因为它可能会因数据大小、服务器硬件等而有很大不同。

回答by David M

Is it faster? Try it and see.

它更快吗?试试看。

Which is easier to read? The first to me looks more "correct", as the moved condition is nothing really to do with the join.

哪个更容易阅读?第一个对我来说看起来更“正确”,因为移动的条件与连接无关。

回答by Joseph Mastey

It is really unlikely that the placement of this join will be the deciding factor for performance. I am not intimately familiar with the execution planning for tsql, but it's likely that they will be optimized automatically to similar plans.

这个连接的位置不太可能成为性能的决定因素。我对 tsql 的执行计划不是很熟悉,但它们很可能会自动优化为类似的计划。

回答by 3Dave

Rule #0: Run some benchmarks and see! The only way to really tell which will be faster is to try it. These types of benchmarks are very easy to perform using the SQL profiler.

规则 #0:运行一些基准测试并查看!真正判断哪个更快的唯一方法是尝试它。使用 SQL 分析器可以很容易地执行这些类型的基准测试。

Also, examine the execution plan for the query written with a JOIN and with a WHERE clause to see what differences stand out.

此外,检查使用 JOIN 和 WHERE 子句编写的查询的执行计划,看看有哪些不同之处。

Finally, as others have said, these two should be treated identically by any decent optimizer, including the one built into SQL Server.

最后,正如其他人所说,任何体面的优化器都应该对这两个优化器一视同仁,包括 SQL Server 中内置的优化器。