Attribution: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2273815/

Time: 2020-09-01 05:24:37  Source: igfitidea

IF EXISTS before INSERT, UPDATE, DELETE for optimization

Tags: sql, sql-server, tsql, optimization

Asked by Ed Gomoliako

Quite often you need to execute an INSERT, UPDATE or DELETE statement based on some condition. My question is whether adding IF EXISTS before the command affects the performance of the query.

Example

IF EXISTS(SELECT 1 FROM Contacs WHERE [Type] = 1)
    UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1

What about INSERTs or DELETEs?

Answer by Aaronaught

I'm not completely sure, but I get the impression that this question is really about upsert, which is the following atomic operation:

  • If the row exists in both the source and target, UPDATE the target;
  • If the row only exists in the source, INSERT the row into the target;
  • (Optionally) If the row exists in the target but not the source, DELETE the row from the target.

Developers-turned-DBAs often naively write it row-by-row, like this:

-- For each row in source
IF EXISTS (SELECT 1 FROM target WHERE <target_expression>)
BEGIN
    IF @delete_flag = 1
        DELETE FROM target
        WHERE <target_expression>
    ELSE
        UPDATE target
        SET <target_columns> = <source_values>
        WHERE <target_expression>
END
ELSE
    INSERT target (<target_columns>)
    VALUES (<source_values>)

This is just about the worst thing you can do, for several reasons:

  • It has a race condition. The row can disappear between IF EXISTS and the subsequent DELETE or UPDATE.

  • It's wasteful. For every transaction you have an extra operation being performed; maybe it's trivial, but that depends entirely on how well you've indexed.

  • Worst of all - it's following an iterative model, thinking about these problems at the level of a single row. This will have the largest (worst) impact of all on overall performance.

One very minor (and I emphasize minor) optimization is to just attempt the UPDATE anyway; if the row doesn't exist, @@ROWCOUNT will be 0 and you can then "safely" insert:

-- For each row in source
BEGIN TRAN

UPDATE target
SET <target_columns> = <source_values>
WHERE <target_expression>

IF (@@ROWCOUNT = 0)
    INSERT target (<target_columns>)
    VALUES (<source_values>)

COMMIT

Worst-case, this will still perform two operations for every transaction, but at least there's a chance of only performing one, and it also eliminates the race condition (kind of).

But the real issue is that this is still being done for each row in the source.

Before SQL Server 2008, you had to use an awkward 3-stage model to deal with this at the set level (still better than row-by-row):

BEGIN TRAN

INSERT target (<target_columns>)
SELECT <source_columns> FROM source s
WHERE s.id NOT IN (SELECT id FROM target)

UPDATE t SET <target_columns> = <source_columns>
FROM target t
INNER JOIN source s ON t.id = s.id

DELETE t
FROM target t
WHERE t.id NOT IN (SELECT id FROM source)

COMMIT

As I said, performance was pretty lousy on this, but still a lot better than the one-row-at-a-time approach. SQL Server 2008, however, finally introduced MERGE syntax, so now all you have to do is this:

MERGE target
USING source ON target.id = source.id
WHEN MATCHED THEN UPDATE SET <target_columns> = <source_columns>
WHEN NOT MATCHED THEN INSERT (<target_columns>) VALUES (<source_columns>)
WHEN NOT MATCHED BY SOURCE THEN DELETE;

That's it. One statement. If you're using SQL Server 2008 and need to perform any sequence of INSERT, UPDATE and DELETE depending on whether or not the row already exists - even if it's just one row - there is no excuse not to be using MERGE.

You can even OUTPUT the rows affected by a MERGE into a table variable if you need to find out afterward what was done. Simple, fast, and risk-free. Do it.
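
A minimal sketch of capturing MERGE results with OUTPUT, under stated assumptions: the table variables @target, @source and @changes, their columns, and the sample rows are all hypothetical and only illustrate the idea. It needs SQL Server 2008 or later.

```sql
-- Hypothetical tables and data for illustration; not from the original answer.
DECLARE @target  TABLE (id INT PRIMARY KEY, name VARCHAR(50));
DECLARE @source  TABLE (id INT PRIMARY KEY, name VARCHAR(50));
DECLARE @changes TABLE (merge_action NVARCHAR(10), old_id INT, new_id INT);

INSERT @target VALUES (1, 'old one'), (2, 'old two');
INSERT @source VALUES (2, 'new two'), (3, 'new three');

MERGE @target AS t
USING @source AS s ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.name = s.name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, name) VALUES (s.id, s.name)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE
-- $action is 'INSERT', 'UPDATE' or 'DELETE' for each affected row.
OUTPUT $action, deleted.id, inserted.id
INTO @changes (merge_action, old_id, new_id);

-- One row per affected target row.
SELECT merge_action, old_id, new_id FROM @changes;
```

With this sample data the log should show row 1 deleted, row 2 updated and row 3 inserted; old_id is NULL for inserts and new_id is NULL for deletes.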

Answer by burnall

That is not useful for just one update/delete/insert.
It may improve performance if several statements follow the IF condition.
In that case it is better to write:

update a set .. where ..
if @@rowcount > 0 
begin
    ..
end
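
A concrete version of this pattern might look like the sketch below. The temp table #Contacs stands in for the question's Contacs table, and its columns and the follow-up PRINT are assumptions made only so the example runs on its own.

```sql
-- Stand-in for the question's Contacs table (columns assumed for illustration).
CREATE TABLE #Contacs (
    Id        INT PRIMARY KEY,
    [Type]    INT NOT NULL,
    [Deleted] BIT NOT NULL DEFAULT 0
);
INSERT #Contacs (Id, [Type]) VALUES (1, 1), (2, 2);

DECLARE @updated INT;

UPDATE #Contacs SET [Deleted] = 1 WHERE [Type] = 1;
SET @updated = @@ROWCOUNT;  -- read it immediately; later statements overwrite it

IF @updated > 0
BEGIN
    -- Follow-up work that only makes sense when something changed.
    PRINT CAST(@updated AS VARCHAR(10)) + ' row(s) marked as deleted';
END

DROP TABLE #Contacs;
```

Saving @@ROWCOUNT into a variable right after the UPDATE matters because most subsequent statements reset it.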

Answer by van

You should not do it for UPDATE and DELETE: if there is an impact on performance, it is not a positive one.

For INSERT there might be situations where your INSERT will raise an exception (UNIQUE CONSTRAINT violation etc.), in which case you might want to prevent it with the IF EXISTS and handle it more gracefully.
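
As an alternative to the pre-check (not what this answer proposes, but a related way to handle the same error gracefully), the duplicate-key error can be caught after the fact. This is a sketch under assumptions: the temp table #Contacs and its columns are hypothetical, and it relies on SQL Server reporting error 2627 for constraint violations and 2601 for duplicate keys in a unique index.

```sql
-- Hypothetical stand-in table; columns are assumed for illustration.
CREATE TABLE #Contacs (Id INT PRIMARY KEY, Name VARCHAR(50) NOT NULL);
INSERT #Contacs (Id, Name) VALUES (1, 'first');

BEGIN TRY
    -- This violates the primary key on purpose.
    INSERT #Contacs (Id, Name) VALUES (1, 'duplicate');
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() IN (2627, 2601)
        PRINT 'Row already exists; skipping insert.';
    ELSE
    BEGIN
        -- Anything else is unexpected: re-raise it.
        DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
    END
END CATCH

DROP TABLE #Contacs;
```

Unlike a separate IF EXISTS check, this does not leave a window between the check and the INSERT, at the cost of using an exception for control flow.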

Answer by A-K

Neither

UPDATE … IF (@@ROWCOUNT = 0) INSERT

nor

IF EXISTS(...) UPDATE ELSE INSERT

patterns work as expected under high concurrency. Both may fail. Both may fail very frequently. MERGE is the king - it holds up much better. Let us do some stress testing and see for ourselves.

Here is the table we shall be using:

CREATE TABLE dbo.TwoINTs
    (
      ID INT NOT NULL PRIMARY KEY,
      i1 INT NOT NULL ,
      i2 INT NOT NULL ,
      version ROWVERSION
    ) ;
GO

INSERT  INTO dbo.TwoINTs
        ( ID, i1, i2 )
VALUES  ( 1, 0, 0 ) ;    

IF EXISTS(…) THEN pattern frequently fails under high concurrency.

Let us insert or update rows in a loop using the following simple logic: if a row with given ID exists, update it, and otherwise insert a new one. The following loop implements this logic. Cut and paste it into two tabs, switch into text mode in both tabs, and run them simultaneously.

-- hit Ctrl+T to execute in text mode

SET NOCOUNT ON ;

DECLARE @ID INT ;

SET @ID = 0 ;
WHILE @ID > -100000
    BEGIN ;
        SET @ID = ( SELECT  MIN(ID)
                    FROM    dbo.TwoINTs
                  ) - 1 ;
        BEGIN TRY ;

            BEGIN TRANSACTION ;
            IF EXISTS ( SELECT  *
                        FROM    dbo.TwoINTs
                        WHERE   ID = @ID )
                BEGIN ;
                    UPDATE  dbo.TwoINTs
                    SET     i1 = 1
                    WHERE   ID = @ID ;
                END ;
            ELSE
                BEGIN ;
                    INSERT  INTO dbo.TwoINTs
                            ( ID, i1, i2 )
                    VALUES  ( @ID, 0, 0 ) ;
                END ;
            COMMIT ; 
        END TRY
        BEGIN CATCH ;
            ROLLBACK ; 
            SELECT  error_message() ;
        END CATCH ;
    END ; 

When we run this script simultaneously in two tabs, we shall immediately get a huge amount of primary key violations in both tabs. This demonstrates how unreliable the IF EXISTS pattern is when it executes under high concurrency.

Note: this example also demonstrates that it is not safe to use SELECT MAX(ID)+1 or SELECT MIN(ID)-1 as the next available unique value if we do it under concurrency.

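For comparison, the body of the loop could be written as a single MERGE against the same dbo.TwoINTs table created above. This is only a sketch: the @ID value is hypothetical, and the HOLDLOCK hint is an assumption added here rather than something the answer states. It is commonly used so that the existence check and the write are covered by the same range lock.

```sql
-- Assumes dbo.TwoINTs from the CREATE TABLE statement earlier in this answer.
DECLARE @ID INT = -1;  -- hypothetical key to insert or update

MERGE dbo.TwoINTs WITH (HOLDLOCK) AS t   -- HOLDLOCK is an assumption, see above
USING (SELECT @ID AS ID) AS s ON t.ID = s.ID
WHEN MATCHED THEN
    UPDATE SET i1 = 1
WHEN NOT MATCHED THEN
    INSERT (ID, i1, i2) VALUES (s.ID, 0, 0);
```

Swapping this in for the IF EXISTS block and rerunning the two-tab test is one way to check how it behaves on your own server.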

Answer by DVK

IF EXISTS will basically do a SELECT - the same one that UPDATE would.

As such, it will decrease performance: if there's nothing to update, you did the same amount of work (UPDATE would have found the same lack of rows as your SELECT), and if there's something to update, you just did an unneeded SELECT.

Answer by JoshBerke

You shouldn't do this in most cases. Depending on your transaction isolation level you have created a race condition. In your example it wouldn't matter too much, but the data can change between the first SELECT and the UPDATE. And all you've done is forced SQL to do more work.

The best way to know for sure is to test the two variants and see which one gives you the appropriate performance.
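
One way to run such a comparison is with SET STATISTICS IO and SET STATISTICS TIME, which report reads and elapsed time per statement in the Messages output. The sketch below uses a tiny stand-in temp table (#Contacs, with assumed columns), so the numbers it produces mean nothing by themselves; against realistic data the same two variants can be compared directly.

```sql
-- Stand-in for the question's Contacs table (columns assumed for illustration).
CREATE TABLE #Contacs (
    Id        INT IDENTITY PRIMARY KEY,
    [Type]    INT NOT NULL,
    [Deleted] BIT NOT NULL DEFAULT 0
);
INSERT #Contacs ([Type]) VALUES (1), (2), (1);

SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Variant A: check first, then update.
IF EXISTS (SELECT 1 FROM #Contacs WHERE [Type] = 1)
    UPDATE #Contacs SET [Deleted] = 1 WHERE [Type] = 1;

-- Variant B: update only.
UPDATE #Contacs SET [Deleted] = 1 WHERE [Type] = 1;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;

DROP TABLE #Contacs;
```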

Answer by Nick Craver

There is a slight effect, since you're doing the same check twice, at least in your example:

IF EXISTS(SELECT 1 FROM Contacs WHERE [Type] = 1)

Has to query, see if there are any, if true then:

UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1

Has to query, see which ones...same check twice for no reason. Now if the condition you're looking for is indexed it ought to be quick, but for large tables you could see some delay just because you're running the select.

Answer by Mitch Wheat

The performance of an IF EXISTS statement:

IF EXISTS(SELECT 1 FROM mytable WHERE someColumn = someValue)

depends on the indexes present to satisfy the query.

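As a rough illustration, an index on the filtered column typically lets the EXISTS check be answered with an index seek instead of a scan. In this sketch the temp table #mytable stands in for mytable, and the index name IX_mytable_someColumn and the value 42 are made up for the example.

```sql
-- Stand-in for mytable; the index name and the value are hypothetical.
CREATE TABLE #mytable (id INT PRIMARY KEY, someColumn INT NOT NULL);
CREATE NONCLUSTERED INDEX IX_mytable_someColumn ON #mytable (someColumn);

DECLARE @someValue INT = 42;

-- With the index in place the optimizer can seek on someColumn.
IF EXISTS (SELECT 1 FROM #mytable WHERE someColumn = @someValue)
    PRINT 'found';

DROP TABLE #mytable;
```

Checking the actual execution plan is the reliable way to confirm which access path the optimizer picked.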

Answer by Philip Kelley

This largely repeats the preceding (by time) five (no, six) (no, seven) answers, but:

Yes, the IF EXISTS structure that you have by and large will double the work done by the database. While IF EXISTS will "stop" when it finds the first matching row (it doesn't need to find them all), it's still extra and ultimately pointless effort--for updates and deletes.

  • If no such row(s) exist, IF EXISTS will do a full scan (table or index) to determine this.
  • If one or more such rows exist, IF EXISTS will read enough of the table/index to find the first one, and then UPDATE or DELETE will re-read the table to find it again and process it -- and it will read "the rest of" the table to see if there are any more to process as well. (Fast enough if properly indexed, but still.)

So either way, you'll end up reading the entire table or index at least once. But, why bother with the IF EXISTS in the first place?

UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1 

or the similar DELETE will work fine whether or not there are any rows found to process. No rows, table scanned, nothing modified, you're done; 1+ rows, table scanned, everything that ought to be is modified, done again. One pass, no fuss, no muss, no having to worry about "did the database get changed by another user between my first query and my second query".

INSERT is the situation where it might be useful -- check if the row is present before adding it, to avoid Primary or Unique Key violations. Of course you have to worry about concurrency -- what if someone else is trying to add this row at the same time as you? Wrapping this all into a single INSERT would handle it all in an implicit transaction (remember your ACID properties!):

INSERT Contacs (col1, col2, etc)
SELECT val1, val2, etc
WHERE NOT EXISTS (SELECT 1 FROM Contacs WHERE col1 = val1)
IF @@ROWCOUNT = 0 <didn't insert, process accordingly>

Answer by bleeeah

Yes, this will affect performance (the degree depends on a number of factors). Effectively you are doing the same query "twice" (in your example). Ask yourself whether you need to be this defensive in your query, and in what situations the row would not be there. Also, with an UPDATE statement, the number of rows affected is probably a better way to determine whether anything has been updated.