SQL INSERT 但避免重复

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1688618/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 04:18:45  来源:igfitidea点击:

SQL INSERT but avoid duplicates

sqlsql-serversql-server-2005

提问by Ravi

I want to do some quick inserts but avoid duplicates into a Table. For argument's sake lets call it MarketPrices, I've been experimenting with two ways of doing it but not sure how to benchmark which will be faster.

我想做一些快速插入但避免重复到表中。为了论证,我们称之为 MarketPrices,我一直在尝试两种方法,但不确定如何进行基准测试,哪种方法会更快。

INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT @SecurityCode, @BuyPrice,  @SellPrice, @IsMarketOpen
EXCEPT
SELECT SecurityCode, BuyPrice, SellPrice, j.bool as IsActive FROM MarketPrices
CROSS JOIN (SELECT 0 as bool UNION SELECT 1 as bool ) as j

OR

或者

DECLARE @MktId int
SET @MktId = (SELECT SecurityId FROM MarketPrices 
              where SecurityCode = @SecurityCode 
              and BuyPrice=@BuyPrice 
              and SellPrice = @SellPrice)

IF (@MktId is NULL)  
BEGIN
    INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
    VALUES
    (@SecurityCode,@BuyPrice, @SellPrice, @IsMarketOpen)
END

Assume that @whateveris an input parameter in the stored procedure.

假设这@whatever是存储过程中的输入参数。

I want to be able to insert a new record for every SecurityCode when the BuyPrice or SellPrice or both are different from every other previous occurance. I don't care about IsMarketOpen.

我希望能够在 BuyPrice 或 SellPrice 或两者与之前发生的所有其他事件不同时为每个 SecurityCode 插入新记录。我不关心 IsMarketOpen。

Is there anything glaringly stupid about either of the above approaches? Is one faster than the other?

上述任何一种方法有什么明显的愚蠢之处吗?一个比另一个快吗?

采纳答案by Peter Radocchia

EDIT: to prevent race conditionsin concurrent environments, use WITH (UPDLOCK)in the correlated subquery or EXCEPT'd SELECT. The test script I wrote below doesn't require it, since it uses temporary tables that are only visible to the current connection, but in a real environment, operating against user tables, it would be necessary.

编辑:为了防止并发环境中的竞争条件,请WITH (UPDLOCK)在相关子查询或EXCEPT'd 中使用SELECT。我下面写的测试脚本不需要它,因为它使用了仅对当前连接可见的临时表,但在实际环境中,对用户表进行操作时,它是必要的。

MERGEdoesn't require UPDLOCK.

MERGE不需要UPDLOCK.



Inspired by mcl's answer re: unique index & let the database throw an error, I decided to benchmark conditional insertsvs. try/catch.

受到 mcl 的回答 re: unique index & let the database throw an error 的启发,我决定对条件插入try/catch进行基准测试。

The results appear to support the conditional insert over try/catch, but YMMV. It's a very simple scenario (one column, small table, etc), executed on one machine, etc.

结果似乎支持 try/catch 上的条件插入,但 YMMV。这是一个非常简单的场景(一列、小表等),在一台机器上执行,等等。

Here are the results (SQL Server 2008, build 10.0.1600.2):

以下是结果(SQL Server 2008,版本 10.0.1600.2):

duplicates (short table)    
  try/catch:                14440 milliseconds / 100000 inserts
  conditional insert:        2983 milliseconds / 100000 inserts
  except:                    2966 milliseconds / 100000 inserts
  merge:                     2983 milliseconds / 100000 inserts

uniques
  try/catch:                 3920 milliseconds / 100000 inserts
  conditional insert:        3860 milliseconds / 100000 inserts
  except:                    3873 milliseconds / 100000 inserts
  merge:                     3890 milliseconds / 100000 inserts

  straight insert:           3173 milliseconds / 100000 inserts

duplicates (tall table)
  try/catch:                14436 milliseconds / 100000 inserts
  conditional insert:        3063 milliseconds / 100000 inserts
  except:                    3063 milliseconds / 100000 inserts
  merge:                     3030 milliseconds / 100000 inserts

Notice, even on unique inserts, there's slightlymore overhead to try/catch than a conditional insert. I wonder if this varies by version, CPU, number of cores, etc.

请注意,即使在独特的插入上,try/catch 的开销也略高于条件插入。我想知道这是否因版本、CPU、内核数量等而异。

I did not benchmark the IFconditional inserts, just WHERE. I assume the IFvariety would show more overhead, since a) would you have two statements, and b) you would need to wrap the two statements in a transaction and set the isolation level to serializable (!). If someone wantedto test this, you would need to change the temp table to a regular user table (serializable doesn't apply to local temp tables).

我没有对IF条件插入进行基准测试,只是WHERE. 我认为该IF品种会显示更多的开销,因为 a) 您是否有两个语句,并且 b) 您需要将这两个语句包装在一个事务中并将隔离级别设置为可序列化 (!)。如果有人对此进行测试,您需要将临时表更改为常规用户表(可序列化不适用于本地临时表)。

Here is the script:

这是脚本:

-- tested on SQL 2008.
-- to run on SQL 2005, comment out the statements using MERGE
set nocount on

if object_id('tempdb..#temp') is not null drop table #temp
create table #temp (col1 int primary key)
go

-------------------------------------------------------

-- duplicate insert test against a table w/ 1 record

-------------------------------------------------------

insert #temp values (1)
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-- comment this batch out for SQL 2005
declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-------------------------------------------------------

-- unique insert test against an initially empty table

-------------------------------------------------------

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, straight insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, try/catch: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, except: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

-- comment this batch out for SQL 2005
truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 1, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, merge: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

-------------------------------------------------------

-- duplicate insert test against a table w/ 100000 records

-------------------------------------------------------

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x
  end try
  begin catch end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), conditional insert: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x except select col1 from #temp
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), except: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

-- comment this batch out for SQL 2005
declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  merge #temp t using (select @x) s (col1) on t.col1 = s.col1 when not matched by target then insert values (col1);
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), merge: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

回答by Peter Radocchia

EDIT: to prevent race conditionsin a concurrent environment, use WITH (UPDLOCK)in the correlated subquery.

编辑:为了防止并发环境中的竞争条件,请WITH (UPDLOCK)在相关子查询中使用。



I think this would be the standard method:

我认为这将是标准方法:

INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT @SecurityCode, @BuyPrice,  @SellPrice, @IsMarketOpen
WHERE NOT EXISTS (
  SELECT * FROM MarketPrices WITH (UPDLOCK)
  WHERE SecurityCode = @SecurityCode 
    AND BuyPrice = @BuyPrice 
    AND SellPrice = @SellPrice
  )

If any of your fields are nullable, you would have to add that to the condition.

如果您的任何字段可为空,则必须将其添加到条件中。

Your first method is interesting, but the requirements for EXCEPT have you jumping through hoops. This method is essentially the same, but it gets you around the column matching issue.

您的第一种方法很有趣,但是 EXCEPT 的要求让您一头雾水。这种方法本质上是相同的,但它可以帮助您解决列匹配问题。

Alternatively:

或者:

INSERT INTO MarketPrices (SecurityCode, BuyPrice, SellPrice, IsMarketOpen)
SELECT SecurityCode, BuyPrice, SellPrice, @IsMarketOpen
FROM (
  SELECT @SecurityCode, @BuyPrice,  @SellPrice
  EXCEPT
  SELECT SecurityCode, BuyPrice, SellPrice FROM MarketPrices WITH (UPDLOCK)
  ) a (SecurityCode, BuyPrice, SellPrice)

The nice thing about EXCEPT in this instance is that it handles NULLs without any extra coding on your part. To achieve the same thing in first example, you would need to test each pair for NULLs as well as equality, long-hand.

在这种情况下 EXCEPT 的好处是它可以处理 NULL,而无需您进行任何额外的编码。要在第一个示例中实现相同的目的,您需要测试每一对是否为 NULL 以及是否相等,长手。

Your second method is ok, but you don't need the variable. See Tomalak's solution, he cleaned it up nicely. Also, you would need to explicitly handle the possibility of concurrent inserts, if that were a concern.

你的第二种方法没问题,但你不需要变量。看到 Tomalak 的解决方案,他很好地清理了它。此外,您需要明确处理并发插入的可能性,如果这是一个问题。

回答by Tomalak

I would go for a semantic solution anytime. Your two proposals seem quite obscure to me (though the latter is better than the former).

我会随时寻求语义解决方案。你的两个建议对我来说似乎很模糊(尽管后者比前者好)。

IF NOT EXISTS (
  SELECT 1
  FROM   MarketPrices 
  WHERE  SecurityCode  = @SecurityCode 
         AND BuyPrice  = @BuyPrice 
         AND SellPrice = @SellPrice
)  
BEGIN
  INSERT MarketPrices 
    (SecurityCode,   BuyPrice,  SellPrice,  IsMarketOpen)
  VALUES 
    (@SecurityCode, @BuyPrice, @SellPrice, @IsMarketOpen)
END

With a conglomerate index over SecurityCode, BuyPrice, SellPricethe EXISTSquery should go reasonably fast.

拥有超过一集团索引SecurityCode, BuyPrice, SellPriceEXISTS查询应该相当快。

Benchmarking it is a matter of timing a WHILEloop, I would say. Test it and see for yourself.

WHILE我会说,对它进行基准测试是一个循环计时的问题。测试一下,自己看看。

回答by mlibby

Another option: create a unique index on the fields (SecurityCode, BuyPrice, SellPrice) in question, issue a simple insert, and let the database decide whether the records are duplicates. The insert will fail on an attempt to insert a duplicate.

另一种选择:在有问题的字段(SecurityCode、BuyPrice、SellPrice)上创建一个唯一索引,发出一个简单的插入,让数据库决定记录是否重复。尝试插入重复项时插入将失败。

Using code (whether external language or SQL proc) to guarantee uniqueness is not strict enough and will ultimately lead to the very duplicates you hope to prevent.

使用代码(无论是外部语言还是 SQL proc)来保证唯一性不够严格,最终会导致您希望防止的重复。

回答by Jared Moore

Below I have added the top answers from Only inserting a row if it's not already thereto Peter Radocchia's excellent answer.

下面我将仅插入一行中的最佳答案添加到 Peter Radocchia 的优秀答案中。

The takeaway is that using the race safe with try/catchtechnique is marginally (~1%) faster than race safe with updlock, holdlocktechnique when there are no actual collisions (i.e. you expect that collisions will be very rare - this is the uniquesscenario), and is a little slower (~20%) when there are always collisions (this is the duplicatesscenario). This is not taking complex issues like lock escalation into account.

要点是,当没有实际碰撞时,使用该race safe with try/catch技术比race safe with updlock, holdlock技术稍微快(~1%)(即您预计碰撞将非常罕见 - 这就是uniques场景),并且速度稍慢(~20%)当总是有冲突时(这是duplicates场景)。这并没有考虑到锁升级等复杂问题。

Here are the results (SQL Server 2014, build 12.0.2000.8):

以下是结果(SQL Server 2014,版本 12.0.2000.8):

duplicates (short table)    
  try/catch:                       15546 milliseconds / 100000 inserts
  conditional insert:               1460 milliseconds / 100000 inserts
  except:                           1490 milliseconds / 100000 inserts
  merge:                            1420 milliseconds / 100000 inserts
  race safe with try/catch:         1650 milliseconds / 100000 inserts
  race safe with updlock, holdlock: 1330 milliseconds / 100000 inserts

uniques
  try/catch:                        2266 milliseconds / 100000 inserts
  conditional insert:               2156 milliseconds / 100000 inserts
  except:                           2273 milliseconds / 100000 inserts
  merge:                            2136 milliseconds / 100000 inserts
  race safe with try/catch:         2400 milliseconds / 100000 inserts
  race safe with updlock, holdlock: 2430 milliseconds / 100000 inserts

  straight insert:                  1686 milliseconds / 100000 inserts

duplicates (tall table)
  try/catch:                       15826 milliseconds / 100000 inserts
  conditional insert:               1530 milliseconds / 100000 inserts
  except:                           1506 milliseconds / 100000 inserts
  merge:                            1443 milliseconds / 100000 inserts
  race safe with try/catch:         1636 milliseconds / 100000 inserts
  race safe with updlock, holdlock: 1426 milliseconds / 100000 inserts

Duplicates (short table) section:

重复(短表)部分:

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x where not exists (select * from #temp where col1 = @x)
  end try
  begin catch 
    if error_number() <> 2627
      throw
  end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (short table), race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

Uniques section

独特部分

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  begin try 
    insert #temp select @x where not exists (select * from #temp where col1 = @x)
  end try
  begin catch 
    if error_number() <> 2627
      throw
  end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

truncate table #temp
declare @x int, @now datetime, @duration int
select @x = 0, @now = getdate()
while @x < 100000 begin
  set @x = @x+1
  insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('uniques, race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @x) with nowait
go

Duplicates (tall table) section

重复(高表)部分

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  begin try 
    insert #temp select @x where not exists (select * from #temp where col1 = @x)
  end try
  begin catch 
    if error_number() <> 2627
      throw
  end catch
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), race safe with try/catch: %i milliseconds / %i inserts',-1,-1,@duration,@y) with nowait
go

declare @x int, @y int, @now datetime, @duration int
select @x = 1, @y = 0, @now = getdate()
while @y < 100000 begin
  set @y = @y+1
  insert #temp select @x where not exists (select * from #temp with (updlock, holdlock) where col1 = @x)
end
set @duration = datediff(ms,@now,getdate())
raiserror('duplicates (tall table), race safe with updlock, holdlock: %i milliseconds / %i inserts',-1,-1,@duration, @y) with nowait
go

回答by IamIC

if you don't need to trap duplicates, you can always create a unique index with "ignore duplicates" set to true. SQL Server will take care of this for you.

如果您不需要捕获重复项,您始终可以创建一个唯一索引,并将“忽略重复项”设置为 true。SQL Server 会为您处理这个问题。