SQL: Using "IN" in a WHERE clause where the number of items in the set is very large

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverFlow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow original: http://stackoverflow.com/questions/532192/

Date: 2020-09-01 01:04:49  Source: igfitidea

Tags: sql, sql-server, ms-access

Asked by Karim

I have a situation where I need to update a very large set of rows that I can only identify by their IDs (the target records are selected by the user and have nothing in common other than being the set of records the user wants to modify). The same property is being updated on all of these records, so I would like to make a single UPDATE call.

Is it bad practice or is there a better way to do this update than using "WHERE IN (1,2,3,4,.....10000)" in the UPDATE statement?

Would it make more sense to use individual UPDATE statements for each record and wrap them in a single transaction? Right now I'm working with SQL Server and Access, but if possible I'd like to hear broader best-practice solutions that apply to any kind of relational database.

Accepted answer by DanSingerman

I would always use

WHERE id IN (1,2,3,4,.....10000)

unless your IN clause was stupidly large, which shouldn't really happen with user input.

Edit: for instance, Rails does this a lot behind the scenes.

It would definitely not be better to do separate update statements in a single transaction.
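
Here is a minimal sketch of the single-statement approach. It uses Python's built-in sqlite3 module as a stand-in for SQL Server or Access, and the items table and its status column are hypothetical. The IN list is built from bound parameters rather than pasted literals, so the ID values never become part of the SQL text.

```python
import sqlite3


def update_status(conn, ids, new_status):
    """Set status on every row whose id is in `ids`, using one UPDATE."""
    if not ids:
        return 0  # many databases (e.g. SQL Server) reject an empty "IN ()"
    placeholders = ", ".join(["?"] * len(ids))  # one bound parameter per ID
    sql = f"UPDATE items SET status = ? WHERE id IN ({placeholders})"
    cur = conn.execute(sql, [new_status, *ids])
    return cur.rowcount  # number of rows actually changed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 101)])
conn.commit()

with conn:  # commits on success, rolls back if the UPDATE raises
    changed = update_status(conn, [3, 7, 42, 99], "archived")
print(changed)  # 4
```

A single statement still has size limits. SQLite caps the number of bound parameters per statement (999 by default before version 3.32.0, 32766 since). SQL Server accepts at most 2100 parameters per request. A 10,000-ID list would therefore have to be split or staged in a table first.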

Answer by Otávio Décio

Another alternative is to store those numbers in a temp table and join to it to do the update. If you are able to execute a single UPDATE statement, that is definitely better than executing one statement per record.
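
This sketch shows the temp-table variant under the same assumptions: Python's sqlite3 plays the database, and the items table is hypothetical. The IDs are staged in a temp table, and a single set-based UPDATE then uses it. SQLite spells the join as an IN subquery here. On SQL Server the same step is typically written as UPDATE ... FROM ... INNER JOIN against the temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 1001)])
conn.commit()

selected = [5, 17, 256, 999]  # example: the IDs the user picked

with conn:
    # 1. Stage the IDs in a temp table that only this connection can see.
    conn.execute("CREATE TEMP TABLE selected_ids (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO selected_ids (id) VALUES (?)",
                     [(i,) for i in selected])
    # 2. One set-based UPDATE driven by the staged IDs. The IN-subquery form
    #    works on any SQLite version; UPDATE ... FROM (a join) needs 3.33+.
    cur = conn.execute("UPDATE items SET status = 'archived' "
                       "WHERE id IN (SELECT id FROM selected_ids)")
print(cur.rowcount)  # 4
```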

Answer by JosephStyons

How do you generate the IN clause?

If there is another SELECT statement that generates those values, you could simply plug it into the UPDATE like so:

UPDATE TARGET_TABLE T
SET
  SOME_VALUE = 'Whatever'
WHERE T.ID_NUMBER IN(
                    SELECT ID_NUMBER  --this SELECT generates your ID #s.
                    FROM SOURCE_TABLE
                    WHERE SOME_CONDITIONS
                    )

In some RDBMses, you'll get better performance by using the EXISTS syntax, which would look like this:

UPDATE TARGET_TABLE T
SET
  SOME_VALUE = 'Whatever'
WHERE EXISTS (
             SELECT ID_NUMBER  --this SELECT generates your ID #s.
             FROM SOURCE_TABLE S
             WHERE SOME_CONDITIONS
               AND S.ID_NUMBER =  T.ID_NUMBER
             )
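
This small sketch checks that the IN and EXISTS forms are interchangeable. It runs each form against a fresh in-memory SQLite copy of made-up data, with SOME_CONDITIONS replaced by a hypothetical flagged = 1 column. Inside EXISTS the selected column does not matter, so SELECT 1 is used. Whether EXISTS is actually faster depends on the optimizer, which is why the answer only claims it for some RDBMSes.

```python
import sqlite3

SETUP = """
CREATE TABLE target_table (id_number INTEGER PRIMARY KEY, some_value TEXT);
CREATE TABLE source_table (id_number INTEGER, flagged INTEGER);
INSERT INTO target_table VALUES (1, 'old'), (2, 'old'), (3, 'old'), (4, 'old');
INSERT INTO source_table VALUES (1, 1), (2, 0), (4, 1);
"""

# "flagged = 1" stands in for SOME_CONDITIONS in both forms.
IN_FORM = """
UPDATE target_table SET some_value = 'Whatever'
WHERE id_number IN (SELECT id_number FROM source_table WHERE flagged = 1)
"""

EXISTS_FORM = """
UPDATE target_table SET some_value = 'Whatever'
WHERE EXISTS (SELECT 1 FROM source_table s
              WHERE s.flagged = 1
                AND s.id_number = target_table.id_number)
"""


def updated_ids(update_sql):
    """Apply one form to a fresh in-memory copy and return the IDs it changed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    with conn:
        conn.execute(update_sql)
    return [row[0] for row in conn.execute(
        "SELECT id_number FROM target_table "
        "WHERE some_value = 'Whatever' ORDER BY id_number")]


print(updated_ids(IN_FORM))      # [1, 4]
print(updated_ids(EXISTS_FORM))  # [1, 4]
```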

Answer by Tooony

Without knowing what a "very large" number of IDs might be, I'd venture a guess. ;-)

Since you are using Access as a database, the number of IDs can't be that high. Assume we're talking about fewer than, say, 10,000 numbers, and that we know the limits of the containers that hold the IDs (what language is used for the front end?). In that case I'd stick to one UPDATE statement, if that is the most readable and the easiest to maintain later. Otherwise I'd split it into multiple statements using some clever logic, for example with one, ten, a hundred, a thousand... IDs per statement.
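
This is one way to read the "one, ten, hundred, thousand" idea, sketched in Python with sqlite3. The items table, status column and bucket sizes are illustrative assumptions. The ID list is cut greedily into chunks whose sizes come from a fixed set, so there are at most four distinct statement shapes however many IDs arrive. Every chunk runs on the same connection inside one transaction.

```python
import sqlite3

BUCKETS = (1000, 100, 10, 1)  # allowed IN-list sizes, largest first


def split_into_buckets(ids, buckets=BUCKETS):
    """Greedily cut `ids` into chunks whose sizes all come from `buckets`."""
    chunks, start = [], 0
    for size in buckets:
        while len(ids) - start >= size:
            chunks.append(ids[start:start + size])
            start += size
    return chunks


def update_in_buckets(conn, ids, new_status):
    """Run one UPDATE per chunk, all on one connection and in one transaction."""
    with conn:
        for chunk in split_into_buckets(ids):
            marks = ", ".join(["?"] * len(chunk))
            conn.execute(f"UPDATE items SET status = ? WHERE id IN ({marks})",
                         [new_status, *chunk])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 501)])
conn.commit()

picked = list(range(1, 235))  # 234 IDs
update_in_buckets(conn, picked, "archived")
print([len(c) for c in split_into_buckets(picked)])
# [100, 100, 10, 10, 10, 1, 1, 1, 1]
```

For 234 IDs this issues nine UPDATEs. On SQLite builds older than 3.32.0, a full 1000-ID chunk plus the status parameter would exceed the default limit of 999 bound parameters, so the largest bucket would need to shrink there.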

Then I'd leave it to the DB optimiser to execute the statement(s) as efficiently as possible. I would probably run an EXPLAIN on the query or queries to make sure nothing silly is going on, though.
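
In SQLite the "explain" step is EXPLAIN QUERY PLAN. This sketch uses a hypothetical items table and prints the plan for an IN-list UPDATE. A line starting with SEARCH means the IDs are looked up through the primary key instead of by scanning the whole table. The plan wording differs between SQLite versions. Other systems expose plans differently, for example SET SHOWPLAN_XML ON in SQL Server or EXPLAIN PLAN in Oracle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")

ids = [3, 7, 42]
marks = ", ".join(["?"] * len(ids))
# EXPLAIN QUERY PLAN describes how SQLite would run the statement without running it.
plan = conn.execute(
    f"EXPLAIN QUERY PLAN UPDATE items SET status = ? WHERE id IN ({marks})",
    ["archived", *ids],
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable detail text
```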

But in my experience, it is quite often OK to leave this kind of optimisation to the database manager itself. The thing that usually takes the most time is the actual connection to the database, so if you can execute all queries over the same connection, it is normally no problem. Make sure you send off all UPDATE statements before you start to look into and wait for any result sets coming back, though. :-)

Answer by Marc Gravell

I would use a table-variable / temp-table; insert the values into this, and join to it. Then you can use the same set multiple times. This works especially well if you are (for example) passing down a CSV of IDs as varchar. As a SQL Server example:

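DECLARE @ids TABLE (id int NOT NULL)

INSERT @ids
SELECT value
FROM dbo.SplitCsv(@arg) -- need to define separately (SQL Server 2016+: STRING_SPLIT(@arg, ',') also returns a "value" column)

UPDATE t
SET    t.SomeColumn = @newValue -- SomeColumn and @newValue are placeholders; etc
FROM   [TABLE] t
INNER JOIN @ids i ON i.id = t.id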
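
Here is the same pattern sketched in SQLite through Python's sqlite3, with hypothetical table names. The IDs arrive as one CSV string parameter. A recursive CTE splits it, standing in for the dbo.SplitCsv helper that the answer leaves undefined. The staged set then drives both an UPDATE and a later SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 21)])
conn.commit()

csv_arg = "3, 5,8,13"  # the IDs arrive as one delimited string parameter

with conn:
    conn.execute("CREATE TEMP TABLE ids (id INTEGER PRIMARY KEY)")
    # Split the CSV inside SQL: each recursive step peels off the text
    # before the first comma; OR IGNORE skips duplicate IDs.
    conn.execute("""
        WITH RECURSIVE split(item, rest) AS (
            SELECT '', ? || ','
            UNION ALL
            SELECT trim(substr(rest, 1, instr(rest, ',') - 1)),
                   substr(rest, instr(rest, ',') + 1)
            FROM split
            WHERE rest <> ''
        )
        INSERT OR IGNORE INTO ids (id)
        SELECT CAST(item AS INTEGER) FROM split WHERE item <> ''
    """, (csv_arg,))
    # The same staged set can now be reused by several statements.
    conn.execute("UPDATE items SET status = 'archived' "
                 "WHERE id IN (SELECT id FROM ids)")
    touched = conn.execute(
        "SELECT COUNT(*) FROM items i INNER JOIN ids ON ids.id = i.id"
    ).fetchone()[0]
print(touched)  # 4
```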

Answer by HeDinges

In Oracle there is a limit on the number of values you can put into an IN clause, so you'd better use OR instead (x=1 OR x=2 ...); as far as I know, those are not limited.

Answer by u7867

In general there are several things to consider.

  1. The statement parsing cache in the DB. Each statement, with a different number of items in the IN clause, has to be parsed separately. You ARE using bound variables instead of literals, right?
  2. Some Databases have a limit on the number of items in the IN clause. For Oracle it's 1000.
  3. When updating you lock records. If you have multiple separate update statements you can have deadlocks. This means you have to be careful about the order in which you issue your updates.
  4. Round-trip latency to the database can be high, even for a very fast statement. This means it's often better to manipulate lots of records at once to save trip-time.

We recently changed our system to limit the size of the in-clauses and always use bound variables because this reduced the number of different SQL statements and thus improved performance. Basically we generate our SQL statements and execute multiple statements if the in-clause exceeds a certain size. We don't do this for updates so we haven't had to worry about the locking. You will.
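
This sketch caps the IN-list size and uses bound variables, in Python with sqlite3. The table, column and MAX_IN value are hypothetical. IDs are de-duplicated and sorted, then cut into fixed-size chunks. The last chunk is padded by repeating its final ID, so every chunk executes exactly the same SQL text. Repeating an ID inside IN does not change which rows match. Sorting gives every session the same update order, which is one common way to address the deadlock concern in point 3.

```python
import sqlite3

MAX_IN = 4  # kept tiny for the demo; a real cap might be a few hundred

# Built once, so every execution uses byte-identical SQL text.
SQL = "UPDATE items SET status = ? WHERE id IN ({})".format(
    ", ".join(["?"] * MAX_IN))


def padded_chunks(ids, size=MAX_IN):
    """Sort and de-duplicate IDs, cut them into chunks of `size`, and pad the
    last chunk by repeating its final ID so every chunk has `size` values."""
    ordered = sorted(set(ids))  # same lock order in every session
    chunks = [ordered[i:i + size] for i in range(0, len(ordered), size)]
    if chunks and len(chunks[-1]) < size:
        chunks[-1] = chunks[-1] + [chunks[-1][-1]] * (size - len(chunks[-1]))
    return chunks


def update_chunked(conn, ids, new_status):
    with conn:  # one transaction for all chunks
        for chunk in padded_chunks(ids):
            conn.execute(SQL, [new_status, *chunk])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 11)])
conn.commit()

update_chunked(conn, [9, 2, 5, 2, 7, 1], "archived")
print(padded_chunks([9, 2, 5, 2, 7, 1]))  # [[1, 2, 5, 7], [9, 9, 9, 9]]
```

Python's sqlite3 also keeps a per-connection cache of prepared statements keyed on the SQL text (the cached_statements argument of sqlite3.connect). Identical text is therefore reused on the client side as well.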

Using a temp table may not improve performance because you have to populate the temp table with the IDs. Experimentation and performance tests can tell you the answer here.

A single IN clause is very easy to understand and maintain. This is probably what you should worry about first. If you find that the performance of the queries is poor you might want to try a different strategy and see if it helps, but don't optimize prematurely. The IN-clause is semantically correct so leave it alone if it isn't broken.

Answer by jimmyorr

If you were on Oracle, I'd recommend using table functions, similar to Marc Gravell's post.

-- first create a user-defined collection type, a table of numbers
create or replace type tbl_foo as table of number;
/

declare
  temp_foo tbl_foo;
begin
  -- this could be passed in as a parameter, for simplicity I am hardcoding it
  temp_foo := tbl_foo(7369, 7788);

  -- here I use a table function to treat my temp_foo variable as a table,
  -- and I join it to the emp table as an alternative to a massive "IN" clause;
  -- a bare SELECT needs INTO inside PL/SQL, so loop over the rows instead
  for r in (select e.*
              from emp e,
                   table(temp_foo) foo
             where e.empno = foo.column_value)
  loop
    dbms_output.put_line(r.empno || ' ' || r.ename);  -- needs SET SERVEROUTPUT ON
  end loop;
end;
/
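
SQLite has no user-defined collection types, but its json_each table-valued function can play the role of TABLE(temp_foo). This sketch uses Python's sqlite3 and a made-up emp table. It assumes the SQLite build includes the JSON functions, which are built in by default since 3.38.0. The whole ID list travels as one JSON-array parameter, so the statement text is the same however many IDs there are. This is a substitute technique, not the Oracle feature itself.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER PRIMARY KEY, ename TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(7369, "SMITH"), (7499, "ALLEN"), (7788, "SCOTT")])
conn.commit()

wanted = [7369, 7788]  # the collection, passed as ONE bound parameter
rows = conn.execute(
    """
    SELECT e.empno, e.ename
    FROM emp e, json_each(?) AS foo   -- json_each turns the array into rows
    WHERE e.empno = foo.value
    ORDER BY e.empno
    """,
    (json.dumps(wanted),),
).fetchall()
print(rows)  # [(7369, 'SMITH'), (7788, 'SCOTT')]
```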

Answer by David-W-Fenton

I don't know the type of values in your IN list. If they are most of the values from 1 to 10,000, you might be able to process them to get something like:

WHERE MyID BETWEEN 1 AND 10000 AND MyID NOT IN (3,7,4656,987)

Or, if the NOT IN list would still be long, you could process the list and generate a bunch of BETWEEN conditions:

WHERE MyID BETWEEN 1 AND 343 OR MyID BETWEEN 344 AND 400 ...

And so forth.
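
Here is a sketch of the range idea in Python with sqlite3, using a hypothetical items table. The IDs are collapsed into runs of consecutive values, and one BETWEEN is emitted per run. The runs are combined with OR, because a row only needs to fall inside one of them.

```python
import sqlite3


def to_ranges(ids):
    """Collapse IDs into sorted, inclusive (low, high) runs of consecutive values."""
    ranges = []
    for i in sorted(set(ids)):
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], i)  # extend the current run
        else:
            ranges.append((i, i))  # start a new run
    return ranges


def range_predicate(ranges, column="id"):
    """Build "(col BETWEEN ? AND ? OR ...)" plus its bound parameters.
    `column` is pasted into the SQL, so pass only trusted names."""
    if not ranges:
        return "(1 = 0)", []  # no IDs: match nothing
    sql = " OR ".join(f"{column} BETWEEN ? AND ?" for _ in ranges)
    params = [bound for pair in ranges for bound in pair]
    return f"({sql})", params


ranges = to_ranges([20, 1, 2, 3, 4, 10, 11])
where, params = range_predicate(ranges)
print(ranges)  # [(1, 4), (10, 11), (20, 20)]
print(where)   # (id BETWEEN ? AND ? OR id BETWEEN ? AND ? OR id BETWEEN ? AND ?)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO items (id, status) VALUES (?, 'new')",
                 [(i,) for i in range(1, 31)])
conn.commit()
with conn:
    cur = conn.execute(f"UPDATE items SET status = 'archived' WHERE {where}", params)
print(cur.rowcount)  # 7
```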

Last of all, you don't have to worry about how Jet will process an IN clause if you use a passthrough query. You can't do that in code, but you could have a saved QueryDef that is defined as a passthrough and alter the WHERE clause in code at runtime to use your IN list. Then it's all passed off to SQL Server, and SQL Server will decide best how to process it.

Answer by Aswath

There are multiple ways of accommodating a large set of values in a WHERE condition:

  1. Using Temp Tables

    Insert the values into a temp table with a single column.

    Create a UNIQUE INDEX on that particular column.

    INNER JOIN the required table with the newly created temp table

  2. Using array-like functionality in SQL Server

    SQL Server does support array-like functionality.

    Check this link for full documentation.

Sample syntax:

CREATE TABLE #IDs (id int NOT NULL)
DECLARE @x varchar(max) = ''  -- the comma-separated list of IDs goes here, e.g. '1,2,3'
DECLARE @xParam XML;
SELECT @xParam = CAST('<i>' + REPLACE(@x, ',', '</i><i>') + '</i>' AS XML)
INSERT INTO #IDs (id)
SELECT DISTINCT x.i.value('.', 'int')  -- DISTINCT so the unique index below can be created
FROM @xParam.nodes('//i') AS x(i)
CREATE UNIQUE INDEX IX_IDs ON #IDs (id ASC)

Query using:

SELECT A.Name, A.Age
FROM [Table] A
INNER JOIN #IDs id ON id.id = A.[Key]

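
These are the three temp-table steps described above, sketched with Python's sqlite3 instead of SQL Server. The names people and person_key are hypothetical stand-ins for the Table and Key placeholders. The IDs are de-duplicated before loading, because a duplicate in the list would make CREATE UNIQUE INDEX fail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_key INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO people VALUES (?, ?, ?)",
                 [(1, "Ann", 31), (2, "Bo", 45), (3, "Cy", 27), (4, "Di", 52)])
conn.commit()

csv_ids = "4,1,4"  # example input; note the duplicate 4

# Step 1: a single-column temp table holding the IDs
# (split client-side here and de-duplicated so step 2 cannot fail).
conn.execute("CREATE TEMP TABLE ids (id INTEGER NOT NULL)")
conn.executemany("INSERT INTO ids (id) VALUES (?)",
                 [(i,) for i in sorted({int(x) for x in csv_ids.split(",")})])

# Step 2: a unique index on that column.
conn.execute("CREATE UNIQUE INDEX ix_ids ON ids (id)")

# Step 3: INNER JOIN the real table to the temp table.
rows = conn.execute("""
    SELECT p.name, p.age
    FROM people p
    INNER JOIN ids ON ids.id = p.person_key
    ORDER BY p.person_key
""").fetchall()
print(rows)  # [('Ann', 31), ('Di', 52)]
```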