SQL WHERE ID IN (id1, id2, ..., idn)

Question

Asked by Daniel Peñalba

I need to write a query to retrieve a big list of ids.

We support many backends (MySQL, Firebird, SQL Server, Oracle, PostgreSQL, ...), so I need to write standard SQL.

The set of ids could be large, and the query will be generated programmatically. So, what is the best approach?

1) Writing a query using IN

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

My question here is: what happens if n is very large? Also, what about performance?

2) Writing a query using OR

SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn

I think this approach has no limit on n, but what about performance if n is very large?

3) Writing a programmatic solution:

  foreach (var id in myIdList)
  {
      var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
      myObjectList.Add(item);
  }

We experienced some problems with this approach when the database server is queried over the network. Normally it is better to run one query that retrieves all the results than to make a lot of small queries. Maybe I'm wrong.

What would be a correct solution for this problem?

Answer 1

Accepted answer by ThiefMaster

Option 1 is the only good solution.

Why?

Option 2 does the same, but you repeat the column name many times; additionally, the SQL engine doesn't immediately know that you want to check whether the value is one of the values in a fixed list. However, a good SQL engine could optimize it to perform the same as IN. There's still the readability issue, though...
Option 3 is simply horrible performance-wise. It sends a query on every loop iteration and hammers the database with small queries. It also prevents the database from using any optimizations for "value is one of those in a given list".
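
For illustration, here is a minimal sketch of option 1 generated programmatically, written in Python against an in-memory SQLite database. The `items` table, its columns, and the `fetch_by_ids` helper are assumptions made up for this example; the point is only that the ids are bound as parameters rather than concatenated into the SQL string.

```python
import sqlite3

def fetch_by_ids(conn, ids):
    """Return the rows of the (hypothetical) items table whose id is in ids."""
    ids = list(ids)
    if not ids:
        return []  # an empty "IN ()" is not valid standard SQL, so skip the query
    # One "?" placeholder per id; the values themselves are bound, not concatenated.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

# Demo data: ten rows with ids 1..10 (assumed schema, for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

rows = fetch_by_ids(conn, [2, 5, 9, 42])  # 42 does not exist in the table
print(rows)
```

The placeholder style (`?` here) differs between database drivers, so that part would need adapting per backend.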

Answer 2

Answered by Ed Guiness

An alternative approach might be to use another table to contain id values. This other table can then be inner joined on your TABLE to constrain returned rows. This will have the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.

You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.

Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond those described here.
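
A rough sketch of this idea, in Python with SQLite, might look like the following. The table names `items` and `wanted_ids` and the helper `fetch_via_id_table` are hypothetical. SQLite has no TRUNCATE, so the sketch uses DELETE instead, and `INSERT OR IGNORE` is SQLite-specific syntax for skipping duplicate ids. The primary key on `wanted_ids` plays the role of the index mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 1001)])

# Hypothetical permanent table that holds the ids for the current lookup.
conn.execute("CREATE TABLE wanted_ids (id INTEGER PRIMARY KEY)")

def fetch_via_id_table(conn, ids):
    """Load ids into wanted_ids, then constrain items with an inner join."""
    with conn:  # commits the refill of the id table as one transaction
        conn.execute("DELETE FROM wanted_ids")  # stands in for TRUNCATE
        conn.executemany("INSERT OR IGNORE INTO wanted_ids (id) VALUES (?)",
                         ((i,) for i in ids))
    # The SELECT itself is static: no dynamic SQL and no long IN list.
    return conn.execute(
        "SELECT t.id, t.name FROM items AS t "
        "INNER JOIN wanted_ids AS w ON w.id = t.id ORDER BY t.id"
    ).fetchall()

rows = fetch_via_id_table(conn, [3, 7, 7, 2000])  # duplicate and missing ids
print(rows)
```

If several sessions share one permanent table, each would presumably need its own key column (or its own temporary table) so the id sets do not mix.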

Answer 3

Answered by Ritu

What Ed Guiness suggested is a real performance booster. I had a query like this:

select * from table where id in (id1,id2.........long list)

What I did:

DECLARE @temp TABLE (ID int)
INSERT INTO @temp
SELECT * FROM dbo.fnSplitter('#idlist#')

Then I inner joined the temp table with the main table:

select * from table inner join @temp t on t.ID = table.id

And performance improved drastically.

Answer 4

Answered by Adarsh Kumar

The first option is definitely the best option.

SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)

However, if the list of ids is huge, say millions, you should process it in chunks as follows:

Divide your list of ids into chunks of a fixed size, say 100.
Choose the chunk size based on the memory available on your server.
Suppose you have 10000 ids; you will have 10000/100 = 100 chunks.
Process one chunk at a time, resulting in 100 database calls for the select.

Why should you divide into chunks?

You avoid the out-of-memory exceptions that are very common in scenarios like yours, and you make a controlled number of database calls, resulting in better performance.
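
The chunking steps above could be sketched like this in Python with SQLite. The `items` table and the helpers `chunked` and `fetch_in_chunks` are names invented for the example, and the chunk size of 100 is simply the figure used in this answer, not a tuned value.

```python
import sqlite3

def chunked(ids, size):
    """Yield consecutive slices of ids, each holding at most size elements."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def fetch_in_chunks(conn, ids, chunk_size=100):
    """Run one IN query per chunk; return (rows, number_of_database_calls)."""
    ids = list(ids)
    rows, calls = [], 0
    for chunk in chunked(ids, chunk_size):
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id, name FROM items WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
        calls += 1
    return rows, calls

# Demo data: 10000 rows, matching the numbers used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 10001)])

rows, calls = fetch_in_chunks(conn, range(1, 10001), chunk_size=100)
print(len(rows), calls)  # 10000 ids / 100 per chunk = 100 database calls
```

Keeping each chunk at or below the backend's IN-list limit also means the same loop works on databases that cap the list length.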

It has always worked like a charm for me. I hope it works for my fellow developers as well :)

Answer 5

Answered by JakeJ

Running the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of more than 7 minutes!

Doing this instead returned results immediately:

select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id

Use a join.
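
As a hedged illustration of the same join-on-a-value-list idea, here is a small Python sketch against SQLite. It is not the Azure SQL query above: it uses a common table expression (`WITH b(id) AS (VALUES ...)`) to stay within syntax SQLite accepts, and the `my_table` schema and sample rows are assumptions. It shows only the query shape, not the performance difference reported in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, "first"), (250000, "second"), (2600000, "third")])

ids = [250000, 2500001, 2600000]  # the same ids as in the query above

# Build one "(?)" row per id, so the id list becomes an inline table named b.
values = ", ".join("(?)" for _ in ids)
sql = (
    f"WITH b(id) AS (VALUES {values}) "
    "SELECT a.id, a.name FROM my_table AS a "
    "JOIN b ON a.id = b.id ORDER BY a.id"
)
rows = conn.execute(sql, ids).fetchall()
print(rows)  # only ids present in my_table come back
```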

Answer 6

Answered by judda

Sample 3 would be the worst performer out of them all because you are hitting up the database countless times for no apparent reason.

Loading the data into a temp table and then joining on that would be by far the fastest. After that the IN should work slightly faster than the group of ORs.

Answer 7

Answered by Quassnoi

In most database systems, IN (val1, val2, …) and a series of ORs are optimized to the same plan.

The third way would be to import the list of values into a temporary table and join against it, which is more efficient in most systems if there are lots of values.

You may want to read this article:

Passing parameters in MySQL: IN list vs. temporary table

Answer 8

Answered by flq

I think you mean SQL Server, but Oracle has a hard limit on how many elements you can specify in an IN list: 1000.
