SQL WHERE ID IN (id1, id2, ..., idn)
Original question: http://stackoverflow.com/questions/5803472/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by Daniel Peñalba
I need to write a query to retrieve a big list of ids.
We support many backends (MySQL, Firebird, SQL Server, Oracle, PostgreSQL, ...), so I need to write standard SQL.
The id set could be large, and the query would be generated programmatically. So, what is the best approach?
1) Writing a query using IN
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
My question here is: what happens if n is very big? Also, what about performance?
2) Writing a query using OR
SELECT * FROM TABLE WHERE ID = id1 OR ID = id2 OR ... OR ID = idn
I think this approach has no limit on n, but what about performance if n is very big?
3) Writing a programmatic solution:
foreach (var id in myIdList)
{
    var item = GetItemByQuery("SELECT * FROM TABLE WHERE ID = " + id);
    myObjectList.Add(item);
}
We experienced some problems with this approach when the database server is queried over the network. It is normally better to run one query that retrieves all results than to make a lot of small queries. Maybe I'm wrong.
What would be a correct solution for this problem?
Accepted answer by ThiefMaster
Option 1 is the only good solution.
Why?
Option 2 does the same, but you repeat the column name many times; additionally, the SQL engine doesn't immediately know that you want to check whether the value is one of the values in a fixed list. However, a good SQL engine could optimize it to perform the same as IN. There's still the readability issue, though.
Option 3 is simply horrible performance-wise. It sends a query on every loop iteration and hammers the database with small queries. It also prevents the database from using any optimizations for "value is one of those in a given list".
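As a rough illustration of option 1 generated programmatically, here is a minimal Python sketch using the standard-library sqlite3 module (chosen only so it runs without a server). The `items` table and the `fetch_by_ids` helper are hypothetical names, and the `?` placeholder style is what sqlite3 uses; other drivers may expect a different marker.

```python
import sqlite3

def fetch_by_ids(conn, ids):
    """Fetch the rows whose id is in `ids` using one parameterized IN query."""
    ids = list(ids)
    if not ids:
        return []  # an empty "IN ()" is rejected by many engines, so skip the query
    # One placeholder per id; the values are bound by the driver, not concatenated.
    placeholders = ", ".join("?" for _ in ids)
    sql = f"SELECT id, name FROM items WHERE id IN ({placeholders}) ORDER BY id"
    return conn.execute(sql, ids).fetchall()

# Hypothetical demo data: ten rows with ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

rows = fetch_by_ids(conn, [2, 5, 9])
print(rows)
```

Binding the ids as parameters keeps the generated statement safe from injection, though each backend still has its own limit on how many parameters one statement may carry.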
Answer by Ed Guiness
An alternative approach might be to use another table to contain id values. This other table can then be inner joined to your TABLE to constrain returned rows. This has the major advantage that you won't need dynamic SQL (problematic at the best of times), and you won't have an infinitely long IN clause.
You would truncate this other table, insert your large number of rows, then perhaps create an index to aid the join performance. It would also let you detach the accumulation of these rows from the retrieval of data, perhaps giving you more options to tune performance.
Update: Although you could use a temporary table, I did not mean to imply that you must or even should. A permanent table used for temporary data is a common solution with merits beyond those described here.
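The id-table idea can be sketched roughly as follows in Python with sqlite3. The `items` and `wanted_ids` tables and the `fetch_via_id_table` helper are assumptions made up for this example, and `DELETE FROM` stands in for TRUNCATE, which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])
# Permanent helper table holding the ids of interest; the primary key
# doubles as the index that helps the join.
conn.execute("CREATE TABLE wanted_ids (id INTEGER PRIMARY KEY)")

def fetch_via_id_table(conn, ids):
    """Load `ids` into wanted_ids, then join; the SELECT text never changes."""
    conn.execute("DELETE FROM wanted_ids")  # stand-in for TRUNCATE
    # set() drops duplicates so the primary key constraint is not violated.
    conn.executemany("INSERT INTO wanted_ids (id) VALUES (?)",
                     [(i,) for i in set(ids)])
    return conn.execute(
        "SELECT items.id, items.name FROM items "
        "INNER JOIN wanted_ids ON wanted_ids.id = items.id "
        "ORDER BY items.id"
    ).fetchall()

result = fetch_via_id_table(conn, [3, 7, 7, 42])
print(result)
```

Because a permanent helper table is shared, concurrent callers would need some way to keep their id sets apart (for example a session or batch column); that is left out of this sketch.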
Answer by Ritu
What Ed Guiness suggested really is a performance booster. I had a query like this:
select * from table where id in (id1,id2.........long list)
What I did:
DECLARE @temp table(
ID int
)
insert into @temp
select * from dbo.fnSplitter('#idlist#')
Then I inner joined the table variable with the main table:
select * from table inner join @temp t on t.ID = table.id
And performance improved drastically.
Answer by Adarsh Kumar
The first option is definitely the best option.
SELECT * FROM TABLE WHERE ID IN (id1, id2, ..., idn)
However, if the list of ids is huge, say millions, you should split it into chunks as follows:
- Divide your list of ids into chunks of a fixed size, say 100
- Choose the chunk size based on the memory size of your server
- Suppose you have 10000 ids; you will then have 10000/100 = 100 chunks
- Process one chunk at a time, resulting in 100 database calls for the select
Why should you divide into chunks?
You will not get the memory overflow exception that is very common in scenarios like yours. You will also have an optimized number of database calls, resulting in better performance.
It has always worked like a charm for me. I hope it works for my fellow developers as well :)
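The chunking steps above could look something like this in Python with sqlite3. The `items` table, the `chunked` and `fetch_in_chunks` helpers, and the chunk size of 100 are illustrative assumptions; the right chunk size depends on your server and driver.

```python
import sqlite3

def chunked(seq, size):
    """Yield consecutive slices of `seq` with at most `size` elements each."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

def fetch_in_chunks(conn, ids, chunk_size=100):
    """Run one parameterized IN query per chunk and collect all rows."""
    rows = []
    queries = 0
    for chunk in chunked(list(ids), chunk_size):
        placeholders = ", ".join("?" for _ in chunk)
        sql = f"SELECT id FROM items WHERE id IN ({placeholders})"
        rows.extend(conn.execute(sql, chunk).fetchall())
        queries += 1
    return rows, queries

# Hypothetical demo data: 20000 rows, of which 10000 ids are requested.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)",
                 [(i,) for i in range(1, 20001)])

rows, queries = fetch_in_chunks(conn, range(1, 10001), chunk_size=100)
print(len(rows), queries)  # 10000 ids at 100 per chunk give 100 queries
```

The same helper also keeps each statement under engine-specific caps on the number of IN elements or bound parameters, provided the chunk size is chosen below that cap.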
Answer by JakeJ
Running the SELECT * FROM MyTable where id in () command on an Azure SQL table with 500 million records resulted in a wait time of more than 7 minutes!
Doing this instead returned results immediately:
select b.id, a.* from MyTable a
join (values (250000), (2500001), (2600000)) as b(id)
ON a.id = b.id
Use a join.
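To try the join-against-a-value-list idea without a SQL Server instance, here is a hedged Python sketch using sqlite3. It is an adaptation, not the exact syntax above: the value list is placed in a common table expression named `b(id)`, since the `as b(id)` derived-table alias may not be accepted by SQLite. The `items` table and `fetch_by_values_join` helper are hypothetical.

```python
import sqlite3

def fetch_by_values_join(conn, ids):
    """Join the table against an inline VALUES list supplied as parameters."""
    ids = list(ids)
    if not ids:
        return []  # an empty VALUES list is not valid SQL
    values = ", ".join("(?)" for _ in ids)  # one single-column row per id
    sql = (
        f"WITH b(id) AS (VALUES {values}) "
        "SELECT a.id, a.name FROM items a "
        "JOIN b ON a.id = b.id ORDER BY a.id"
    )
    return conn.execute(sql, ids).fetchall()

# Hypothetical demo data: ten rows with ids 1..10.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(i, f"item{i}") for i in range(1, 11)])

joined = fetch_by_values_join(conn, [2, 5, 42])
print(joined)
```

Whether this form beats a plain IN list depends on the engine and its planner, so it is worth measuring on the actual backend rather than assuming the speedup reported above carries over.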
Answer by judda
Sample 3 would be the worst performer of them all because you are hitting the database countless times for no apparent reason.
Loading the data into a temp table and then joining on that would be by far the fastest. After that, IN should work slightly faster than the group of ORs.
Answer by Quassnoi
In most database systems, IN (val1, val2, …) and a series of ORs are optimized to the same plan.
The third way would be to import the list of values into a temporary table and join to it, which is more efficient in most systems if there are lots of values.
You may want to read these articles:
Answer by flq
I think you mean SQL Server, but on Oracle there is a hard limit on how many IN elements you can specify: 1000.