PostgreSQL - “IN”子句中的最大参数数?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1009706/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-10 22:16:19  来源:igfitidea点击:

PostgreSQL - max number of parameters in "IN" clause?

postgresql

提问by

In Postgres, you can specify an IN clause, like this:

在 Postgres 中,您可以指定一个 IN 子句,如下所示:

SELECT * FROM user WHERE id IN (1000, 1001, 1002)

Does anyone know what's the maximum number of parameters you can pass into IN?

有谁知道您可以传递给 IN 的最大参数数量是多少?

回答by Jordan S. Jones

According to the source code located here, starting at line 850,PostgreSQL doesn't explicitly limit the number of arguments.

根据位于此处的源代码,从第 850 行开始,PostgreSQL 没有明确限制参数的数量。

The following is a code comment from line 870:

以下是第 870 行的代码注释:

/*
 * We try to generate a ScalarArrayOpExpr from IN/NOT IN, but this is only
 * possible if the inputs are all scalars (no RowExprs) and there is a
 * suitable array type available.  If not, we fall back to a boolean
 * condition tree with multiple copies of the lefthand expression.
 * Also, any IN-list items that contain Vars are handled as separate
 * boolean conditions, because that gives the planner more scope for
 * optimization on such clauses.
 *
 * First step: transform all the inputs, and detect whether any are
 * RowExprs or contain Vars.
 */

回答by nimai

This is not really an answer to the present question, however it might help others too.

这并不是当前问题的真正答案,但它也可能对其他人有所帮助。

At least I can tell there is a technical limit of 32767 values (=Short.MAX_VALUE) passable to the PostgreSQL backend, using Posgresql's JDBC driver 9.1.

至少我可以看出,使用 Posgresql 的 JDBC 驱动程序 9.1,有 32767 个值(=Short.MAX_VALUE)可传递给 PostgreSQL 后端的技术限制。

This is a test of "delete from x where id in (... 100k values...)" with the postgresql jdbc driver:

这是使用 postgresql jdbc 驱动程序“从 x where id in (... 100k values...)”中删除的测试:

Caused by: java.io.IOException: Tried to send an out-of-range integer as a 2-byte value: 100000
    at org.postgresql.core.PGStream.SendInteger2(PGStream.java:201)

回答by hacker13ua

explain select * from test where id in (values (1), (2));

QUERY PLAN

查询计划

 Seq Scan on test  (cost=0.00..1.38 rows=2 width=208)
   Filter: (id = ANY ('{1,2}'::bigint[]))

But if try 2nd query:

但是如果尝试第二个查询:

explain select * from test where id = any (values (1), (2));

QUERY PLAN

查询计划

Hash Semi Join  (cost=0.05..1.45 rows=2 width=208)
       Hash Cond: (test.id = "*VALUES*".column1)
       ->  Seq Scan on test  (cost=0.00..1.30 rows=30 width=208)
       ->  Hash  (cost=0.03..0.03 rows=2 width=4)
             ->  Values Scan on "*VALUES*"  (cost=0.00..0.03 rows=2 width=4)

We can see that postgres build temp table and join with it

我们可以看到 postgres 构建了临时表并加入了它

回答by Prasanth Jayachandran

There is no limit to the number of elements that you are passing to IN clause. If there are more elements it will consider it as array and then for each scan in the database it will check if it is contained in the array or not. This approach is not so scalable. Instead of using IN clause try using INNER JOIN with temp table. Refer http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/for more info. Using INNER JOIN scales well as query optimizer can make use of hash join and other optimization. Whereas with IN clause there is no way for the optimizer to optimize the query. I have noticed speedup of at least 2x with this change.

传递给 IN 子句的元素数量没有限制。如果有更多元素,它会将其视为数组,然后对于数据库中的每次扫描,它将检查它是否包含在数组中。这种方法不是那么可扩展。不要使用 IN 子句,而是尝试将 INNER JOIN 与临时表一起使用。有关更多信息,请参阅http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/。使用 INNER JOIN 扩展和查询优化器可以利用散列连接和其他优化。而使用 IN 子句,优化器无法优化查询。我注意到此更改至少提高了 2 倍。

回答by blubb

As someone more experienced with Oracle DB, I was concerned about this limit too. I carried out a performance test for a query with ~10'000 parameters in an IN-list, fetching prime numbers up to 100'000 from a table with the first 100'000 integers by actually listing all the prime numbers as query parameters.

作为对 Oracle DB 更有经验的人,我也担心这个限制。我对IN-list 中包含约 10'000 个参数的查询进行了性能测试,通过实际将所有质数列为查询参数,从具有前 100'000 个整数的表中获取最多 100'000 个质数

My results indicate that you need not worry about overloading the query plan optimizer or getting plans without index usage, since it will transform the query to use = ANY({...}::integer[])where it can leverage indices as expected:

我的结果表明,您不必担心查询计划优化器超载或在没有索引使用的情况下获取计划,因为它将转换查询以使用= ANY({...}::integer[])它可以按预期利用索引的地方:

-- prepare statement, runs instantaneous:
PREPARE hugeplan (integer, integer, integer, ...) AS
SELECT *
FROM primes
WHERE n IN (, , , ..., 92);

-- fetch the prime numbers:
EXECUTE hugeplan(2, 3, 5, ..., 99991);

-- EXPLAIN ANALYZE output for the EXECUTE:
"Index Scan using n_idx on primes  (cost=0.42..9750.77 rows=9592 width=5) (actual time=0.024..15.268 rows=9592 loops=1)"
"  Index Cond: (n = ANY ('{2,3,5,7, (...)"
"Execution time: 16.063 ms"

-- setup, should you care:
CREATE TABLE public.primes
(
  n integer NOT NULL,
  prime boolean,
  CONSTRAINT n_idx PRIMARY KEY (n)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.primes
  OWNER TO postgres;

INSERT INTO public.primes
SELECT generate_series(1,100000);

However, this (rather old) thread on the pgsql-hackers mailing listindicates that there is still a non-negligible cost in planning such queries, so take my word with a grain of salt.

然而,pgsql-hackers 邮件列表上的这个(相当旧的)线程表明规划此类查询的成本仍然不可忽视,所以请相信我的话。

回答by hacker13ua

If you have query like:

如果您有以下查询:

SELECT * FROM user WHERE id IN (1, 2, 3, 4 -- and thousands of another keys)

you may increase performace if rewrite your query like:

如果重写您的查询,您可能会提高性能:

SELECT * FROM user WHERE id = ANY(VALUES (1), (2), (3), (4) -- and thousands of another keys)

回答by Andrew

Just tried it. the answer is -> out-of-range integer as a 2-byte value: 32768

刚试过。答案是 -> 超出范围的整数作为 2 字节值:32768

回答by PatrikAkerstrand

You might want to consider refactoring that query instead of adding an arbitrarily long list of ids... You could use a range if the ids indeed follow the pattern in your example:

您可能需要考虑重构该查询,而不是添加任意长的 id 列表...如果 id 确实遵循示例中的模式,则可以使用范围:

SELECT * FROM user WHERE id >= minValue AND id <= maxValue;

Another option is to add an inner select:

另一种选择是添加一个内部选择:

SELECT * 
FROM user 
WHERE id IN (
    SELECT userId
    FROM ForumThreads ft
    WHERE ft.id = X
);