Speed of IN keyword in MySQL/PostgreSQL

Notice: The content of this page comes from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/957405/

Date: 2020-09-19 23:43:49  Source: igfitidea

Tags: mysql, performance, postgresql, list

Asked by Sasha Chedygov

I've heard lots of people say that the IN keyword in most relational databases is slow. How true is this? An example query, off the top of my head, would be this:

SELECT * FROM someTable WHERE someColumn IN (value1, value2, value3)

I've heard that is much slower than doing this:

SELECT * FROM someTable WHERE
  someColumn = value1 OR
  someColumn = value2 OR
  someColumn = value3

Is this true? Or is the speed difference negligible? If it matters, I'm using PostgreSQL, but I'd also like to know how MySQL fares (and if it's any different). Thanks in advance.

Answer by Greg Smith

In PostgreSQL, exactly what you'll get here depends on the underlying table, so you should use EXPLAIN ANALYZE on some sample queries against a useful subset of your data to figure out exactly what the optimizer is going to do (make sure the tables you're running against have been ANALYZEd too). IN can be processed a couple of different ways, and that's why you need to look at some samples to figure out which alternative is being used for your data. There is no simple generic answer to your question.

As for the specific question you added in your revision, against a trivial data set with no indexes involved here's an example of the two query plans you'll get:

postgres=# explain analyze select * from x where s in ('123','456');
 Seq Scan on x  (cost=0.00..84994.69 rows=263271 width=181) (actual time=0.015..1819.702 rows=247823 loops=1)
   Filter: (s = ANY ('{123,456}'::bpchar[]))
 Total runtime: 1931.370 ms

postgres=# explain analyze select * from x where s='123' or s='456';
 Seq Scan on x  (cost=0.00..90163.62 rows=263271 width=181) (actual time=0.014..1835.944 rows=247823 loops=1)
   Filter: ((s = '123'::bpchar) OR (s = '456'::bpchar))
 Total runtime: 1949.478 ms

Those two runtimes are essentially identical, because the real processing time is dominated by the sequential scan across the table; running the queries multiple times shows that the difference between the two is below the run-to-run margin of error. As you can see, PostgreSQL transforms the IN case into its ANY filter, which should always execute faster than a series of ORs. Again, this trivial case is not necessarily representative of what you'll see on a serious query where indexes and the like are involved. Regardless, manually replacing INs with a series of OR conditions should never be faster, because the optimizer knows the best thing to do here if it has good data to work with.
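
To repeat this kind of comparison on your own data, a small harness can run both spellings and check that the rows match before looking at the plans. The sketch below is only a stand-in under stated assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL, a made-up table x with a text column s, and SQLite's EXPLAIN QUERY PLAN in place of EXPLAIN ANALYZE, so the plan text will not look like the PostgreSQL output above.

```python
import sqlite3

# Stand-in setup: an in-memory SQLite table named x with a text column s.
# (Hypothetical data; the answer above used a PostgreSQL table.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (id INTEGER PRIMARY KEY, s TEXT)")
conn.executemany(
    "INSERT INTO x (s) VALUES (?)",
    [(str(n % 1000),) for n in range(10000)],
)

in_sql = "SELECT id FROM x WHERE s IN ('123', '456') ORDER BY id"
or_sql = "SELECT id FROM x WHERE s = '123' OR s = '456' ORDER BY id"

in_rows = conn.execute(in_sql).fetchall()
or_rows = conn.execute(or_sql).fetchall()

# Both spellings should return exactly the same rows.
same_rows = in_rows == or_rows


def plan(sql):
    """Return the plan detail strings SQLite reports for a query."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


in_plan = plan(in_sql)
or_plan = plan(or_sql)
print("same rows:", same_rows, "count:", len(in_rows))
print("IN plan:", in_plan)
print("OR plan:", or_plan)
```

Against PostgreSQL or MySQL the same idea applies, but the plans should be read with that engine's own EXPLAIN output rather than inferred from SQLite.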

In general, PostgreSQL knows more tricks for optimizing complicated queries than the MySQL optimizer does, but it also relies heavily on your having given the optimizer enough data to work with. The first links in the "Performance Optimization" section of the PostgreSQL wiki cover the most important things needed to get good results from the optimizer.

Answer by Quassnoi

In MySQL, these are complete synonyms for the optimizer:

SELECT  *
FROM    someTable
WHERE   someColumn IN (value1, value2, value3)

and

SELECT  *
FROM    someTable
WHERE   someColumn = value1 OR
        someColumn = value2 OR
        someColumn = value3

provided that the values are literal constants or preset variables.

According to the documentation:

The definition of a range condition for a single-part index is as follows:

  • For both BTREE and HASH indexes, comparison of a key part with a constant value is a range condition when using the =, <=>, IN(), IS NULL, or IS NOT NULL operators.
  • For all types of indexes, multiple range conditions combined with OR or AND form a range condition.

“Constant value” in the preceding descriptions means one of the following:

  • A constant from the query string
  • A column of a const or system table from the same join
  • The result of an uncorrelated subquery
  • Any expression composed entirely from subexpressions of the preceding types

However, this query:

SELECT  *
FROM    table
WHERE   id = 1
        OR id = (SELECT id FROM other_table WHERE unique_condition)

will use the index on id, while this one:

SELECT  *
FROM    table
WHERE   id IN (1, (SELECT id FROM other_table WHERE unique_condition))

will use a full table scan.

I.e., there is a difference when one of the values is a single-row subquery.

I recently filed this as bug 45145 in MySQL (it turned out to be specific to 5.2, absent in 5.1, and corrected in 6.0).

Answer by Brent Baisley

Using IN isn't necessarily slow, it's how you build the IN parameters that will slow things down considerably. Too often people use SELECT ... WHERE x IN (SELECT..., which can be very poorly optimized (i.e. not at all). Do a search on "correlated subquery" to see how bad it can get.

Often you don't have to use IN at all and can use a JOIN instead, and take advantage of derived tables.

SELECT * FROM table1 WHERE x IN (SELECT y FROM table2 WHERE z=3)

can be rephrased like this:

SELECT * FROM table1 JOIN (SELECT y FROM table2 WHERE z=3) AS table2 ON table1.x=table2.y

If the IN syntax is slow, the JOIN syntax will often be much faster. You can use EXPLAIN to see how each query would be optimized differently. This is a simplistic example and your database may show the same query plan for both, but more complicated queries usually show something different.
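
One thing worth checking before swapping IN for JOIN is that the two forms return the same rows. IN only tests membership, while a JOIN produces one output row per match, so duplicate values in the derived table can multiply rows. The sketch below illustrates this under stated assumptions: it runs on Python's built-in sqlite3 module as a stand-in engine, the id and z columns, the sample rows, and the alias t2 are made up, and the DISTINCT variant is one possible way to restore the IN semantics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE table2 (id INTEGER PRIMARY KEY, y INTEGER, z INTEGER);
    INSERT INTO table1 (x) VALUES (1), (2), (3);
    -- y = 1 appears twice with z = 3, so the derived table has a duplicate.
    INSERT INTO table2 (y, z) VALUES (1, 3), (1, 3), (2, 3), (3, 4);
""")

in_sql = """
    SELECT table1.id FROM table1
    WHERE x IN (SELECT y FROM table2 WHERE z = 3)
    ORDER BY table1.id
"""
join_sql = """
    SELECT table1.id FROM table1
    JOIN (SELECT y FROM table2 WHERE z = 3) AS t2 ON table1.x = t2.y
    ORDER BY table1.id
"""
join_distinct_sql = """
    SELECT table1.id FROM table1
    JOIN (SELECT DISTINCT y FROM table2 WHERE z = 3) AS t2 ON table1.x = t2.y
    ORDER BY table1.id
"""

in_ids = [r[0] for r in conn.execute(in_sql)]
join_ids = [r[0] for r in conn.execute(join_sql)]
join_distinct_ids = [r[0] for r in conn.execute(join_distinct_sql)]

# The plain JOIN repeats table1 row 1 because y = 1 matches twice.
print("IN:           ", in_ids)
print("JOIN:         ", join_ids)
print("JOIN DISTINCT:", join_distinct_ids)
```

If the joined column is already unique (for example a primary key), the plain JOIN and the IN form return the same rows and the DISTINCT is unnecessary.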

Answer by joedevon

I think you got the answer(s) you wanted above. Just wanted to add one thing.

You need to optimize IN and use it the right way. In development, I always set up a debug section at the bottom of the page anytime there is a query and it automatically runs an EXPLAIN EXTENDED on every SELECT and then SHOW WARNINGS in order to see the (likely) way that MySQL's Query Optimizer will rewrite the query internally. Lots to learn from that on how to make sure IN is working for you.
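
A debug hook of this kind could be sketched roughly as follows. This is not the author's code: explain_selects is a hypothetical helper, the table and index names are made up, and it runs on Python's built-in sqlite3 module with SQLite's EXPLAIN QUERY PLAN standing in for MySQL's EXPLAIN followed by SHOW WARNINGS.

```python
import sqlite3


def explain_selects(conn, sql, params=()):
    """Run sql and, for SELECT statements, also collect the query plan.

    Hypothetical helper: SQLite's EXPLAIN QUERY PLAN is used here as a
    stand-in for MySQL's EXPLAIN plus SHOW WARNINGS.
    """
    plan = []
    if sql.lstrip().upper().startswith("SELECT"):
        cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
        plan = [row[-1] for row in cursor]
    rows = conn.execute(sql, params).fetchall()
    return rows, plan


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE someTable (id INTEGER PRIMARY KEY, someColumn INTEGER)")
conn.execute("CREATE INDEX idx_some_column ON someTable (someColumn)")
conn.executemany(
    "INSERT INTO someTable (someColumn) VALUES (?)",
    [(n,) for n in range(100)],
)

rows, plan = explain_selects(
    conn,
    "SELECT id FROM someTable WHERE someColumn IN (?, ?, ?) ORDER BY id",
    (1, 2, 3),
)
print(rows)
for line in plan:
    print("plan:", line)
```

In a real application the plan lines would go to a debug panel or log rather than standard output, and the hook would be disabled outside development.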

Answer by derobert

IN with a subselect is often slow. IN with a value list shouldn't be any slower than someColumn = value1 OR someColumn = value2 OR someColumn = value3, etc. That is plenty fast, as long as the number of values is sane.

IN with a subquery is slow when the optimizer can't figure out a good way to perform the query, and has to use the obvious method of building the full result of the subquery. For example:

SELECT username
  FROM users
  WHERE userid IN (
    SELECT userid FROM users WHERE user_first_name = 'Bob'
  )

is going to be much slower than

SELECT username FROM users WHERE user_first_name = 'Bob'

unless the optimizer can figure out what you meant.
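
The two queries are logically the same, which is why the subquery is pure overhead when the optimizer cannot remove it. A quick way to confirm the equivalence before simplifying is to run both and compare the results. The sketch below assumes Python's built-in sqlite3 module as a stand-in engine and three made-up rows; it checks only the results, not the relative speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "userid INTEGER PRIMARY KEY, username TEXT, user_first_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (username, user_first_name) VALUES (?, ?)",
    [("bob1", "Bob"), ("alice", "Alice"), ("bob2", "Bob")],
)

# Form with the redundant self-referencing subquery.
with_subquery = conn.execute("""
    SELECT username FROM users
    WHERE userid IN (SELECT userid FROM users WHERE user_first_name = 'Bob')
    ORDER BY userid
""").fetchall()

# Direct form with the filter applied to the table itself.
direct = conn.execute("""
    SELECT username FROM users WHERE user_first_name = 'Bob' ORDER BY userid
""").fetchall()

print(with_subquery == direct, direct)
```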

Answer by Greg

It says in the docs that IN is very fast in MySQL, but I can't find the source at the moment.

Answer by Matthew Vines

The speed of the IN keyword really depends on the complexity of your subquery. In the example you provide, you just want to see whether someColumn's value is in a fixed list of values, and a pretty short one at that. So I would imagine that the performance cost would be minimal in that case.
