MySQL“IN”查询子查询非常慢,但显式值很快

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/5018284/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 18:46:55  来源:igfitidea点击:

MySQL "IN" queries terribly slow with subquery but fast with explicit values

mysqlquery-optimization

提问by Franco

I have a MySQL query (Ubu 10.04,Innodb, Core i7, 16Gb RAM, SSD drives, MySQL params optimized):

我有一个 MySQL 查询(Ubu 10.04、Innodb、Core i7、16Gb RAM、SSD 驱动器、优化的 MySQL 参数):

SELECT
COUNT(DISTINCT subscriberid)
FROM
em_link_data
WHERE
linkid in (SELECT l.id FROM em_link l WHERE l.campaignid = '2900' AND l.link != 'open')

The table em_link_data has about 7million rows, em_link has a few thousand. This query will take about 18 secondsto complete. However, if I substitute the results of the subquery and do this:

表 em_link_data 大约有 700 万行,em_link 有几千行。完成此查询大约需要18 秒。但是,如果我替换子查询的结果并执行以下操作:

SELECT
COUNT(DISTINCT subscriberid)
FROM
em_link_data
WHERE
linkid in (24899,24900,24901,24902);

then the query will run in less than 1 millisecond. The subquery alone runs in less than 1ms, the column linkid is indexed.

那么查询将在不到 1 毫秒内运行。子查询单独运行不到1ms,列linkid被索引。

If I rewrite the query as a join, also less than 1ms. Why is a "IN" query so slow with a subquery in it and why so fast with values in it? I can't rewrite the query (bought software) so I was hoping there is some tweak or hint to speedup this query! Any help is appreciated.

如果我将查询重写为连接,也少于 1 毫秒。为什么带有子查询的“IN”查询如此之慢,而其中的值为何如此之快?我无法重写查询(购买软件),所以我希望有一些调整或提示来加速这个查询!任何帮助表示赞赏。

回答by Brian

Subqueries execute every time you evaluate them (in MySQL anyway, not all RDBMSes), i.e. you're basically running 7 million queries! Using a JOIN, if possible, will reduce this to 1. Even if adding indexing improves performance of those, you're still running them.

每次评估子查询时都会执行子查询(无论如何,在 MySQL 中,并非所有 RDBMS),即您基本上运行了 700 万个查询!如果可能,使用 JOIN 会将其减少到 1。即使添加索引提高了它们的性能,您仍然在运行它们。

回答by awm

Yes, INwith subqueries is slow. Use a join instead.

是的,IN子查询很慢。改用连接。

SELECT
COUNT(DISTINCT subscriberid)
FROM em_link_data JOIN em_link ON em_link_data.linkid=em_link.id
WHERE em_link.campaignid = '2900' AND em_link.link != 'open'

And make sure you've defined indexes on em_link_data.linkidand em_link.id.

并确保您已在em_link_data.linkid和上定义索引em_link.id

回答by Alessandro

The problem is that MySQL executes queries from outside to inside, while you might think that your subquery is done once and then its results are passed to the WHERE expression of the outer query (see MySQL documentation).

问题是 MySQL 从外到内执行查询,而您可能认为您的子查询执行一次,然后将其结果传递给外部查询的 WHERE 表达式(请参阅MySQL 文档)。

If you can't rewrite your query, you should do the following optimizations:

如果您无法重写查询,则应进行以下优化:

  • add an index on campaignidand linkas FrustratedWithFormsDesigner said
  • check that the subquery uses indexes correctly by doing EXPLAIN SELECT ...
  • enable and tweak query cache, as that should speed up the subquery being called multiple times
  • 添加一个索引campaignidlink正如 FrustratedWithFormsDesigner 所说
  • 通过执行检查子查询是否正确使用索引 EXPLAIN SELECT ...
  • 启用和调整查询缓存,因为这应该会加快子查询被多次调用的速度

One more idea would be to install MySQL proxyand write a little script that intercepts your query and rewrites it to use a join.

另一个想法是安装MySQL 代理并编写一个小脚本来拦截您的查询并将其重写为使用连接。

回答by Alix

If your subquery is fast thus campaignid and link are absolutely indexed. l.id is PK and clustered thus is fast. But as far as I remember(from last time I checked this subject) , mysql describes about its internal optimizations for "in" subqueries to use the index sort of subquery result to improve performance and also uses cache for the left side of "IN" to drag it inside the subquery faster and if indexes are set true it must not have such difference to use inner join or "IN" rather than caching and it may be due to cache problem and massive data. http://dev.mysql.com/doc/internals/en/transformation-scalar-in.html

如果您的子查询速度很快,那么campaignid 和link 绝对被编入索引。l.id 是 PK 并且聚集因此很快。但据我所知(从我上次检查这个主题开始),mysql 描述了它对“in”子查询的内部优化,以使用子查询结果的索引排序来提高性能,并在“IN”的左侧使用缓存为了更快地将其拖入子查询,并且如果索引设置为 true,则使用内部联接或“IN”而不是缓存必须没有这种差异,这可能是由于缓存问题和大量数据。 http://dev.mysql.com/doc/internals/en/transformation-scalar-in.html

I don't know the situation of the software but if you can use INNER JOIN and you have (probably) some additional definitions before the IN clause in the WHERE clause of your outer query make sure to move that clauses to before of your main INNER JOIN via a temporary INNER JOIN behaves similar to an intervenient "where" clause sequentially and reduces the number of cross comparisons in a JOIN like this :

我不知道软件的情况,但是如果您可以使用 INNER JOIN 并且您(可能)在外部查询的 WHERE 子句中的 IN 子句之前有一些附加定义,请确保将该子句移至主 INNER 之前通过临时 INNER JOIN JOIN 的行为类似于顺序插入的“where”子句,并减少了 JOIN 中交叉比较的次数,如下所示:

SELECT ... FROM t
INNER JOIN (SELECT 1) AS tmp ON t.asd=23
INNER JOIN t2 ON ...

Sample comparisons of normal and temp join lookups : 1000 * 1000 > 1000 + (100 * 1000)

正常和临时连接查找的示例比较:1000 * 1000 > 1000 + (100 * 1000)

Also it seems the subquery is filtered by constant vals thus if it was me I was gonna put the clauses in a sub query generating the resultset and reduce the number of comparisons in a JOIN like this :

此外,子查询似乎由常量 vals 过滤,因此如果是我,我会将子句放在生成结果集的子查询中,并减少 JOIN 中的比较次数,如下所示:

SELECT ... FROM t
INNER JOIN (SELECT ... FROM t2 WHERE constant clauses) AS tbl2 ON ...

Anyway, in the "IN" query, comparing any column of the table in the subquery to any column of the table in outer query requires the columns of both side to be precisely indexed (regard to composite indexes) but still it may be a cache problem.

无论如何,在“IN”查询中,将子查询中表的任何列与外部查询中表的任何列进行比较需要对双方的列进行精确索引(关于复合索引),但它仍然可能是缓存问题。

EDITED :Also I was curious to ask : Can the making a composite index on l.campaignid, l.link and l.id make any sense?

编辑:我也很好奇地问:在 l.campaignid、l.link 和 l.id 上制作复合索引是否有意义?