MySQL 不使用带有 WHERE IN 子句的索引?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/586381/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 12:50:54  来源:igfitidea点击:

MySQL not using indexes with WHERE IN clause?

sqlmysqlruby-on-railsdatabaseoptimization

提问by jasonlong

I'm trying to optimize some of the database queries in my Rails app and I have several that have got me stumped. They are all using an INin the WHEREclause and are all doing full table scans even though an appropriate index appears to be in place.

我正在尝试优化我的 Rails 应用程序中的一些数据库查询,我有几个让我难住了。它们都使用INinWHERE子句并且都在进行全表扫描,即使适当的索引似乎已经到位。

For example:

例如:

SELECT `user_metrics`.* FROM `user_metrics` WHERE (`user_metrics`.user_id IN (N,N,N,N,N,N,N,N,N,N,N,N))

performs a full table scan and EXPLAINsays:

执行全表扫描并EXPLAIN说:

select_type: simple
type: all
extra: using where
possible_keys: index_user_metrics_on_user_id  (which is an index on the user_id column)
key: (none)
key_length: (none)
ref: (none)
rows: 208

Are indexes not used when an INstatement is used or do I need to do something differently? The queries here are being generated by Rails so I could revisit how my relationships are defined, but I thought I'd start with potential fixes at the DB level first.

使用IN语句时是否不使用索引,或者我是否需要做一些不同的事情?这里的查询是由 Rails 生成的,所以我可以重新审视我的关系是如何定义的,但我想我首先要从数据库级别的潜在修复开始。

回答by vladr

See How MySQL Uses Indexes.

请参阅MySQL 如何使用索引

Also validate whether MySQL still performs a full table scanafter you add an additional 2000-or-so rows to your user_metricstable. In small tables, access-by-index is actually more expensive (I/O-wise) than a table scan, and MySQL's optimizer might take this into account.

还要验证在向表中添加额外的 2000 行左右后,MySQL 是否仍然执行全表扫描user_metrics。在小表中,按索引访问实际上比表扫描更昂贵(I/O 方式),MySQL 的优化器可能会考虑到这一点。

Contrary to my previous post, it turns out that MySQL is also using a cost-based optimizer, which is very good news - that is, provided you run your ANALYZEat least once when you believe that the volume of data in your database is representativeof future day-to-day usage.

与我之前的帖子相反,事实证明 MySQL 也在使用基于成本的优化器,这是一个非常好的消息 - 也就是说,ANALYZE如果您认为数据库中的数据量具有代表性,则至少运行一次未来的日常使用。

When dealing with cost-based optimizers (Oracle, Postgres, etc.), you need to make sure to periodically run ANALYZEon your various tables as their size increases by more than 10-15%. (Postgres will do this automatically for you, by default, whereas other RDBMSs will leave this responsibility to a DBA, i.e. you.) Through statistical analysis, ANALYZEwill help the optimizer get a better idea of how much I/O (and other associated resources, such as CPU, needed e.g. for sorting) will be involved when choosing between various candidate execution plans. Failure to run ANALYZEmay result in very poor, sometimes disastrous planning decisions (e.g. millisecond-queries taking, sometimes, hours because of bad nested loopson JOINs.)

在处理基于成本的优化器(Oracle、Postgres 等)时,您需要确保定期ANALYZE在各种表上运行,因为它们的大小增加了 10-15% 以上。(默认情况下,Postgres 会自动为你做这件事,而其他 RDBMS 会将这个责任留给 DBA,即你。)通过统计分析,ANALYZE将帮助优化器更好地了解有多少 I/O(和其他相关资源)在各种候选执行计划之间进行选择时,将涉及,例如 CPU,需要例如排序)。运行失败ANALYZE可能会导致非常糟糕的,有时是灾难性的规划决策(例如,毫秒查询有时需要数小时,因为s上的嵌套循环不好JOIN。)

If performance is still unsatisfactory after running ANALYZE, then you will typically be able to work around the issue by using hints, e.g. FORCE INDEX, whereas in other cases you might have stumbled over a MySQL bug (e.g. this older one, which could have bitten you were you to use Rails' nested_set).

如果性能仍然运行后不满意ANALYZE,那么你通常可以通过使用提示,如要解决这个问题FORCE INDEX,而在其他情况下,你可能碰上了一个MySQL的错误(例如,这个旧的一个,你是你可能已经被咬使用 Rails' nested_set)。

Now, since you are in a Rails app, it will be cumbersome (and defeat the purpose of ActiveRecord) to issue your custom queries with hints instead of continuing to use the ActiveRecord-generated ones.

现在,由于您在 Rails 应用程序中,因此ActiveRecord使用提示发出自定义查询而不是继续使用ActiveRecord生成的查询会很麻烦(并且会破坏 的目的)。

I had mentioned that in our Rails application allSELECTqueries dropped below 100ms after switching to Postgres, whereas some of the complex joins generated by ActiveRecordwould occasionally take as much as 15s or more with MySQL 5.1 because of nested loops with inner table scans, even when indices were available. No optimizer is perfect, and you should be aware of the options. Other potential performance issues to be aware of, besides query plan optimization, are locking. This is outside the scope of your problem though.

我曾提到,在我们的 Rails 应用程序中,切换到 Postgres 后,所有SELECT查询都下降到 100 毫秒以下,而ActiveRecord在 MySQL 5.1 中,由于带有内表扫描的嵌套循环,即使索引可用。没有优化器是完美的,您应该了解这些选项。除了查询计划优化之外,其他需要注意的潜在性能问题是锁定。不过,这超出了您的问题范围。

回答by Quassnoi

Try forcing this index:

尝试强制使用此索引:

SELECT `user_metrics`.*
FROM `user_metrics` FORCE INDEX (index_user_metrics_on_user_id)
WHERE (`user_metrics`.user_id IN (N,N,N,N,N,N,N,N,N,N,N,N))

I just checked, it does use an index on exactly same query:

我刚刚检查过,它确实在完全相同的查询上使用了索引:

EXPLAIN EXTENDED
SELECT * FROM tests WHERE (test IN ('test 1', 'test 2', 'test 3', 'test 4', 'test 5', 'test 6', 'test 7', 'test 8', 'test 9'))

1, 'SIMPLE', 'tests', 'range', 'ix_test', 'ix_test', '602', '', 9, 100.00, 'Using where'

回答by mluebke

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)

有时 MySQL 不使用索引,即使索引可用。发生这种情况的一种情况是优化器估计使用索引将需要 MySQL 访问表中很大比例的行。(在这种情况下,表扫描可能要快得多,因为它需要更少的搜索。)

What percentage of rows match your IN clause?

与您的 IN 子句匹配的行的百分比是多少?

回答by Liem Le

I know I'm late for the party. But hope I can help someone else with similar problem.

我知道我参加聚会迟到了。但希望我能帮助其他有类似问题的人。

Lately, I'm having the same problem. Then I decide to use self-join-thing to solve my problem. The problem is not MySQL. Problem is us. The return type from subquery is difference from our table. So we must cast the type of subquery to the type of select column. Below is example code:

最近,我遇到了同样的问题。然后我决定使用 self-join-thing 来解决我的问题。问题不在于 MySQL。问题是我们。子查询的返回类型与我们的表不同。所以我们必须将子查询的类型转换为选择列的类型。下面是示例代码:

select `user_metrics`.* 
from `user_metrics` um 
join (select `user_metrics`.`user_id` in (N, N, N, N) ) as temp 
on um.`user_id` = temp.`user_id`

Or my own code:

或者我自己的代码:

Old: (Not use index: ~4s)

旧:(不使用索引:~4s)

SELECT 
    `jxm_character`.*
FROM
    jxm_character
WHERE
    information_date IN (SELECT DISTINCT
            (information_date)
        FROM
            jxm_character
        WHERE
            information_date >= DATE_SUB('2016-12-2', INTERVAL 7 DAY))
        AND `jxm_character`.`ranking_type` = 1
        AND `jxm_character`.`character_id` = 3146089;

New: (Use index: ~0.02s)

新:(使用索引:~0.02s)

SELECT 
    *
FROM
    jxm_character jc
        JOIN
    (SELECT DISTINCT
        (information_date)
    FROM
        jxm_character
    WHERE
        information_date >= DATE_SUB('2016-12-2', INTERVAL 7 DAY)) AS temp 
        ON jc.information_date = STR_TO_DATE(temp.information_date, '%Y-%m-%d')
        AND jc.ranking_type = 1
        AND jc.character_id = 3146089;

jxm_character:

jxm_character:

  • Records: ~3.5M
  • PK: jxm_character(information_date, ranking_type, character_id)
  • 记录:~3.5M
  • PK: jxm_character(information_date, rating_type, character_id)

SHOW VARIABLES LIKE '%version%';

SHOW VARIABLES LIKE '%version%';

'protocol_version', '10'
'version', '5.1.69-log'
'version_comment', 'Source distribution'

Last note: Make sure you understand MySQL index left-most rule.

最后一点:确保您了解 MySQL 索引最左侧规则。

P/s: Sorry for my bad English. I post my code (production, of course) to clear my solution :D.

P/s:对不起,我的英语不好。我发布了我的代码(当然是生产)来清除我的解决方案:D。

回答by Paul Tomblin

Does it get any better if you remove the redundant brackets around the where clause?

如果删除 where 子句周围的多余括号,它会变得更好吗?

Although it could just be that because you've only got 200 or so rows, it decided a table scan would be faster. Try with a table with more records in it.

虽然可能只是因为您只有 200 行左右,但它决定表扫描会更快。尝试使用包含更多记录的表。