Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/4771035/

Time: 2020-08-25 14:18:45  Source: igfitidea

MySQL Query IN() Clause Slow on Indexed Column

php, mysql, performance

Asked by zmbush

I have a MySQL query that is generated by a PHP script. The query looks something like this:

SELECT * FROM Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0 AND RHD_No IN (10, 24, 34, 41, 43, 51, 57, 59, 61, 67, 84, 90, 272, 324, 402, 405, 414, 498, 500, 501, 510, 559, 562, 595, 632, 634, 640, 643, 647, 651, 703, 714, 719, 762, 765, 776, 796, 812, 814, 815, 822, 848, 853, 855, 858, 866, 891, 920, 947, 956, 962, 968, 1049, 1054, 1064, 1065, 1070, 1100, 1113, 1119, 1130, 1262, 1287, 1292, 1313, 1320, 1327, 1332, 1333, 1335, 1340, 1343, 1344, 1346, 1349, 1352, 1358, 1362, 1365, 1482, 1495, 1532, 1533, 1537, 1549, 1550, 1569, 1571, 1573, 1574, 1596, 1628, 1691, 1714, 1720, 1735, 1755, 1759, 1829, 1837, 1844, 1881, 1919, 2005, 2022, 2034, 2035, 2039, 2054, 2076, 2079, 2087, 2088, 2089, 2090, 2091, 2092, 2154, 2155, 2156, 2157, 2160, 2162, 2164, 2166, 2169, 2171, 2174, 2176, 2178, 2179, 2183, 2185, 2186, 2187, 2201, 2234, 2236, 2244, 2245, 2250, 2255, 2260, 2272, 2280, 2281, 2282, 2291, 2329, 2357, 2375, 2444, 2451, 2452, 2453, 2454, 2456, 2457, 2460, 2462, 2464, 2465, 2467, 2468, 2469, 2470, 2473, 2474, 2481, 2485, 2487, 2510, 2516, 2519, 2525, 2540, 2545, 2547, 2553, 2571, 2579, 2580, 2587, 2589, 2597, 2602, 2611, 2629, 2660, 2662, 2700, 2756, 2825, 2833, 2835, 2858, 2958, 2963, 2964, 3009, 3090, 3117, 3118, 3120, 3121, 3122, 3123, 3126, 3127, 3129, 3130, 3133, 3135, 3137, 3138, 3139, 3141, 3142, 3145, 3146, 3147, 3151, 3152, 3155, 3193, 3201, 3204, 3219, 3221, 3222, 3223, 3224, 3225, 3226, 3227, 3228, 3229, 3231, 3232, 3233, 3234, 3235, 3237, 3239, 3246, 3250, 3253, 3259, 3261, 3291, 3315, 3328, 3377, 3381, 3383, 3384, 3385, 3387, 3388, 3389, 3390, 3396, 3436, 3463, 3465, 3467, 3470, 3471, 3484, 3507, 3515, 3554, 3572, 3641, 3672, 3683, 3689, 3690, 3692, 3693, 3694, 3697, 3698, 3705, 3711, 3713, 3715, 3716, 3717, 3719, 3720, 3722, 3726, 3727, 3732, 3737, 3763, 3767, 3770, 3771, 3772, 3773, 3803, 3810, 3812, 3816, 3846, 3847, 3848, 3851, 3874, 3882, 3902, 3903, 3906, 3908, 3916, 3924, 3967, 3987, 4006, 4030, 4043, 4045, 4047, 4058, 4067, 
4107, 4108, 4114, 4115, 4131, 4132, 4133, 4137, 4138, 4139, 4140, 4141, 4142, 4146, 4150, 4151, 4152, 4153, 4157, 4158, 4160, 4163, 4166, 4167, 4171, 4179, 4183, 4221, 4225, 4242, 4257, 4435, 4437, 4438, 4443, 4446, 4449, 4450, 4451, 4452, 4454, 4460, 4550, 4557, 4618, 4731, 4775, 4804, 4972, 5025, 5026, 5039, 5042, 5294, 5578, 5580, 5599, 5602, 5649, 5726, 5779, 5783, 5931, 5934, 5936, 5939, 5940, 5941, 5978, 6044, 6056, 6113, 6116, 6118, 6122, 6123, 6125, 6127, 6128, 6129, 6130, 6131, 6135, 6141, 6145, 6147, 6150, 6152, 6153, 6154, 6160, 6166, 6169);

The column RHD_No is the primary key of this table, and there are about 400,000 rows in total. The problem is that the query is extremely slow: it often takes around 2 seconds, but I've seen it take as long as 10.

When I try to explain the query, everything seems like it should be fine:

+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
| id | select_type | table       | type  | possible_keys | key     | key_len | ref  | rows | Extra       |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+
|  1 | SIMPLE      | Recipe_Data | range | PRIMARY       | PRIMARY | 4       | NULL |  420 | Using where |
+----+-------------+-------------+-------+---------------+---------+---------+------+------+-------------+

When I profile the query I get:

mysql> show profile;
+--------------------------------+----------+
| Status                         | Duration |
+--------------------------------+----------+
| starting                       | 0.000015 |
| checking query cache for query | 0.000266 |
| Opening tables                 | 0.000009 |
| System lock                    | 0.000004 |
| Table lock                     | 0.000006 |
| init                           | 0.000115 |
| optimizing                     | 0.000038 |
| statistics                     | 0.000797 |
| preparing                      | 0.000047 |
| executing                      | 0.000002 |
| Sending data                   | 2.675270 |
| end                            | 0.000007 |
| query end                      | 0.000003 |
| freeing items                  | 0.000071 |
| logging slow query             | 0.000002 |
| logging slow query             | 0.000058 |
| cleaning up                    | 0.000005 |
+--------------------------------+----------+

I've been working on this problem for a long time and I haven't been able to find a solution. Is there anything overtly wrong with this query? I don't see how looking at 420 rows should take 2+ seconds.

Answered by Peter G.

You are accessing 420 rows by primary key which will probably lead to an index access path. This could access 2 index pages and one data page per key. If these are in cache, the query should run fast. If not, every page access that goes to disk will incur the usual disk latency. If we assume 5ms disk latency and 80% cache hits, we arrive at 420*3*0.2*5ms=1.2 seconds which is on the order of what you're seeing.
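
The estimate above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and its parameters are made up for illustration, and the 3 pages per key, 80% cache hit rate, and 5 ms latency are the assumptions stated in the answer, not measurements. The exact product works out to about 1.26 seconds, consistent with the rough 1.2 seconds quoted.

```python
def estimated_io_seconds(keys, pages_per_key, cache_hit_rate, disk_latency_ms):
    """Expected time spent on disk reads for a batch of primary-key lookups."""
    page_accesses = keys * pages_per_key                 # 2 index pages + 1 data page per key
    disk_reads = page_accesses * (1.0 - cache_hit_rate)  # only cache misses go to disk
    return disk_reads * disk_latency_ms / 1000.0         # milliseconds -> seconds


# Figures assumed in the answer: 420 keys, 3 pages each, 80% hits, 5 ms per read.
estimate = estimated_io_seconds(
    keys=420, pages_per_key=3, cache_hit_rate=0.8, disk_latency_ms=5
)
print(f"{estimate:.2f} s")  # prints "1.26 s"
```

Changing the hit rate shows how sensitive the result is: with a cold cache the same lookups would cost several seconds, which may explain the occasional 10-second runs.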

Answered by DVK

The problem is that IN is basically treated as a bunch of ORs, e.g.

col IN (1,2,3)

is

col = 1 OR col = 2 OR col = 3

This is a LOT slower than a join.

What you should do is generate SQL that creates a temporary table, populates it with the values from the IN clause, and then joins with that temporary table:

CREATE TEMPORARY TABLE numbers (n INT)

Then in a loop, add

INSERT INTO numbers VALUES ($next_number)

Then at the end

SELECT * FROM numbers, Recipe_Data 
WHERE numbers.n = RHD_No
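
The three steps above can be sketched end to end as follows. This is a minimal illustration, not the answerer's code: it assumes Python's built-in sqlite3 with an in-memory database as a stand-in for MySQL so that it runs anywhere, uses a toy Recipe_Data table, and replaces the per-value INSERT loop with a single parameterized executemany call. The PRIMARY KEY on the temporary table is an added assumption, not part of the original statement.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the MySQL server in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recipe_Data ("
    " RHD_No INTEGER PRIMARY KEY,"
    " Failures_Without_Success INTEGER)"
)
conn.executemany(
    "INSERT INTO Recipe_Data VALUES (?, 0)",
    [(i,) for i in range(1, 101)],  # toy data: RHD_No 1..100
)

wanted_ids = [10, 24, 34, 41, 43]  # the values that would otherwise go in IN (...)

# Step 1: create the temporary table.
conn.execute("CREATE TEMPORARY TABLE numbers (n INTEGER PRIMARY KEY)")

# Step 2: populate it; placeholders avoid splicing values into the SQL text.
conn.executemany("INSERT INTO numbers (n) VALUES (?)", [(n,) for n in wanted_ids])

# Step 3: join against it instead of using IN (...).
joined_rows = conn.execute(
    "SELECT Recipe_Data.* FROM numbers"
    " JOIN Recipe_Data ON numbers.n = Recipe_Data.RHD_No"
    " ORDER BY Recipe_Data.RHD_No"
).fetchall()

print([row[0] for row in joined_rows])
```

Whether this is actually faster than the IN list on a given MySQL version is something to measure rather than assume.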

Answered by Jonathan

You should transform the IN clauses to INNER JOIN clauses.

You can transform a query like this one:

SELECT  foo   
FROM    bar   
WHERE bar.stuff IN  
       (SELECT  stuff FROM asdf)

into a query like this:

SELECT  b.foo 
FROM    ( 
        SELECT  DISTINCT stuff 
        FROM    asdf ) a 
JOIN    bar b 
ON      b.stuff = a.stuff
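
A quick way to convince yourself that the two forms return the same rows is to run both on toy data. The sketch below assumes in-memory SQLite (via Python's sqlite3) in place of MySQL and invents a few sample rows; it checks equivalence only and says nothing about relative speed. Note the duplicate value in asdf: without DISTINCT in the derived table, the join would return the matching bar row twice.

```python
import sqlite3

# Assumption: in-memory SQLite with made-up rows, used only to compare results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bar (foo TEXT, stuff INTEGER);
    CREATE TABLE asdf (stuff INTEGER);
    INSERT INTO bar VALUES ('a', 1), ('b', 2), ('c', 3);
    INSERT INTO asdf VALUES (1), (1), (3);  -- value 1 appears twice on purpose
""")

# Original form: IN with a subquery.
in_rows = conn.execute(
    "SELECT foo FROM bar WHERE bar.stuff IN (SELECT stuff FROM asdf) ORDER BY foo"
).fetchall()

# Rewritten form: join against a de-duplicated derived table.
join_rows = conn.execute(
    "SELECT b.foo FROM (SELECT DISTINCT stuff FROM asdf) a"
    " JOIN bar b ON b.stuff = a.stuff ORDER BY b.foo"
).fetchall()

print(in_rows == join_rows)
```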

You will gain a lot of performance.

Since PHP generates the query, try a trick such as a temporary table for the items inside the IN clause. Try to avoid IN clauses if you can, because they are very time consuming.

Answered by bob-the-destroyer

I'm going to gamble here and suggest that executing the following query just once to create an index suitable for your query should reduce the query time by at least a second...

CREATE INDEX returnstatus ON Recipe_Data(404_Without_200,Failures_Without_Success)

See: http://dev.mysql.com/doc/refman/5.0/en/create-index.html for creating indexes, and http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html for how indexes are used in queries.

Failing that, view all running processes on MySQL to see whether a query from any source refuses to die while consuming all of the server's time, and if so, kill it. See: http://dev.mysql.com/doc/refman/5.0/en/kill.html

Failing that, determine what else each record may have in common to avoid having to reference each one individually by ID number in your IN statement. If necessary, add another table column to track that commonality. Then add the column(s) having that commonality to the above index and filter by that in your WHERE clause instead of using the IN statement. For example, if you want only those ID numbers to print out on the page, add a visible column of type tinyint, with value 0 to exclude and value 1 to include a row in your search results, then add the visible column to your index and WHERE clause to speed up the query. You wouldn't need that IN statement at all.

Perhaps your IN statement is dynamically built using a previous query. If that's the case, try pulling all rows from Recipe_Data WHERE 404_Without_200 = 0 AND Failures_Without_Success = 0. Then, in your PHP script, simply discard a record in your fetch loop if its RHD_No doesn't match an expected value.
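
The last suggestion, filtering in the fetch loop, might look like the sketch below. The answer describes a PHP script; Python is used here only to keep the example self-contained, with in-memory SQLite standing in for MySQL and a handful of invented rows. The column name is double-quoted because SQLite needs quoting for an identifier that starts with a digit. A set gives constant-time membership checks for the expected IDs.

```python
import sqlite3

# Assumption: in-memory SQLite and toy rows stand in for the real MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recipe_Data ("
    " RHD_No INTEGER PRIMARY KEY,"
    ' "404_Without_200" INTEGER,'
    " Failures_Without_Success INTEGER)"
)
conn.executemany(
    "INSERT INTO Recipe_Data VALUES (?, ?, ?)",
    [(1, 0, 0), (2, 1, 0), (3, 0, 0), (4, 0, 1), (5, 0, 0)],
)

expected_ids = {1, 2, 5}  # the IDs that would otherwise be listed in IN (...)

# Query only on the two flag columns, with no IN list.
cursor = conn.execute(
    'SELECT * FROM Recipe_Data WHERE "404_Without_200" = 0'
    " AND Failures_Without_Success = 0 ORDER BY RHD_No"
)

kept = []
for row in cursor:
    # Discard any record whose RHD_No is not one of the expected values.
    if row[0] in expected_ids:
        kept.append(row)

print([row[0] for row in kept])
```

This trades a smaller query for more rows sent to the client, so it only pays off when the flag filter alone is already selective.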

Answered by X.C.

For someone like me using SQLAlchemy, using a for loop is also a good option:

rows = []

# Fetch each row by primary key, one query per id
for row_id in ids:
    row = cls.query.filter(cls.id == row_id).first()
    if row:
        rows.append(row)

# return rows