How can I speed up a MySQL query with multiple joins?
Original question: http://stackoverflow.com/questions/3989560/
Content from Stack Overflow, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors.
Asked by BinarySolo00100
Here is my issue: I am selecting and doing multiple joins to get the correct items. It pulls in a fair number of rows, more than 100,000. This query takes more than 5 minutes when the date range is set to 1 year.
I don't know if it's possible, but I am afraid that a user might extend the date range to something like ten years and crash it.
Does anyone know how I can speed this up? Here is the query.
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND t5.store =2
I am not the greatest with MySQL, so any help would be appreciated!
Thanks in advance!
UPDATE
Here is the EXPLAIN output you asked for:
id | select_type | table | type   | possible_keys                            | key            | key_len | ref                | rows | Extra
1  | SIMPLE      | t5    | ref    | PRIMARY,C_store_type,C_id,C_store_type_2 | C_store_type_2 | 1       | const              | 101  | Using temporary
1  | SIMPLE      | t4    | ref    | PRIMARY,P_cat                            | P_cat          | 5       | alphacom.t5.C_id   | 326  | Using where
1  | SIMPLE      | t3    | ref    | I_pid,I_oref                             | I_pid          | 4       | alphacom.t4.P_id   | 31   |
1  | SIMPLE      | t2    | eq_ref | O_ref,O_cid                              | O_ref          | 28      | alphacom.t3.I_oref | 1    |
1  | SIMPLE      | t1    | eq_ref | PRIMARY                                  | PRIMARY        | 4       | alphacom.t2.O_cid  | 1    | Using where
Also, I added indexes to table4 and table5 because they don't really change. The other tables, however, get around 500-1000 new entries a month. I heard you should add an index to a table that has that many new entries. Is this true?
Accepted answer by Bob Jarvis - Reinstate Monica
I'd try the following:
First, ensure there are indexes on the following tables and columns (each set of columns in parentheses should be a separate index):
table1 : (subscribe, CDate)
(CU_id)
table2 : (O_cid)
(O_ref)
table3 : (I_oref)
(I_pid)
table4 : (P_id)
(P_cat)
table5 : (C_id, store)
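As a rough sketch, the index list above could be created with statements like the following. The index names (idx_...) are made up for illustration. Some of these indexes may already exist: the EXPLAIN output in the question shows a PRIMARY key on table1 and keys named O_cid, O_ref, I_oref, I_pid, and P_cat. Check with SHOW INDEX first and skip any duplicates.

```sql
-- See which indexes already exist before adding new ones
-- (repeat for table2 through table5).
SHOW INDEX FROM table1;

-- Index names are hypothetical; the columns come from the list above.
CREATE INDEX idx_t1_subscribe_cdate ON table1 (subscribe, Cdate);
CREATE INDEX idx_t1_cu_id           ON table1 (CU_id);
CREATE INDEX idx_t2_o_cid           ON table2 (O_cid);
CREATE INDEX idx_t2_o_ref           ON table2 (O_ref);
CREATE INDEX idx_t3_i_oref          ON table3 (I_oref);
CREATE INDEX idx_t3_i_pid           ON table3 (I_pid);
CREATE INDEX idx_t4_p_id            ON table4 (P_id);
CREATE INDEX idx_t4_p_cat           ON table4 (P_cat);
CREATE INDEX idx_t5_c_id_store      ON table5 (C_id, store);
```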
Second, if adding the above indexes didn't improve things as much as you'd like, try rewriting the query as
SELECT DISTINCT t1.first_name, t1.last_name, t1.email FROM
(SELECT CU_id, first_name, last_name, email
FROM table1
WHERE subscribe = 1 AND
CDate >= $startDate AND
CDate <= $endDate) AS t1
INNER JOIN table2 AS t2
ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3
ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4
ON t3.I_pid = t4.P_id
INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
ON t4.P_cat = t5.C_id
I'm hoping that the first sub-select will cut down significantly on the number of rows considered for joining, so that the subsequent joins do less work. The same reasoning applies to the second sub-select on table5.
In any case, experiment with it. Ultimately it's just a SELECT, so you can't really hurt anything with it. Examine the plans generated for each variation and try to figure out what's good or bad about each.
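To compare plans, each variation can be prefixed with EXPLAIN. The sketch below does this for the rewritten query. The user variables @startDate and @endDate and their sample values are assumptions standing in for $startDate and $endDate, and the date format is a guess since the type of Cdate is not given.

```sql
-- Sample values only; stand-ins for $startDate and $endDate.
SET @startDate = '2010-01-01';
SET @endDate   = '2010-12-31';

EXPLAIN
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (SELECT CU_id, first_name, last_name, email
      FROM table1
      WHERE subscribe = 1
        AND Cdate >= @startDate
        AND Cdate <= @endDate) AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
        ON t4.P_cat = t5.C_id;
```

When comparing two plans, the rows column and Extra notes such as "Using temporary" are usually a reasonable place to start.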
Share and enjoy.
Answer by JNK
Make sure your date columns and all the columns you are joining on are indexed.
A range comparison (>= or <=) on your dates generally examines more rows than an equality comparison, and without an index on the date column it has to check every row.
Also, DISTINCT adds an extra de-duplication step to the work done behind the scenes. Eliminate it if possible.
Answer by Eric Towers
Well, first, use a subquery to cut table1 down to just the records you actually want to go to all the trouble of joining:
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (
SELECT first_name, last_name, email, CU_id FROM table1 WHERE
table1.subscribe = 1
AND table1.Cdate >= $startDate
AND table1.Cdate <= $endDate
) AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t5.store = 2
Then start looking at changing the order in which the tables are joined.
Additionally, if t5.store is only rarely 2, flip this idea around: construct the t5 subquery first, then join back from it through table4, table3, table2, and table1.
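One way to read the "flipped" idea is the sketch below, which starts from the filtered table5 rows and joins back toward table1. It is an interpretation and not the answerer's own query. The @startDate and @endDate variables and their sample values are assumptions standing in for $startDate and $endDate.

```sql
-- Sample values only; stand-ins for $startDate and $endDate.
SET @startDate = '2010-01-01';
SET @endDate   = '2010-12-31';

SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (SELECT C_id FROM table5 WHERE store = 2) AS t5
INNER JOIN table4 AS t4 ON t4.P_cat = t5.C_id
INNER JOIN table3 AS t3 ON t3.I_pid = t4.P_id
INNER JOIN table2 AS t2 ON t2.O_ref = t3.I_oref
INNER JOIN table1 AS t1 ON t1.CU_id = t2.O_cid
WHERE t1.subscribe = 1
  AND t1.Cdate >= @startDate
  AND t1.Cdate <= @endDate;
```

Because these are all inner joins, the optimizer is free to reorder them, so the written order may not change the plan. The EXPLAIN output is the place to confirm what actually happens.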
Answer by Eric Towers
At present, your query returns all matching rows from table2 through table5 just to establish whether t5.store = 2. If any of those tables has a significantly higher row count than table1, this may greatly increase the number of rows processed. Consequently, the following query may perform significantly better:
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
WHERE t1.subscribe =1
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND EXISTS
(SELECT NULL FROM table2 AS t2
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store =2
WHERE t1.CU_id = t2.O_cid);
Answer by vinothkr
Try adding indexes on the fields that you join on. It may or may not improve performance.
It also depends on the storage engine you are using. If you are using InnoDB, check your configuration parameters. I faced a similar problem because InnoDB's default configuration doesn't scale as well as MyISAM's default configuration.
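The answer does not say which parameters to check. As one possible starting point, the statements below show which storage engine each table uses and the current InnoDB buffer pool size. Singling out the buffer pool is an assumption here, chosen because it controls how much table and index data InnoDB can cache in memory.

```sql
-- The Engine column shows InnoDB, MyISAM, etc. for each table.
SHOW TABLE STATUS
WHERE Name IN ('table1', 'table2', 'table3', 'table4', 'table5');

-- Current InnoDB buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```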
Answer by KennethJ
As everyone says, make sure you have indexes.
You can also check whether your server is set up so that it can hold more of, or maybe the entire, dataset in memory.
Without an EXPLAIN, there's not much to go on. Also keep in mind that MySQL looks at your JOINs and evaluates the possible join orders before executing the query, which can take time. Once you have the optimal JOIN order from the EXPLAIN, you could try forcing that order in your query, which removes this step from the optimizer.
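In MySQL, one way to force a join order is the STRAIGHT_JOIN modifier, which makes the server join the tables in the order they are listed in the FROM clause. The sketch below uses the order shown in the EXPLAIN output from the question (t5, t4, t3, t2, t1). Whether that order is actually optimal is not established here. The @startDate and @endDate variables and their sample values are assumptions standing in for $startDate and $endDate.

```sql
-- Sample values only; stand-ins for $startDate and $endDate.
SET @startDate = '2010-01-01';
SET @endDate   = '2010-12-31';

-- STRAIGHT_JOIN: join tables in the order written below.
SELECT DISTINCT STRAIGHT_JOIN t1.first_name, t1.last_name, t1.email
FROM table5 AS t5
INNER JOIN table4 AS t4 ON t4.P_cat = t5.C_id
INNER JOIN table3 AS t3 ON t3.I_pid = t4.P_id
INNER JOIN table2 AS t2 ON t2.O_ref = t3.I_oref
INNER JOIN table1 AS t1 ON t1.CU_id = t2.O_cid
WHERE t5.store = 2
  AND t1.subscribe = 1
  AND t1.Cdate >= @startDate
  AND t1.Cdate <= @endDate;
```

A forced order stays fixed even if the data distribution changes later, so it is worth re-checking the plan from time to time.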
Answer by Mike Cheel
It sounds like you should think about delivering subsets (paging) or limiting the results some other way, unless there is a reason the users need every possible row at once. Typically, 100K rows is more than the average person can digest.
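A minimal paging sketch using LIMIT and OFFSET on the original query is shown below. The page size of 100 and the sort columns are arbitrary choices for illustration. The @startDate and @endDate variables and their sample values are assumptions standing in for $startDate and $endDate.

```sql
-- Sample values only; stand-ins for $startDate and $endDate.
SET @startDate = '2010-01-01';
SET @endDate   = '2010-12-31';

SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= @startDate
  AND t1.Cdate <= @endDate
  AND t5.store = 2
ORDER BY t1.last_name, t1.first_name, t1.email  -- stable order across pages
LIMIT 100 OFFSET 0;  -- first page; OFFSET 100 gives the second page
```

Paging limits how many rows are sent back, but with DISTINCT and ORDER BY the server may still have to build most of the result before returning a page, so it complements the indexing advice and does not replace it.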