How to speed up a MySQL query with multiple joins

Question

Asked by BinarySolo00100

Here is my issue: I am selecting and doing multiple joins to get the correct items. It pulls in a fair number of rows, more than 100,000. This query takes more than 5 minutes when the date range is set to 1 year.

I don't know if it's possible, but I am afraid that a user might extend the date range to something like ten years and crash it.

Anyone know how I can speed this up? Here is the query.

SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
  AND t5.store = 2

I'm not great with MySQL, so any help would be appreciated!

Thanks in advance!

UPDATE

Here is the EXPLAIN output you asked for:

id  select_type  table  type    possible_keys                             key             key_len  ref                 rows  Extra
1   SIMPLE       t5     ref     PRIMARY,C_store_type,C_id,C_store_type_2  C_store_type_2  1        const               101   Using temporary
1   SIMPLE       t4     ref     PRIMARY,P_cat                             P_cat           5        alphacom.t5.C_id    326   Using where
1   SIMPLE       t3     ref     I_pid,I_oref                              I_pid           4        alphacom.t4.P_id    31
1   SIMPLE       t2     eq_ref  O_ref,O_cid                               O_ref           28       alphacom.t3.I_oref  1
1   SIMPLE       t1     eq_ref  PRIMARY                                   PRIMARY         4        alphacom.t2.O_cid   1     Using where

Also, I added indexes to table5 and table4 because their rows don't really change. The other tables, however, get around 500-1000 new entries a month. I heard you shouldn't add an index to a table that gets that many new entries. Is this true?

Answer 1

Accepted answer by Bob Jarvis - Reinstate Monica

I'd try the following:

First, ensure there are indexes on the following tables and columns (each set of columns in parentheses should be a separate index):

table1 : (subscribe, CDate)
         (CU_id)
table2 : (O_cid)
         (O_ref)
table3 : (I_oref)
         (I_pid)
table4 : (P_id)
         (P_cat)
table5 : (C_id, store)
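
As a rough sketch, the list above could be turned into statements like the ones below. The index names (idx_...) are made up for illustration. The EXPLAIN output in the question suggests that several of these indexes (for example on P_cat, I_pid, I_oref, O_ref and O_cid, plus the primary keys) may already exist, so it is worth checking with SHOW INDEX before creating duplicates.

```sql
-- See which indexes a table already has before adding new ones.
SHOW INDEX FROM table1;

-- Hypothetical index names; one index per parenthesised column set.
-- Composite index covering the subscribe filter and the date range.
CREATE INDEX idx_t1_subscribe_cdate ON table1 (subscribe, Cdate);

-- Single-column index on a join column.
CREATE INDEX idx_t2_o_cid ON table2 (O_cid);

-- Composite index covering the join column and the store filter.
CREATE INDEX idx_t5_cid_store ON table5 (C_id, store);
```

The remaining entries in the list follow the same pattern.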

Second, if adding the above indexes doesn't improve things as much as you'd like, try rewriting the query as

SELECT DISTINCT t1.first_name, t1.last_name, t1.email FROM
  (SELECT CU_id, first_name, last_name, email
     FROM table1
     WHERE subscribe = 1 AND
           CDate >= $startDate AND
           CDate <= $endDate) AS t1
  INNER JOIN table2 AS t2
    ON t1.CU_id = t2.O_cid   
  INNER JOIN table3 AS t3
    ON t2.O_ref = t3.I_oref   
  INNER JOIN table4 AS t4
    ON t3.I_pid = t4.P_id   
  INNER JOIN (SELECT C_id FROM table5 WHERE store = 2) AS t5
    ON t4.P_cat = t5.C_id

I'm hoping here that the first sub-select would cut down significantly on the number of rows to be considered for joining, hopefully making the subsequent joins do less work. Ditto the reasoning behind the second sub-select on table5.

In any case, mess with it. I mean, ultimately it's just a SELECT - you can't really hurt anything with it. Examine the plans that are generated by each different permutation and try to figure out what's good or bad about each.
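
To compare plans, prefix each variant with EXPLAIN, as the update to the question already does. A minimal example using the original query; $startDate and $endDate are the application-side placeholders from the question and need to be substituted before this will run:

```sql
-- EXPLAIN shows the plan MySQL would use without running the query.
-- $startDate / $endDate are placeholders filled in by the application.
EXPLAIN
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
  AND t5.store = 2;
```

Things worth looking at in the output are the rows estimate for each table, any type of ALL (a full table scan), and Extra values such as Using temporary or Using filesort.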

Share and enjoy.

Answer 2

Answered by JNK

Make sure your date columns and all the columns you are joining on are indexed.

Using a range (inequality) comparison on your dates is inherently slower than an equality match, and without an index on the date column it means checking every row.

Also, using DISTINCT adds an extra de-duplication step to the work the server does behind the scenes. Eliminate it if possible.

Answer 3

Answered by Eric Towers

Well, first, make a subquery to decimate table1 down to just the records you actually want to go to all the trouble of joining...

SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (
  SELECT first_name, last_name, email, CU_id
  FROM table1
  WHERE table1.subscribe = 1
    AND table1.Cdate >= $startDate
    AND table1.Cdate <= $endDate
) AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t5.store = 2

Then start looking at modifying the directionality of the joins.

Additionally, if t5.store is only very rarely 2, then flip this idea around: construct the t5 subquery, then join it back and back and back.
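
A sketch of what that flipped version might look like, starting from the table5 rows with store = 2 and joining back toward table1. It is the same query with the join direction reversed, not a tested rewrite, and $startDate / $endDate are still the application placeholders from the question:

```sql
-- Start from the (hopefully small) set of table5 rows with store = 2,
-- then join back through table4, table3 and table2 to table1.
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM (SELECT C_id FROM table5 WHERE store = 2) AS t5
INNER JOIN table4 AS t4 ON t4.P_cat = t5.C_id
INNER JOIN table3 AS t3 ON t3.I_pid = t4.P_id
INNER JOIN table2 AS t2 ON t2.O_ref = t3.I_oref
INNER JOIN table1 AS t1 ON t1.CU_id = t2.O_cid
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate;
```

Note that the EXPLAIN in the question already reads t5 first, so the optimizer may be choosing this direction on its own. Compare the plans before assuming the rewrite changes anything.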

Answer 4

Answered by Eric Towers

At present, your query is returning all matching rows on table2-table5, just to establish whether t5.store = 2. If any of table2-table5 have a significantly higher row count than table1, this may be greatly increasing the number of rows processed. Consequently, the following query may perform significantly better:

SELECT DISTINCT t1.first_name, t1.last_name, t1.email 
FROM table1 AS t1 
WHERE t1.subscribe =1 
AND t1.Cdate >= $startDate
AND t1.Cdate <= $endDate
AND EXISTS
(SELECT NULL FROM table2 AS t2
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref 
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id 
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id AND t5.store =2
WHERE t1.CU_id = t2.O_cid);

Answer 5

Answered by vinothkr

Try adding indexes on the fields that you join. It may or may not improve the performance.

Moreover, it also depends on the engine that you are using. If you are using InnoDB, check your configuration parameters. I faced a similar problem, as the default configuration of InnoDB won't scale as well as MyISAM's default configuration.
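
The answer doesn't say which parameters to check. As an assumption, innodb_buffer_pool_size is a reasonable first one to look at, since it controls how much table and index data InnoDB can cache in memory. A quick way to see the engine and the current setting:

```sql
-- The Engine column shows which storage engine the table uses.
SHOW TABLE STATUS LIKE 'table1';

-- Current size of the InnoDB buffer pool, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
```

A sensible value depends on how much memory the server has and what else runs on it, so treat any change as something to test rather than a fixed recommendation.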

Answer 6

Answered by KennethJ

As everyone says, make sure you have indexes.

You can also check whether your server is set up properly so that it can hold more of, or maybe the entire, dataset in memory.

Without an EXPLAIN, there's not much to go on. Also keep in mind that MySQL will look at your JOIN and iterate through the possible join orders before executing the query, which can take time. Once you have the optimal JOIN order from the EXPLAIN, you could try forcing that order in your query, which removes this step from the optimizer.
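
One way to do the forcing in MySQL is the STRAIGHT_JOIN modifier, which makes the optimizer join the tables in the order they are listed in the FROM clause. A sketch, assuming the order shown in the question's EXPLAIN (t5, t4, t3, t2, t1) is the one you want to pin:

```sql
-- STRAIGHT_JOIN forces the join order to match the FROM clause,
-- here t5 -> t4 -> t3 -> t2 -> t1 as in the question's EXPLAIN.
-- $startDate / $endDate are placeholders filled in by the application.
SELECT DISTINCT STRAIGHT_JOIN t1.first_name, t1.last_name, t1.email
FROM table5 AS t5
INNER JOIN table4 AS t4 ON t4.P_cat = t5.C_id
INNER JOIN table3 AS t3 ON t3.I_pid = t4.P_id
INNER JOIN table2 AS t2 ON t2.O_ref = t3.I_oref
INNER JOIN table1 AS t1 ON t1.CU_id = t2.O_cid
WHERE t5.store = 2
  AND t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate;
```

Pinning the order also stops the optimizer from adapting when the data changes, so it is worth re-checking with EXPLAIN from time to time.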

Answer 7

Answered by Mike Cheel

It sounds like you should think about delivering subsets (paging) or limiting the results some other way, unless there is a reason the users need every possible row all at once. Typically 100K rows is more than the average person can digest.
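
A minimal sketch of paging with LIMIT and OFFSET, applied to the query from the question. The page size of 100 and the sort columns are arbitrary choices made for illustration:

```sql
-- First page of 100 rows; for the next page use OFFSET 100, then 200, ...
-- $startDate / $endDate are placeholders filled in by the application.
SELECT DISTINCT t1.first_name, t1.last_name, t1.email
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.CU_id = t2.O_cid
INNER JOIN table3 AS t3 ON t2.O_ref = t3.I_oref
INNER JOIN table4 AS t4 ON t3.I_pid = t4.P_id
INNER JOIN table5 AS t5 ON t4.P_cat = t5.C_id
WHERE t1.subscribe = 1
  AND t1.Cdate >= $startDate
  AND t1.Cdate <= $endDate
  AND t5.store = 2
ORDER BY t1.last_name, t1.first_name, t1.email  -- stable order so pages don't overlap
LIMIT 100 OFFSET 0;
```

Paging reduces how much is sent to the user, but with DISTINCT and ORDER BY the server may still have to build most of the result first, so it complements the indexing advice rather than replacing it.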
