Increase speed for MySQL JOIN for two large tables

Notice: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/2011852/

Time: 2020-08-31 14:56:22  Source: igfitidea

mysql

Asked by Brian

I have to JOIN two large tables in a MySQL query and it takes really long - approximately 180 seconds. Are there any tips for optimizing a merge?

My table has 10 fields. I am only using 4 in the query, all strings. The table has about 600,000 rows and the result should have about 50 rows.

The four columns used are: Title, Variables, Location, Date

Here is my query:

SELECT DISTINCT t1.Title, t1.Variables FROM `MyTABLE` t1 JOIN `MyTABLE` t2  
USING (Title,  Variables) 
WHERE (t1.Location, t1.Date) = ('Location1', 'Date1') 
AND (t2.Location, t2.Date) = ('Location2', 'Date2')

Answered by Roland Bouman

Like others pointed out, you need proper indexes. For this particular query, you can benefit from indexes like:

(Location, Date) or (Date, Location) (for the WHERE clause) and (Title, Variables) or (Variables, Title) (for the join condition, the ON clause)
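
As a minimal sketch, the indexes suggested above could be created as follows. The index names are hypothetical, and the statements assume the four columns are VARCHARs short enough to be indexed in full; TEXT columns would need a prefix length.

```sql
-- Supports the WHERE filter on Location and Date.
CREATE INDEX idx_location_date ON MyTABLE (Location, `Date`);

-- Supports the self-join on Title and Variables.
CREATE INDEX idx_title_variables ON MyTABLE (Title, Variables);
```

Only one column order per pair is shown; whether the reversed order works better depends on the data and on other queries run against the table.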

It would be helpful to know the exact size (that is, the datatype) of the Location, Date, Title, and Variables columns, as a large index is likely to be slower than a small one.
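
One way to look up those column sizes, and to keep an index small if the strings turn out to be long, might look like this. The index name and the prefix length of 50 are assumptions chosen only for illustration.

```sql
-- Show the declared type and length of each column used by the query.
SELECT COLUMN_NAME, COLUMN_TYPE
FROM   information_schema.COLUMNS
WHERE  TABLE_SCHEMA = DATABASE()
AND    TABLE_NAME   = 'MyTABLE'
AND    COLUMN_NAME IN ('Title', 'Variables', 'Location', 'Date');

-- A prefix index stores only the first 50 characters of each value,
-- which keeps the index smaller than one on the full strings.
CREATE INDEX idx_title_variables_prefix ON MyTABLE (Title(50), Variables(50));
```

A shorter prefix gives a smaller index but distinguishes fewer rows, so the right length depends on how the actual values are distributed.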

Finally, just a tip: I would not use fancy comparison constructs like you do. The

USING (Title,  Variables) 

is probably ok, but I would certainly check if

(t1.Location, t1.Date) = ('Location1', 'Date1') 

and

(t2.Location, t2.Date) = ('Location2', 'Date2')

are behaving like you expect. So I would definitely run EXPLAIN on it, and compare the output with a "regular" old-fashioned comparison, like so:

    t1.Location = 'Location1'
AND t1.Date     = 'Date1'
AND t2.Location = 'Location2'
AND t2.Date     = 'Date2'

You may argue that logically, it is the same and it shouldn't matter - you'd be right. But then again, MySQL's optimizer isn't very smart, and there is always a possibility of bugs, especially with features that aren't used a lot. I think this is such a feature. So I would at least try EXPLAIN and see if these alternate notations are evaluated the same.
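
The comparison described above could be run roughly like this, using the table and column names from the question's query. It is a sketch of the check, not a claim about what either plan will look like.

```sql
-- Row-constructor form, as in the question.
EXPLAIN
SELECT DISTINCT t1.Title, t1.Variables
FROM   MyTABLE t1
JOIN   MyTABLE t2 USING (Title, Variables)
WHERE  (t1.Location, t1.`Date`) = ('Location1', 'Date1')
AND    (t2.Location, t2.`Date`) = ('Location2', 'Date2');

-- Plain column comparisons.
EXPLAIN
SELECT DISTINCT t1.Title, t1.Variables
FROM   MyTABLE t1
JOIN   MyTABLE t2 USING (Title, Variables)
WHERE  t1.Location = 'Location1'
AND    t1.`Date`   = 'Date1'
AND    t2.Location = 'Location2'
AND    t2.`Date`   = 'Date2';
```

If the type, key, or rows columns of the two EXPLAIN outputs differ, the two notations are not being planned the same way.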

But, as BenoKrapo pointed out, would it not be easier to do something like this:

SELECT Title, Variables 
FROM   MyTABLE
WHERE  Location = 'Location1' AND Date = 'Date1' 
OR     Location = 'Location2' AND Date = 'Date2'
GROUP BY Title, Variables
HAVING COUNT(*) >= 2

EDIT: I changed HAVING COUNT(*) = 2 to HAVING COUNT(*) >= 2. See comments (thanks again, BenoKrapo)
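
One caveat with counting rows: if the table can hold more than one row with the same Title, Variables, Location and Date, then COUNT(*) >= 2 could be satisfied by duplicates from a single location. A possible variant, assuming that situation can occur, counts distinct (Location, Date) pairs instead:

```sql
SELECT Title, Variables
FROM   MyTABLE
WHERE  (Location = 'Location1' AND `Date` = 'Date1')
OR     (Location = 'Location2' AND `Date` = 'Date2')
GROUP BY Title, Variables
-- Require both (Location, Date) pairs, not just two matching rows.
HAVING COUNT(DISTINCT Location, `Date`) = 2;
```

The parentheses around each AND pair do not change the result, since AND already binds tighter than OR; they only make the grouping explicit.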

EDIT: days after posting this answer, I found this post from Mark Callaghan, MySQL Architect for Facebook: http://www.facebook.com/note.php?note_id=243134480932 Essentially, he describes how similar-but-different 'smart' comparisons deliver abysmal performance due to a MySQL optimizer bug. So my point is: try to unfancy your syntax when you suffer, as you might have hit a bug.

Answered by Mitch Wheat

Yes. Create appropriate indexes based upon the queries being run against the tables involved.

Answered by mjsabby

Can you prepend your SQL statement with "EXPLAIN" and then re-run it? The slowness is likely because of missing indexes on the columns you're joining on.

Also try using STRAIGHT_JOIN, putting the smaller table on the left and the bigger one on the right, to hint MySQL to read the left table first.
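
A sketch of the STRAIGHT_JOIN form for the question's query follows. Because this is a self-join, both sides are the same size, so here the forced order only decides which (Location, Date) filter is applied first.

```sql
-- STRAIGHT_JOIN forces MySQL to read t1 (the left table) before t2.
EXPLAIN
SELECT DISTINCT t1.Title, t1.Variables
FROM   MyTABLE t1
STRAIGHT_JOIN MyTABLE t2
       ON  t2.Title     = t1.Title
       AND t2.Variables = t1.Variables
WHERE  t1.Location = 'Location1' AND t1.`Date` = 'Date1'
AND    t2.Location = 'Location2' AND t2.`Date` = 'Date2';
```

Comparing this EXPLAIN output with that of the plain JOIN shows whether the optimizer was already choosing the same order.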

Answered by Benoit Vidis

Make sure the fields you are matching on are indexed. Matching numeric values is also quicker than matching strings.

But wouldn't it be simpler to just write

SELECT DISTINCT 
  Title, 
  Variables 
FROM `MyTABLE`
WHERE 
  Location = 'Location1' AND Date = 'Date1' 
  OR
  Location = 'Location2' AND Date = 'Date2'

Answered by Brian

This might be cheating a little bit, but I actually found it easier to JOIN the two queries together in PHP after the query. This only works because I am selecting two distinct variables.

// $mysqli is assumed to be an open mysqli connection.
$query = "SELECT DISTINCT Title, Variables FROM
MyTABLE WHERE Location='Location1' AND Date='Date1'";

// Collect the first result set as Title => Variables.
$Array_result1 = array();
$result = $mysqli->query($query);
while ($row = $result->fetch_assoc()) {
    $Title = $row['Title'];
    $Variables = $row['Variables'];
    $Array_result1[$Title] = $Variables;
}


$query = "SELECT DISTINCT Title, Variables FROM
MyTABLE WHERE Location='Location2' AND Date='Date2'";

// Collect the second result set the same way.
$Array_result2 = array();
$result = $mysqli->query($query);
while ($row = $result->fetch_assoc()) {
    $Title = $row['Title'];
    $Variables = $row['Variables'];
    $Array_result2[$Title] = $Variables;
}

// Keep only the Title => Variables pairs that appear in both result sets.
$Array_result = array_intersect_assoc($Array_result1, $Array_result2);

I liked the idea of only using one MySQL query to merge the two queries, but this is so much faster.

Answered by Mohit Satish Pawar

I did two separate joins and combined the results using the UNION operator. I was getting good improvements in time.

SELECT t1.Title, t1.Variables FROM `MyTABLE` t1 JOIN `MyTABLE` t2
ON (t1.Location, t1.Date) = ('Location1', 'Date1')
UNION
SELECT t1.Title, t1.Variables FROM `MyTABLE` t1 JOIN `MyTABLE` t2
ON (t2.Location, t2.Date) = ('Location2', 'Date2');

Make sure that both queries have the same number of columns and the same data type for each column. Also, check the order of the columns in the SELECT clause.

Answered by Tor Valamo

Without the description of the tables and the query there's little we can do to help.

There are several things that can determine the speed of a join.

  • The database engine: Are you using InnoDB or MyISAM? Or maybe any other engine? Some are faster at lookups than others, which affects joins.
  • Indexes: Are the appropriate match-columns indexed?
  • Partition indexes: Maybe you can partition the table by indexes to make it even faster?
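
The first two points in the list above can be checked with queries along these lines, assuming the table lives in the currently selected database:

```sql
-- Report the storage engine and approximate row count for the table.
SELECT ENGINE, TABLE_ROWS
FROM   information_schema.TABLES
WHERE  TABLE_SCHEMA = DATABASE()
AND    TABLE_NAME   = 'MyTABLE';

-- List the indexes that already exist on the table.
SHOW INDEX FROM MyTABLE;
```

If SHOW INDEX returns nothing for the Location, Date, Title, and Variables columns, missing indexes are a likely cause of the slow join.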

Also, look at EXPLAIN query, which shows the steps that MySQL takes to execute the query. It could help you tremendously.

Answered by Nitin Midha

Try to use a composite index on the columns in the WHERE clause, and try to put all other columns from the SELECT list in Included Columns; this will save the usual row lookup cost.
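
MySQL's CREATE INDEX has no INCLUDE clause, so the closest equivalent is a covering index that appends the selected columns to the key. A minimal sketch, with a hypothetical index name and assuming the four columns are short enough to fit within the index key length limit:

```sql
-- Leading columns serve the WHERE clause; the trailing columns are only
-- there so the query can be answered from the index alone.
CREATE INDEX idx_covering ON MyTABLE (Location, `Date`, Title, Variables);

EXPLAIN
SELECT Title, Variables
FROM   MyTABLE
WHERE  Location = 'Location1' AND `Date` = 'Date1';
```

When the index covers the query, the Extra column of the EXPLAIN output reports "Using index".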