Notice: This page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/512294/

Time: 2020-08-31 12:45:52  Source: igfitidea

When to use STRAIGHT_JOIN with MySQL

mysql join

Asked by Greg

I just had a fairly complex query I was working with and it was taking 8 seconds to run. EXPLAIN was showing a weird table order and my indexes were not all being used even with the FORCE INDEX hint. I came across the STRAIGHT_JOIN join keyword and started replacing some of my INNER JOIN keywords with it. I noticed considerable speed improvement. Eventually I just replaced all my INNER JOIN keywords with STRAIGHT_JOIN for this query and it now runs in .01 seconds.

My question is when do you use STRAIGHT_JOIN and when do you use INNER JOIN? Is there any reason to not use STRAIGHT_JOIN if you are writing good queries?

Accepted answer by nathan

I wouldn't recommend using STRAIGHT_JOIN without a good reason. My own experience is that the MySQL query optimizer chooses a poor query plan more often than I'd like, but not often enough that you should just bypass it in general, which is what you would be doing if you always used STRAIGHT_JOIN.

My recommendation is to leave all queries as regular JOINs. If you discover that one query is using a sub-optimal query plan, I would suggest first trying to rewrite or re-structure the query a bit to see if the optimizer will then pick a better query plan. Also, for InnoDB at least, make sure it's not just that your index statistics are out of date (ANALYZE TABLE refreshes them). That can cause the optimizer to choose a poor query plan. Optimizer hints should generally be your last resort.
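
As a rough sketch of that workflow, refreshing statistics and re-checking the plan before reaching for a hint might look like the following. The tables orders and customers and their columns are hypothetical, made up purely for illustration; they are not from the question.

```sql
-- Hypothetical tables (assumption): orders(id, customer_id) and
-- customers(id, name, country).

-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE orders, customers;

-- Re-check the plan with a plain JOIN before adding any hint.
EXPLAIN
SELECT o.id, c.name
FROM orders AS o
JOIN customers AS c ON c.id = o.customer_id
WHERE c.country = 'DE';
```

If the plan looks reasonable after this, no hint is needed at all.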

Another reason not to use query hints is that your data distribution may change over time, or your index selectivity may change, etc., as your table grows. Query hints that are optimal now may become sub-optimal over time, but the optimizer will be unable to adapt the query plan because of your now-outdated hints. You stay more flexible if you allow the optimizer to make the decisions.

Answer by jjclarkson

From MySQL JOIN reference:

"STRAIGHT_JOIN is similar to JOIN, except that the left table is always read before the right table. This can be used for those (few) cases for which the join optimizer puts the tables in the wrong order."

Answer by Barry Kelly

Here's a scenario that came up just recently at work.

Consider three tables, A, B, C.

A has 3,000 rows; B has 300,000,000 rows; and C has 2,000 rows.

Foreign keys are defined: B(a_id), B(c_id).

Suppose you had a query that looks like this:

select a.id, c.id
from a
join b on b.a_id = a.id
join c on c.id = b.c_id

In my experience, MySQL may choose to go C -> B -> A in this case. C is smaller than A and B is enormous, and they're all equijoins.

The trouble is that MySQL doesn't necessarily take into account the size of the intersection between (C.id and B.c_id) vs (A.id and B.a_id). If the join between B and C returns just as many rows as B, then it's a very poor choice; if starting with A would have filtered B down to as many rows as A, then it would have been a much better choice. straight_join could be used to force this order like this:

select a.id, c.id
from a
straight_join b on b.a_id = a.id
join c on c.id = b.c_id

Now a must be joined before b.
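
One way to confirm the order, sketched here with the same a, b, c tables, is to prefix the query with EXPLAIN. EXPLAIN lists the tables in the order MySQL intends to read them, so a should show up above b. As far as I understand, c is not constrained by this form and may still be placed wherever the optimizer chooses.

```sql
-- Same tables a, b, c as in the scenario above.
-- In the EXPLAIN output, a should be listed before b;
-- the position of c is still up to the optimizer.
EXPLAIN
select a.id, c.id
from a
straight_join b on b.a_id = a.id
join c on c.id = b.c_id;
```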

Generally you want to do your joins in an order that minimizes the number of rows in the resulting set. So starting with a small table and joining such that the resulting join will also be small, is ideal. Things go pear-shaped if starting with a small table and joining it to a bigger table ends up just as large as the big table.

It's stats dependent though. If the data distribution changes, the calculation may change. It's also dependent on the implementation details of the join mechanism.

The worst cases that I've seen for MySQL, ones that all but required straight_join or aggressive index hinting, are queries that paginate over a lot of data in a strict sort order with light filtering. MySQL strongly prefers to use indexes for any filters and joins over sorts; this makes sense because most people aren't trying to sort the whole database but rather have a limited subset of rows that are responsive to the query, and sorting a limited subset is much faster than filtering the whole table, no matter whether it's sorted or not. In this case, putting straight_join immediately after the table that had the indexed column I wanted to sort on fixed things.

Answer by IAdapter

MySQL isn't necessarily good at choosing the join order in complex queries. By specifying a complex query as a straight_join, the query executes the joins in the order they're specified. By placing the table that is the least common denominator first and specifying straight_join, you are able to improve the query performance.
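
A minimal sketch of the whole-query form, borrowing the a, b, c tables from Barry Kelly's answer above purely for illustration: putting STRAIGHT_JOIN right after SELECT asks MySQL to join the tables in the order they are listed in the FROM clause.

```sql
-- SELECT STRAIGHT_JOIN applies to the whole query:
-- tables are joined in FROM-clause order, i.e. a, then b, then c.
select straight_join a.id, c.id
from a
join b on b.a_id = a.id
join c on c.id = b.c_id;
```

This is broader than writing straight_join between two specific tables, which only pins the order of that one pair.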

Answer by Mitendra

With the STRAIGHT_JOIN clause, you can control the JOIN order: which table is scanned in the outer loop and which one is in the inner loop.

Answer by Nicolas Thery

I will tell you why I had to use STRAIGHT_JOIN:

  • I had a performance issue with a query.
  • After simplifying the query, it was suddenly more efficient.
  • I tried to figure out which specific part was causing the issue, but I just couldn't. (Two left joins together were slow, and each one was independently fast.)
  • I then ran EXPLAIN on both the slow and the fast query (adding one of the left joins).
  • Surprisingly, MySQL entirely changed the JOIN order between the two queries.

Therefore I forced one of the joins to be a straight_join to FORCE the previous join to be read first. This prevented MySQL from changing the execution order and worked like a charm!

Answer by Rick James

If your query ends with ORDER BY ... LIMIT ..., it may be optimal to reformulate the query to trick the optimizer into doing the LIMIT before the JOIN.

(This Answer does not apply only to the original question about STRAIGHT_JOIN, nor does it apply to all cases of STRAIGHT_JOIN.)

Starting with the example by @Accountant?, this should run faster in most situations. (And it avoids needing hints.)

SELECT  whatever
    FROM  ( SELECT id FROM sales
                ORDER BY  date, id
                LIMIT  50
          ) AS x
    JOIN  sales   ON sales.id = x.id
    JOIN  stores  ON sales.storeId = stores.id
    ORDER BY  sales.date, sales.id;

Notes:

  • First, 50 ids are fetched. This will be especially fast with INDEX(date, id).
  • Then the join back to sales lets you get only 50 "whatevers" without hauling them around in a temp table.
  • Since a subquery is, by definition, unordered, the ORDER BY must be repeated in the outer query. (The Optimizer may find a way to avoid actually doing another sort.)
  • Yes, it is messier. But it is usually faster.

I am opposed to using hints because "Even if it is faster today, it may fail to be faster tomorrow."

Answer by Accountant ?

In my short experience, one of the situations where STRAIGHT_JOIN has reduced my query from 30 seconds to 100 milliseconds is when the first table in the execution plan was not the table that has the ORDER BY columns:

-- table sales (45000000) rows
-- table stores (3) rows
SELECT whatever
FROM 
    sales 
    INNER JOIN stores ON sales.storeId = stores.id
ORDER BY sales.date, sales.id 
LIMIT 50;
-- there is an index on (date, id)

If the optimizer chooses to hit stores first, it will cause Using index; Using temporary; Using filesort because:

if the ORDER BY or GROUP BY contains columns from tables other than the first table in the join queue, a temporary table is created.

source

Here the optimizer needs a little help: tell it to hit sales first using

sales STRAIGHT_JOIN stores
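
Spelled out in full, and assuming the rest of the query stays exactly as in the example above, that would look roughly like this (whatever is still the placeholder column list from the original):

```sql
-- STRAIGHT_JOIN makes sales (the left table) be read before stores,
-- so the (date, id) index on sales can be used for ORDER BY ... LIMIT.
SELECT whatever
FROM
    sales
    STRAIGHT_JOIN stores ON sales.storeId = stores.id
ORDER BY sales.date, sales.id
LIMIT 50;
```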

Answer by rai

I know it's a bit old, but here's a scenario: I've been writing a batch script to populate a certain table. At some point, the query ran very slowly. It appears that the join order was incorrect for particular records:

  • In correct order

[image not preserved]

  • Incrementing the id by 1 messes up the order. Notice the 'Extra' field

[image not preserved]

  • Using straight_join fixes the issue

[image not preserved]

The incorrect order runs for about 65 seconds, while the straight_join version runs in milliseconds.

Answer by lhs295988029

-- The query takes 120 s on 18 million rows
EXPLAIN SELECT DISTINCT d.taid
FROM tvassist_recommend_list_everyday_diverse d, tvassist_taid_all t
WHERE d.taid = t.taid
  AND t.client_version >= '21004007'
  AND t.utdid IS NOT NULL
  AND d.recommend_day = '20170403'
LIMIT 0, 10000;

-- The query takes 3.6 s after replacing the comma join with STRAIGHT_JOIN
EXPLAIN SELECT DISTINCT d.taid
FROM tvassist_recommend_list_everyday_diverse d
STRAIGHT_JOIN tvassist_taid_all t ON d.taid = t.taid
WHERE t.client_version >= '21004007'
  AND d.recommend_day = '20170403'
  AND t.utdid IS NOT NULL
LIMIT 0, 10000;