Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/24876673/

Date: 2020-10-21 01:33:16  Source: igfitidea

Explain JOIN vs. LEFT JOIN and WHERE condition performance suggestion in more detail

sql postgresql join left-join where

Asked by Dwayne Towell

In this candidate answer it is asserted that JOIN is better than LEFT JOIN under some circumstances involving some WHERE clauses because it does not confuse the query planner and is not "pointless". The assertion/assumption is that it should be obvious to anyone.

Please explain further or provide link(s) for further reading.

Answered by Erwin Brandstetter

Effectively, WHERE conditions and JOIN conditions for [INNER] JOIN are 100 % equivalent in PostgreSQL. (It's good practice to use explicit JOIN conditions to make queries easier to read and maintain, though.)
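
A minimal sketch of that equivalence, assuming two hypothetical tables t1 and t2. SQLite (via Python's sqlite3 module) is used here only so the example is self-contained; it is a stand-in for PostgreSQL, and only the returned rows are compared, not the query plans.

```python
import sqlite3

# SQLite is only a self-contained stand-in for PostgreSQL here;
# the tables t1/t2 and their rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, val TEXT);
    CREATE TABLE t2 (id INTEGER, val TEXT);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1, 'x'), (3, 'y'), (4, 'z');
""")

# Join condition written in the ON clause.
explicit_join = conn.execute("""
    SELECT t1.id, t1.val, t2.val
      FROM t1
      JOIN t2 ON t1.id = t2.id
     ORDER BY t1.id
""").fetchall()

# The same condition written in the WHERE clause.
where_join = conn.execute("""
    SELECT t1.id, t1.val, t2.val
      FROM t1, t2
     WHERE t1.id = t2.id
     ORDER BY t1.id
""").fetchall()

print(explicit_join)
print(where_join)
```

Both statements should return the same two rows (ids 1 and 3), since only those ids exist in both tables.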

The same is not true for a LEFT JOIN combined with a WHERE condition on a table to the right of the join. The purpose of a LEFT JOIN is to preserve all rows on the left side of the join, regardless of a match on the right side. If no match is found, the row is extended with NULL values for columns on the right side. The manual:

LEFT OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Thus, the joined table always has at least one row for each row in T1.

If you then apply a WHERE condition on columns of tables on the right side (one that rejects the NULL values), you void the effect and forcibly convert the LEFT JOIN to work like a plain JOIN, just more expensively due to a more complicated query plan.
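
The effect on the result set can be sketched as follows. The tables t1 and t2, the status column, and the sample rows are assumptions made up for illustration, and SQLite stands in for PostgreSQL so the snippet runs on its own. It only compares returned rows and says nothing about plan cost.

```python
import sqlite3

# Hypothetical tables; SQLite is a self-contained stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER);
    CREATE TABLE t2 (id INTEGER, status TEXT);
    INSERT INTO t1 VALUES (1), (2), (3);
    INSERT INTO t2 VALUES (1, 'active'), (2, 'closed');
""")

def rows(sql):
    """Run a query and return all result rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Plain LEFT JOIN: every t1 row survives, id 3 is extended with NULL.
plain_left = rows("""
    SELECT t1.id, t2.status
      FROM t1 LEFT JOIN t2 ON t2.id = t1.id
     ORDER BY t1.id
""")

# LEFT JOIN plus a WHERE condition on the right-hand table:
# the NULL-extended row for id 3 cannot pass the filter.
left_where = rows("""
    SELECT t1.id, t2.status
      FROM t1 LEFT JOIN t2 ON t2.id = t1.id
     WHERE t2.status = 'active'
     ORDER BY t1.id
""")

# Plain inner JOIN with the same filter.
inner = rows("""
    SELECT t1.id, t2.status
      FROM t1 JOIN t2 ON t2.id = t1.id
     WHERE t2.status = 'active'
     ORDER BY t1.id
""")

print(plain_left)
print(left_where)
print(inner)
```

The second and third queries return identical rows, which is the sense in which the WHERE condition turns the LEFT JOIN into a plain JOIN.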

In a query with many joined tables, Postgres (or any RDBMS) is hard put to find the best (or even a good) query plan. The number of theoretically possible sequences to join tables grows factorially(!). For queries with many tables, Postgres uses the "Genetic Query Optimizer" (GEQO) for the task, and there are some settings to influence it, such as geqo_threshold and join_collapse_limit.
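
To get a feel for the factorial growth, here is a rough sketch that counts only the possible orderings of n tables (n!). It is an illustration, not how the planner enumerates plans: it ignores join tree shapes, join methods, and any pruning the planner applies.

```python
import math

# Number of possible orderings of n joined tables (n!).
# The table counts below are arbitrary sample values.
orderings = {n: math.factorial(n) for n in (2, 4, 8, 12)}

for n, count in orderings.items():
    print(f"{n:>2} tables: {count:,} possible join orders")
```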

Obfuscating the query with a misleading LEFT JOIN, as outlined, makes the work of the query planner harder, is misleading for human readers and typically hints at errors in the query logic.

Many related answers for problems stemming from this:

Etc.

Answered by Brian DeMilia

Consider the following example. We have two tables, DEPARTMENTS and EMPLOYEES.

Some departments do not yet have any employees.

This query uses an inner join to find the department that employee 999 works at, if any; otherwise it shows nothing (not even the employee or his or her name):

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  join employees b
    on a.department_id = b.department_id
 where b.employee_id = '999'

This next query uses an outer join (a left join from departments to employees) and finds the department that employee 999 works for. However, it too will not show the employee's ID or name if they do not work at any department. That is because the outer-joined table is used in the WHERE clause: for a department without a matching employee, b.employee_id is null (not 999, even though 999 exists in employees), so the WHERE condition filters that row out.

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  left join employees b
    on a.department_id = b.department_id
 where b.employee_id = '999'

But consider this query:

select a.department_id, a.department_desc, b.employee_id, b.employee_name
  from departments a
  left join employees b
    on a.department_id = b.department_id
   and b.employee_id= '999'

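Now the criterion is in the ON clause, so it only restricts which employees are joined, not which rows are returned. Every department comes back, because the left side of the join is preserved. The employee columns (ID and name) are filled in only for the department employee 999 works at, if any, and are null in every other row.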
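
Running the three queries against a small data set makes the difference concrete. The sample rows below are hypothetical (three departments, one of them without employees), and SQLite stands in for PostgreSQL so that the sketch is self-contained; an ORDER BY is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data; SQLite is a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_desc TEXT);
    CREATE TABLE employees (employee_id TEXT, employee_name TEXT,
                            department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'Research'),
                                   (30, 'Empty dept');
    INSERT INTO employees VALUES ('999', 'Alice', 10), ('1000', 'Bob', 20);
""")

SELECT = """
    select a.department_id, a.department_desc, b.employee_id, b.employee_name
      from departments a
"""
ORDER = " order by a.department_id"

# 1. Inner join, filter in WHERE.
inner_where = conn.execute(
    SELECT + " join employees b on a.department_id = b.department_id"
             " where b.employee_id = '999'" + ORDER).fetchall()

# 2. Left join, filter in WHERE: same rows as the inner join.
left_where = conn.execute(
    SELECT + " left join employees b on a.department_id = b.department_id"
             " where b.employee_id = '999'" + ORDER).fetchall()

# 3. Left join, filter in ON: one row per department, employee columns
#    are NULL wherever employee 999 does not match.
left_on = conn.execute(
    SELECT + " left join employees b on a.department_id = b.department_id"
             " and b.employee_id = '999'" + ORDER).fetchall()

for name, result in [("inner + where", inner_where),
                     ("left + where", left_where),
                     ("left + on", left_on)]:
    print(name, result)
```

With departments on the left side of the join, the third query returns all three departments, and only the Sales row carries employee 999.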

You might think you would never want to use the outer joined table in the WHERE clause. Normally that is true, for the reason described above, but it is not necessarily the case.

Suppose you want all departments with no employees. Then you could run the following, which does use an outer join, and the outer joined table is used in the where clause:

select a.department_id, a.department_desc, b.employee_id
  from departments a
  left join employees b
    on a.department_id = b.department_id
 where b.employee_id is null

^^ Shows departments with no employees.
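
As a quick check of this pattern, the sketch below runs the query on hypothetical sample data and compares it with a NOT EXISTS formulation, which is another common way to express the same question. The rows and the NOT EXISTS variant are additions for illustration, not part of the original answer, and SQLite again stands in for PostgreSQL.

```python
import sqlite3

# Hypothetical sample data; SQLite is a stand-in for PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER, department_desc TEXT);
    CREATE TABLE employees (employee_id TEXT, employee_name TEXT,
                            department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Sales'), (20, 'Research'),
                                   (30, 'Empty dept');
    INSERT INTO employees VALUES ('999', 'Alice', 10), ('1000', 'Bob', 20);
""")

# Left join, then keep only the NULL-extended rows (no matching employee).
left_is_null = conn.execute("""
    select a.department_id, a.department_desc
      from departments a
      left join employees b
        on a.department_id = b.department_id
     where b.employee_id is null
     order by a.department_id
""").fetchall()

# Alternative formulation of the same question with NOT EXISTS.
not_exists = conn.execute("""
    select a.department_id, a.department_desc
      from departments a
     where not exists (select 1
                         from employees b
                        where b.department_id = a.department_id)
     order by a.department_id
""").fetchall()

print(left_is_null)
print(not_exists)
```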

The above is likely the only legitimate reason you would want to use an outer joined table in the WHERE clause rather than the ON clause (which I think is what your question is; the difference between inner and outer joins is an entirely different topic).

A good way to look at it is this: you use outer joins to allow nulls. Why would you then use an outer join and say that a field should not be null and should be equal to 'XYZ'? If a value has to be 'XYZ' (not null), then why instruct the database to allow nulls to come back? It's like saying one thing and then overriding it later.
