了解涉及 3 个或更多表时 JOIN 的工作原理。[SQL]
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/1083676/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Understanding how JOIN works when 3 or more tables are involved. [SQL]
提问by Delaney
I wonder if anyone can help improve my understanding of JOINs in SQL. [If it is significant to the problem, I am thinking MS SQL Server specifically.]
我想知道是否有人可以帮助我提高对 SQL 中 JOIN 的理解。[如果它对问题很重要,我会特别考虑 MS SQL Server。]
Take 3 tables A, B [A related to B by some A.AId], and C [B related to C by some B.BId]
取 3 个表 A、B [A 通过一些 A.Aid 与 B 相关] 和 C [B 通过一些 B.BId 与 C 相关]
If I compose a query e.g
如果我编写一个查询,例如
SELECT *
FROM A JOIN B
ON A.AId = B.AId
All good - I'm sweet with how this works.
一切都很好 - 我对它的工作方式很满意。
What happens when Table C (Or some other D,E, .... gets added)
当表 C(或其他一些 D、E、.... 被添加)时会发生什么
In the situation
在这种情况下
SELECT *
FROM A JOIN B
ON A.AId = B.AId
JOIN C ON C.BId = B.BId
What is C joining to? - is it that B table (and the values therein)? Or is it some other temporary result set that is the result of the A+B Join that the C table is joined to?
C加入了什么?- 是那个 B 表(以及其中的值)吗?还是其他一些临时结果集是 C 表所连接的 A+B 连接的结果?
[The implication being not all values that are in the B table will necessarily be in the temporary result set A+B based on the join condition for A,B]
[这意味着并非 B 表中的所有值都必须在临时结果集 A+B 中,基于 A,B 的连接条件]
A specific (and fairly contrived) example of why I am asking is because I am trying to understand behaviour I am seeing in the following:
我为什么要问的一个具体(并且相当人为的)示例是因为我试图理解我在以下内容中看到的行为:
Tables
Account (AccountId, AccountBalanceDate, OpeningBalanceId, ClosingBalanceId)
Balance (BalanceId)
BalanceToken (BalanceId, TokenAmount)
Where:
Account->Opening, and Closing Balances are NULLABLE
(may have opening balance, closing balance, or none)
Balance->BalanceToken is 1:m - a balance could consist of many tokens
Conceptually, Closing Balance of a date, would be tomorrows opening balance
从概念上讲,日期的期末余额将是明天的期初余额
If I was trying to find a list of all the opening and closing balances for an account
如果我试图找到一个账户的所有期初和期末余额的列表
I might do something like
我可能会做类似的事情
SELECT AccountId
, AccountBalanceDate
, Sum (openingBalanceAmounts.TokenAmount) AS OpeningBalance
, Sum (closingBalanceAmounts.TokenAmount) AS ClosingBalance
FROM Account A
LEFT JOIN BALANCE OpeningBal
ON A.OpeningBalanceId = OpeningBal.BalanceId
LEFT JOIN BALANCE ClosingBal
ON A.ClosingBalanceId = ClosingBal.BalanceId
LEFT JOIN BalanceToken openingBalanceAmounts
ON openingBalanceAmounts.BalanceId = OpeningBal.BalanceId
LEFT JOIN BalanceToken closingBalanceAmounts
ON closingBalanceAmounts.BalanceId = ClosingBal.BalanceId
GROUP BY AccountId, AccountBalanceDate
Things work as I would expect until the last JOIN brings in the closing balance tokens - where I end up with duplicates in the result.
事情按照我的预期工作,直到最后一个 JOIN 引入期末余额令牌 - 我最终在结果中得到重复。
[I can fix with a DISTINCT - but I am trying to understand why what is happening is happening]
[我可以用 DISTINCT 修复 - 但我试图理解为什么正在发生的事情正在发生]
I have been told the problem is because the relationship between Balance, and BalanceToken is 1:M - and that when I bring in the last JOIN I am getting duplicates because the 3rd JOIN has already brought in BalanceIds multiple times into the (I assume) temporary result set.
有人告诉我问题是因为 Balance 和 BalanceToken 之间的关系是 1:M - 当我引入最后一个 JOIN 时,我得到了重复项,因为第三个 JOIN 已经多次引入 BalanceIds(我假设)临时结果集。
I know that the example tables do not conform to good DB design
我知道示例表不符合良好的数据库设计
Apologies for the essay, thanks for any elightenment :)
为这篇文章道歉,感谢任何启发:)
Edit in response to question by Marc
编辑回答马克的问题
Conceptually for an account there should not be duplicates in BalanceToken for An Account (per AccountingDate) - I think the problem comes about because 1 Account / AccountingDates closing balance is that Accounts opening balance for the next day - so when self joining to Balance, BalanceToken multiple times to get opening and closing balances I think Balances (BalanceId's) are being brought into the 'result mix' multiple times. If it helps to clarify the second example, think of it as a daily reconciliation - hence left joins - an opening (and/or) closing balance may not have been calculated for a given account / accountingdate combination.
从概念上讲,一个帐户不应该在 BalanceToken 中为一个帐户(每个 AccountingDate)重复 - 我认为问题是因为 1 个帐户 / AccountingDates 期末余额是第二天的帐户期初余额 - 所以当自己加入 Balance,BalanceToken多次获得期初和期末余额我认为余额(BalanceId's)被多次带入“结果组合”。如果它有助于澄清第二个示例,请将其视为每日对帐 - 因此左联接 - 可能尚未为给定的帐户/会计日期组合计算期初(和/或)期末余额。
采纳答案by WW.
Conceptuallyhere is what happens when you join three tables together.
从概念上讲,这是将三个表连接在一起时会发生的情况。
- The optimizer comes up with a plan, which includes a join order. It could be A, B, C, or C, B, A or any of the combinations
- The query execution engine applies any predicates (
WHERE
clause) to the first table that doesn't involve any of the other tables. It selects out the columns mentioned in theJOIN
conditions or theSELECT
list or theORDER BY
list. Call this result A - It joins this result set to the second table. For each row it joins to the second table, applying any predicates that may apply to the second table. This results in another temporary resultset.
- Then it joins in the final table and applies the
ORDER BY
- 优化器提出了一个计划,其中包括一个连接顺序。它可以是 A、B、C 或 C、B、A 或任何组合
- 查询执行引擎将任何谓词(
WHERE
子句)应用于不涉及任何其他表的第一个表。它选择JOIN
条件或SELECT
列表或ORDER BY
列表中提到的列。称这个结果为 A - 它将这个结果集连接到第二个表。对于连接到第二个表的每一行,应用可能适用于第二个表的任何谓词。这会导致另一个临时结果集。
- 然后它加入最终表并应用
ORDER BY
This is conceptually what happens. Infact there are many possible optimizations along the way. The advantage of the relational model is that the sound mathematical basis makes various transformations of plan possible while not changing the correctness.
这就是概念上发生的事情。事实上,在此过程中有许多可能的优化。关系模型的优点是,健全的数学基础使计划的各种变换成为可能,同时不改变正确性。
For example, there is really no need to generate the full result sets along the way. The ORDER BY
may instead be done via accessing the data using an index in the first place. There are lots of types of joins that can be done as well.
例如,实际上没有必要一路生成完整的结果集。在ORDER BY
可以替代地经由访问利用在第一位置的索引数据来完成。还有很多类型的连接可以完成。
回答by Marc Gravell
We know that the data from B
is going to be filtered by the (inner) join to A
(the data in A
is also filtered). So if we (inner) join from B
to C
, thus the set C
is alsofiltered by the relationship to A
. And note also that any duplicates from the join will be included.
我们知道 from 的数据B
将被(内部)join to过滤A
(数据 inA
也被过滤)。因此,如果我们(内部)从加盟B
到C
,这样的设定C
是还通过关系来过滤A
。还要注意,连接中的任何重复项都将被包括在内。
However; what order this happens in is up to the optimizer; it could decide to do the B
/C
join first then introduce A
, or any other sequence (probably based on the estimated number of rows from each join and the appropriate indexes).
然而; 发生这种情况的顺序取决于优化器;它可以决定先执行B
/ C
join 然后引入A
,或任何其他序列(可能基于每个连接的估计行数和适当的索引)。
HOWEVER; in your later example you use a LEFT OUTER
join; so Account
is not filtered at all, and may well my duplicated if any of the other tables have multiple matches.
然而; 在后面的示例中,您使用了LEFT OUTER
连接;所以Account
根本没有过滤,如果任何其他表有多个匹配项,我很可能会重复。
Are there duplicates (per account) in BalanceToken
?
中是否有重复(每个帐户)BalanceToken
?
回答by Bernhard Hofmann
I often find it helps to view the actual execution plan. In query analyser/management studio, you can turn this on for queries from the Query menu, or use Ctrl+M. After running the query, the plan that was executed is shown in another result tab. From this you'll see that C and B are joined first, and then the result is joined with A. The plan might vary depending on information the DBMS has because both joins are inner, making it A-and-B-and-C. What I mean is that the result will be the same regardless of which is joined first, but the time it takes might differ greatly, and this is where the optimiser and hints come into play.
我经常发现查看实际执行计划很有帮助。在查询分析器/管理工作室中,您可以从“查询”菜单或使用 Ctrl+M 为查询打开此选项。运行查询后,执行的计划显示在另一个结果选项卡中。从这里你会看到 C 和 B 首先连接,然后结果与 A 连接。 该计划可能因 DBMS 拥有的信息而异,因为两个连接都是内部连接,使其成为 A-and-B-and-C . 我的意思是,无论哪个先加入,结果都是一样的,但是花费的时间可能会有很大不同,这就是优化器和提示发挥作用的地方。
回答by Console
Joins can be tricky, and much of the behavior is of course dictated by how the data is stored in the actual tables.
联接可能很棘手,而且大部分行为当然取决于数据在实际表中的存储方式。
Without seeing the tables it's hard to give a clear answer in your particular case but I think the basic issue is that you are summing over multiple result sets that are being combined into one.
如果没有看到表格,就很难在您的特定情况下给出明确的答案,但我认为基本问题是您正在对合并为一个的多个结果集进行求和。
Perhaps instead of multiple joins you should make two separate temporary tables in your query, one with the accountID, date and sum of openingbalances, a second one with the accountID, date and sum of closing balances, then joining those two on AccountID and date.
也许您应该在查询中创建两个单独的临时表,而不是多个连接,一个包含帐户 ID、日期和期初余额总和,第二个包含帐户 ID、日期和期末余额总和,然后在 AccountID 和日期上连接这两个表。
In order to find out exactly what is happening with joins, also in your specific case, I would do the following:
为了确切了解连接发生了什么,在您的特定情况下,我将执行以下操作:
Change the initial part
更改初始部分
SELECT accountID Accountbalancedate, sum(...) as openingbalance, sum(...) as closingbalance FROM
SELECT accountID Accountbalancedate, sum(...) as openingbalance, sum(...) as closedbalance FROM
to simply
简单地
"SELECT * FROM"
“选择 * 从”
Study the resulting table, and you will see exactly what data is being duplicated. Remove the joins one by one and see what happens. This should give you a clue to what it is about your particular data that is causing the dupes.
研究结果表,您将准确地看到哪些数据被复制。将连接一一删除,看看会发生什么。这应该会为您提供有关导致欺骗的特定数据的线索。
If you open the query in SQL server management studio (Free version exists) you can edit the query in the designer. The visual view of how the tables are being joined might also help you realize what's going on.
如果您在 SQL server management studio(存在免费版本)中打开查询,您可以在设计器中编辑查询。表如何连接的可视化视图也可能帮助您了解正在发生的事情。