MySQL sql 连接为维恩图

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/13997365/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 15:52:45  来源:igfitidea点击:

sql joins as venn diagram

mysqlsqlvisualizationset-theory

提问by BrainLikeADullPencil

I've had trouble understanding joins in sql and came upon this image which I think might help me. The problem is that I don't fully understand it. For example, the join in the top right corner of the image, which colors the full B circle red and but only the overlap from A. The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

我在理解 sql 中的连接时遇到了麻烦,并发现了这张我认为可能对我有帮助的图像。问题是我不完全理解它。例如,图像右上角的连接,将整个 B 圆圈涂成红色,但仅与 A 重叠。图像使圆形 B 看起来是 sql 语句的主要焦点,但 sql 语句本身,从 A 开始(从 A 中选择,加入 B),给我传达了相反的印象,即 A 将成为 sql 语句的焦点。

Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

同理,下图只包含了B圈的数据,那么为什么join语句中根本就包含了A呢?

Question: Working clockwise from the top right and finishing in the center, can someone provide more information about the representation of each sql image, explaining

问题:从右上方顺时针旋转到中心完成,有人可以提供更多关于每个sql图像表示的信息,解释一下

a) why a join would be necessary in each case (for example, especially in situations where no data's taken from A or B i.e. where only A or B but not both is colored)

a) 为什么在每种情况下都需要连接(例如,特别是在没有从 A 或 B 获取数据的情况下,即只有 A 或 B 但不是两者都着色的情况下)

b) and any other detail that would clarify why the image is a good representation of the sql

b) 和任何其他细节,可以阐明为什么图像是 sql 的良好表示

sql join diagram

sql连接图

采纳答案by ruakh

I think your main underlying confusion is that when (for example) only Ais highlighted in red, you're taking that to mean "the query only returns data from A", but in fact it means "the query only returns data for those cases where Ahas a record". The query might still contain data from B. (For cases where Bdoes nothave a record, the query will substitute NULL.)

我觉得你的主要潜在的困惑是,当(例如)只A以红色突出显示,你正在做的是指“查询只返回数据来自A”,但实际上它的意思是”查询只返回数据的那些情况下,A有记录”。查询可能仍包含从B.数据(对于那些情况下B不会有记录,查询将取代NULL。)

Similarly, the image below that only includes data from the B circle, so why is A included at all in the join statement?

同理,下图只包含了B圈的数据,那么为什么join语句中根本就包含了A呢?

If you mean — the image where Ais entirely in white, and there's a red crescent-shape for the part of Bthat doesn't overlap with A, then: the reason that Aappears in the query is, Ais how it finds the records in Bthat need to be excluded. (If Adidn't appear in the query, then Venn diagram wouldn't have A, it would only show B, and there'd be no way to distinguish the desired records from the unwanted ones.)

如果您的意思是 - 图像A完全是白色的,并且有一个红新月形状的部分B不与 重叠A,那么:A出现在查询中的原因A是它如何找到B需要的记录被排除在外。(如果A没有出现在查询中,那么维恩图就没有A,它只会显示B,并且无法区分所需的记录和不需要的记录。)

The image makes it seem like circle B is the primary focus of the sql statement, but the sql statement itself, by starting with A (select from A, join B), conveys the opposite impression to me, namely that A would be the focus of the sql statement.

该图像使圆形 B 似乎是 sql 语句的主要焦点,但 sql 语句本身以 A 开头(从 A 中选择,加入 B)向我传达了相反的印象,即 A 将是焦点sql 语句。

Quite right. For this reason, RIGHT JOINs are relatively uncommon; although a query that uses a LEFT JOINcan nearly always be re-ordered to use a RIGHT JOINinstead (and vice versa), usually people will write their queries with LEFT JOINand not with RIGHT JOIN.

非常正确。因此,RIGHT JOINs 相对不常见;尽管使用 a 的查询LEFT JOIN几乎总是可以重新排序以使用 aRIGHT JOIN代替(反之亦然),但通常人们会使用LEFT JOIN而不是编写查询RIGHT JOIN

回答by Martin Smith

I agree with Cade about the limitations of Venn diagrams here. A more apposite visual representation might be this.

我同意凯德关于这里维恩图的局限性。更合适的视觉表示可能是这样。

Tables

Tables

表

SELECT A.Colour, B.Colour FROM A CROSS JOIN B SQL Fiddle

SELECT A.Colour, B.Colour FROM A CROSS JOIN B SQL Fiddle

The cross join (or cartesian product) produces a result with every combination of the rows from the two tables. Each table has 4 rows so this produces 16 rows in the result.

交叉连接(或笛卡尔乘积)产生两个表中行的每个组合的结果。每个表有 4 行,因此这会在结果中产生 16 行。

Cross Join

交叉连接

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour = B.Colour SQL Fiddle

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour = B.Colour SQL Fiddle

The inner join logically returns all rows from the cross join that match the join condition. In this case five do.

内连接在逻辑上返回交叉连接中与连接条件匹配的所有行。在这种情况下,五个做。

Inner Join

内部联接

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour NOT IN ('Green','Blue') SQL Fiddle

SELECT A.Colour, B.Colour FROM A INNER JOIN B ON A.Colour NOT IN ('Green','Blue') SQL Fiddle

The inner join condition need not necessarily be an equality condition and it need not reference columns from both (or even either) of the tables. Evaluating A.Colour NOT IN ('Green','Blue')on each row of the cross join returns.

内连接条件不一定是相等条件,也不需要引用来自两个(或什至任一)表的列。A.Colour NOT IN ('Green','Blue')对交叉联接返回的每一行进行评估。

inner 2

内部 2

An inner join condition of 1=1would evaluate to true for every row in the cross join so the two are equivalent (SQL Fiddle).

1=1对于交叉连接中的每一行,内部连接条件 of将评估为真,因此两者是等效的(SQL Fiddle)。

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Outer Joins are logically evaluated in the same way as inner joins except that if a row from the left table (for a left join) does not join with any rows from the right hand table at all it is preserved in the result with NULLvalues for the right hand columns.

外连接的逻辑评估方式与内连接相同,不同之处在于如果左表中的一行(对于左连接)根本不与右手表中的任何行连接,它会保留在结果中,其NULL值为右手边的列。

LOJ

LOJ

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour WHERE B.Colour IS NULL SQL Fiddle

SELECT A.Colour, B.Colour FROM A LEFT OUTER JOIN B ON A.Colour = B.Colour WHERE B.Colour IS NULL SQL Fiddle

This simply restricts the previous result to only return the rows where B.Colour IS NULL. In this particular case these will be the rows that were preserved as they had no match in the right hand table and the query returns the single red row not matched in table B. This is known as an anti semi join.

这只是将先前的结果限制为仅返回 where 的行B.Colour IS NULL。在这种特殊情况下,这些将是保留的行,因为它们在右侧表中没有匹配项,并且查询返回表中不匹配的单个红色行B。这被称为反半连接。

It is important to select a column for the IS NULLtest that is either not nullable or for which the join condition ensures that any NULLvalues will be excluded in order for this pattern to work correctly and avoid just bringing back rows which happen to have a NULLvalue for that column in addition to the un matched rows.

IS NULL测试选择一列不可为空或连接条件确保NULL将排除任何值以便此模式正常工作并避免仅带回恰好具有该NULL值的行很重要除了未匹配的行之外的列。

loj is null

loj 为空

SELECT A.Colour, B.Colour FROM A RIGHT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

SELECT A.Colour, B.Colour FROM A RIGHT OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Right outer joins act similarly to left outer joins except they preserve non matching rows from the right table and null extend the left hand columns.

右外连接的作用类似于左外连接,除了它们保留右表中不匹配的行并且 null 扩展左手列。

ROJ

注册会计师

SELECT A.Colour, B.Colour FROM A FULL OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

SELECT A.Colour, B.Colour FROM A FULL OUTER JOIN B ON A.Colour = B.Colour SQL Fiddle

Full outer joins combine the behaviour of left and right joins and preserve the non matching rows from both the left and the right tables.

全外连接结合了左右连接的行为,并保留了左右表中不匹配的行。

FOJ

FOJ

回答by Cade Roux

Venn diagrams are suitable for representing set operations such as UNION, INTERSECTS, EXCEPT etc.

维恩图适用于表示集合操作,例如 UNION、INTERSECTS、EXCEPT 等。

To the extent only that those set operations like EXCEPT are simulated with things like LEFT JOIN WHERE rhs.KEY is NULL, this diagram is accurate.

仅就诸如 EXCEPT 之类的集合操作使用 LEFT JOIN WHERE rhs.KEY 为 NULL 之类的东西进行模拟的程度而言,此图是准确的。

Otherwise it is misleading. For instance, any join can cause rows to multiply if the join criteria are not 1:1. But sets are only allowed to contain distinct members, so those cannot be represented as set operations.

否则就是误导。例如,如果连接条件不是 1:1,任何连接都可能导致行相乘。但是集合只允许包含不同的成员,所以那些不能表示为集合操作。

Then there is the CROSS JOIN or INNER JOIN ON 1 = 1 - this is neither analogous to the INNER JOIN as shown in this diagram, nor can the set which is produced be really described by a Venn diagram. Not to mention all the other possible triangular joins, self and anti-joins like:

然后是 CROSS JOIN 或 INNER JOIN ON 1 = 1 - 这既不像图中所示的 INNER JOIN,也不能用维恩图来真正描述所产生的集合。更不用说所有其他可能的三角形连接,自连接和反连接,例如:

lhs INNER JOIN rhs ON rhs.VALUE < lhs.VALUE (triangular)

or

或者

SELF self1
INNER JOIN SELF self2
    ON self2.key <> self1.key
    AND self1.type = self2.type

(self cross and anti-join to find all similar family members except yourself - self1 and self2 are the same set and the result is a proper subset)

(自我交叉和反加入以查找除自己之外的所有相似家庭成员-self1 和self2 是相同的集合,结果是一个适当的子集)

Sticking to joins on keys may be fine for the first few minutes of a tutorial, but this can lead to a poor path for learning what joins are about. I think this is what you have found.

在教程的前几分钟坚持连接键可能没问题,但这可能导致学习连接的含义很差。我想这就是你发现的。

This idea that Venn Diagrams can represent JOINs generally this way needs to go away.

维恩图通常可以通过这种方式表示 JOIN 的想法需要消失。

回答by zebediah49

When you do a join, it is likely that your two tables might not match up perfectly. Specifically, there could be some rows in A that don't match up to anything in B, or duplicate rows in A that match up with a single row in B, and vice-versa.

当您进行连接时,您的两个表可能不会完美匹配。具体来说,A 中的某些行可能与 B 中的任何行都不匹配,或者 A 中的重复行与 B 中的单行匹配,反之亦然。

When this happens, you have a choice:

发生这种情况时,您有一个选择:

  1. for each A, take a single B that works, if there is one. (upper left)
  2. take each pair that fully matches (discard any that are missing either A or B--center)
  3. for each B, take a single A that works, if there is one (upper right)
  4. take EVERYTHING (lower left)
  1. 对于每个 A,如果有的话,取一个有效的 B。(左上)
  2. 取出完全匹配的每一对(丢弃任何缺少 A 或 B 的 - 中心)
  3. 对于每个 B,取一个有效的 A,如果有的话(右上)
  4. 采取一切(左下)

Center left and right are technically joins, but pointless ones; they could probably be more efficiently written SELECT <select_list> FROM TableA A WHERE A.Key NOT IN (SELECT B.Key FROM TableB B)(or the opposite).

中左和中右是技术上的连接,但毫无意义;它们可能会更有效地编写SELECT <select_list> FROM TableA A WHERE A.Key NOT IN (SELECT B.Key FROM TableB B)(或相反)。

In direct answer to your confusion, RIGHT JOINsays "the following expression is the focus of this query". Lower right is rather strange, and I see no reason why you would want that. It returns the results from the two outer middle queries, mixed together with NULL's in all of the columns for the opposite table.

直接回答您的困惑,RIGHT JOIN说“以下表达式是此查询的重点”。右下角很奇怪,我看不出你为什么想要那样。它返回来自两个外部中间查询的结果,并与NULL相反表的所有列中的 's混合在一起。

回答by Mike Mooney

For the right join, yes the syntax can be confusing, but yes it is what it seems to be. When you say "TableA RIGHT JOIN TableB", it is indeed saying that TableB is the main table that you are referring to and TableA is just hanging on where it has matching records. This does read weird in queries, because TableA is listed first so your brain automatically assigns more priority to it, even though TableB is really the more important table in the query. For this reason, you rarely actually see right joins in real code.

对于正确的连接,是的,语法可能会令人困惑,但是它看起来就是这样。当您说“TableA RIGHT JOIN TableB”时,实际上是在说 TableB 是您所指的主表,而 TableA 只是挂在它具有匹配记录的位置。这在查询中读起来很奇怪,因为 TableA 排在最前面,所以你的大脑会自动为它分配更多的优先级,即使 TableB 确实是查询中更重要的表。出于这个原因,您很少会在实际代码中看到正确的连接。

So instead of A and B, lets take two things that are easy to keep track of. Supposed we have two tables for people's info, ShoeSize and IQ. You have ShoeSize info for some people, some IQ info for some people. And have a PersonID on both tables that you can join on.

因此,而不是 A 和 B,让我们采取两件容易跟踪的事情。假设我们有两个表用于人们的信息,ShoeSize 和 IQ。你有一些人的 ShoeSize 信息,一些人的 IQ 信息。并且在您可以加入的两个表上都有一个 PersonID。

Clockwise from top right (even tho this starts with some of the more complicated and contrived cases):

从右上角开始顺时针(即使是从一些更复杂和人为的案例开始):

  • ShoeSize RIGHT JOIN IQ -> give me all of the IQ information. Include any ShoeSize information for those people if we have it.
  • ShoeSize RIGHT JOIN IQ WHERE ShowSize.PersonID = NULL -> Give me all of the IQ info, but only for people who don't have any shoe size info
  • ShoeSize FULL OUTER JOIN IQ WHERE ShoeSize.PersonID = NULL AND IQ.PersonID = NULL -> Give me the shoe size info only for people who don't have IQ info, plus the IQ info for people who don't have shoe size info
  • ShoeSize FULL OUTER JOIN IQ -> Give me everything, all shoe sizes and all IQ data. If any ShoeSizes and IQ records have the same PersonID, include them in one row.
  • ShoeSize LEFT JOIN IQ WHERE IQ.PersonID = NULL -> Give me all of the shoe size info, but only for people that don't have IQ info
  • ShoeSize LEFT JOIN IQ -> Give me all of the shoe size info. Include any IQ information for those people if we have it.
  • ShoeSize RIGHT JOIN IQ -> 给我所有的 IQ 信息。如果我们有这些人的任何 ShoeSize 信息,请包括它。
  • ShoeSize RIGHT JOIN IQ WHERE ShowSize.PersonID = NULL -> 给我所有的 IQ 信息,但仅适用于没有任何鞋码信息的人
  • ShoeSize FULL OUTER JOIN IQ WHERE ShoeSize.PersonID = NULL AND IQ.PersonID = NULL -> 只给我没有 IQ 信息的人的鞋码信息,以及没有鞋码信息的人的 IQ 信息
  • ShoeSize FULL OUTER JOIN IQ -> 给我一切,所有鞋码和所有 IQ 数据。如果任何 ShoeSizes 和 IQ 记录具有相同的 PersonID,请将它们包含在一行中。
  • ShoeSize LEFT JOIN IQ WHERE IQ.PersonID = NULL -> 给我所有的鞋码信息,但仅限于没有 IQ 信息的人
  • ShoeSize LEFT JOIN IQ -> 给我所有的鞋码信息。如果我们有这些人的任何 IQ 信息,请包括这些信息。