SQL theta 连接、等值连接和自然连接之间的区别
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/7870155/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Difference between a theta join, equijoin and natural join
提问by maclunian
I'm having trouble understanding relational algebra when it comes to theta joins, equijoins and natural joins. Could someone please help me better understand it? If I use the = sign on a theta join is it exactly the same as just using a natural join?
当涉及到 theta 连接、等值连接和自然连接时,我无法理解关系代数。有人可以帮助我更好地理解它吗?如果我在 theta 连接上使用 = 符号,它是否与仅使用自然连接完全相同?
回答by outis
A theta joinallows for arbitrary comparison relationships (such as ≥).
甲theta连接允许任意的比较关系(如≥)。
An equijoinis a theta join using the equality operator.
一个等值连接是使用等于运算符theta连接。
A natural joinis an equijoin on attributes that have the same name in each relationship.
一个自然的加入是对那些在每个关系相同的名称属性的等值连接。
Additionally, a natural join removes the duplicate columns involved in the equality comparison so only 1 of each compared column remains; in rough relational algebraic terms:
? = πR,S-as○ ?aR=aS
此外,自然连接会删除相等比较中涉及的重复列,因此每个比较的列仅保留 1 个;用粗略的关系代数术语来说:
? = πR,S-as○ ?aR=aS
回答by Bogdan Gavril MSFT
While the answers explaining the exact differences are fine, I want to show how the relational algebra is transformed to SQL and what the actual value of the 3 concepts is.
虽然解释确切差异的答案很好,但我想展示关系代数如何转换为 SQL 以及这 3 个概念的实际价值是什么。
The key concept in your question is the idea of a join. To understand a join you need to understand a Cartesian Product (the example is based on SQL where the equivalent is called a cross join as onedaywhen points out);
您问题中的关键概念是连接的概念。要了解联接,您需要了解笛卡尔积(该示例基于 SQL,其中的等效项称为交叉联接,正如 onedaywhen 指出的那样);
This isn't very useful in practice. Consider this example.
这在实践中不是很有用。考虑这个例子。
Product(PName, Price)
====================
Laptop, 1500
Car, 20000
Airplane, 3000000
Component(PName, CName, Cost)
=============================
Laptop, CPU, 500
Laptop, hdd, 300
Laptop, case, 700
Car, wheels, 1000
The Cartesian product Product x Component will be - bellow or sql fiddle. You can see there are 12 rows = 3 x 4. Obviously, rows like "Laptop" with "wheels" have no meaning, this is why in practice the Cartesian product is rarely used.
笛卡尔积 Product x Component 将是 - bellow 或sql fiddle。您可以看到有 12 行 = 3 x 4。显然,像“笔记本电脑”和“轮子”这样的行是没有意义的,这就是为什么在实践中很少使用笛卡尔积的原因。
| PNAME | PRICE | CNAME | COST |
--------------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Laptop | 1500 | wheels | 1000 |
| Car | 20000 | CPU | 500 |
| Car | 20000 | hdd | 300 |
| Car | 20000 | case | 700 |
| Car | 20000 | wheels | 1000 |
| Airplane | 3000000 | CPU | 500 |
| Airplane | 3000000 | hdd | 300 |
| Airplane | 3000000 | case | 700 |
| Airplane | 3000000 | wheels | 1000 |
JOINs are here to add more value to these products. What we really want is to "join" the product with its associated components, because each component belongs to a product. The way to do this is with a join:
JOIN 旨在为这些产品增加更多价值。我们真正想要的是将产品与其关联的组件“连接”起来,因为每个组件都属于一个产品。这样做的方法是使用连接:
Product JOIN Component ON Pname
Pname 上的产品 JOIN 组件
The associated SQL query would be like this (you can play with all the examples here)
关联的 SQL 查询将是这样的(您可以在这里使用所有示例)
SELECT *
FROM Product
JOIN Component
ON Product.Pname = Component.Pname
and the result:
结果:
| PNAME | PRICE | CNAME | COST |
----------------------------------
| Laptop | 1500 | CPU | 500 |
| Laptop | 1500 | hdd | 300 |
| Laptop | 1500 | case | 700 |
| Car | 20000 | wheels | 1000 |
Notice that the result has only 4 rows, because the Laptop has 3 components, the Car has 1 and the Airplane none. This is much more useful.
请注意,结果只有 4 行,因为 Laptop 有 3 个组件,Car 有 1 个,而 Airplane 没有。这更有用。
Getting back to your questions, all the joins you ask about are variations of the JOIN I just showed:
回到你的问题,你问的所有连接都是我刚刚展示的 JOIN 的变体:
Natural Join= the join (the ON clause) is made on all columns with the same name; it removes duplicate columns from the result, as opposed to all other joins; most DBMS (database systems created by various vendors such as Microsoft's SQL Server, Oracle's MySQL etc. ) don't even bother supporting this, it is just bad practice (or purposely chose not to implement it). Imagine that a developer comes and changes the name of the second column in Product from Price to Cost. Then all the natural joins would be done on PName AND on Cost, resulting in 0 rows since no numbers match.
自然连接= 连接(ON 子句)是在所有具有相同名称的列上进行的;它从结果中删除重复的列,而不是所有其他连接;大多数 DBMS(由各种供应商创建的数据库系统,如 Microsoft 的 SQL Server、Oracle 的 MySQL 等)甚至不支持这一点,这只是不好的做法(或故意选择不实施它)。想象一下,开发人员来了,将 Product 中第二列的名称从 Price 更改为 Cost。然后所有的自然连接都将在 PName 和 Cost 上完成,由于没有数字匹配,导致 0 行。
Theta Join= this is the general join everybody uses because it allows you to specify the condition (the ON clause in SQL). You can join on pretty much any condition you like, for example on Products that have the first 2 letters similar, or that have a different price. In practice, this is rarely the case - in 95% of the cases you will join on an equality condition, which leads us to:
Theta Join= 这是每个人都使用的通用连接,因为它允许您指定条件(SQL 中的 ON 子句)。您可以在几乎任何您喜欢的条件下加入,例如前 2 个字母相似或价格不同的产品。在实践中,这种情况很少发生 - 在 95% 的情况下,您将在平等条件下加入,这导致我们:
Equi Join= the most common one used in practice. The example above is an equi join. Databases are optimized for this type of joins! The oposite of an equi join is a non-equi join, i.e. when you join on a condition other than "=". Databases are not optimized for this! Both of them are subsets of the general theta join. The natural join is also a theta join but the condition (the theta) is implicit.
Equi Join= 实践中最常用的一种。上面的例子是一个 equi join。数据库针对这种类型的连接进行了优化!等量连接的对立面是非等量连接,即当您在“=”以外的条件下连接时。数据库没有为此优化!它们都是一般 theta 连接的子集。自然连接也是 theta 连接,但条件(theta)是隐式的。
Source of information: university + certified SQL Server developer + recently completed the MOO "Introduction to databases" from Stanford so I dare say I have relational algebra fresh in mind.
信息来源:大学 + 经过认证的 SQL Server 开发人员 + 最近完成了斯坦福大学的 MOO“数据库简介”,所以我敢说我对关系代数有了新的认识。
回答by onedaywhen
@outis's answer is good: concise and correct as regards relations.
@outis 的回答很好:关于关系的简洁和正确。
However, the situation is slightly more complicated as regards SQL.
但是,SQL 的情况稍微复杂一些。
Consider the usual suppliers and parts databasebut implemented in SQL:
考虑通常的供应商和零件数据库,但在 SQL 中实现:
SELECT * FROM S NATURAL JOIN SP;
would return a resultset** with columns
将返回一个带有列的结果集**
SNO, SNAME, STATUS, CITY, PNO, QTY
SNO, SNAME, STATUS, CITY, PNO, QTY
The join is performed on the column with the same name in both tables, SNO
. Note that the resultset has six columns and only contains one column for SNO
.
对两个表中同名的列执行连接,SNO
。请注意,结果集有 6 列,并且仅包含 1 列SNO
。
Now consider a theta eqijoin, where the column names for the join must be explicitly specified (plus range variables S
and SP
are required):
现在考虑 theta eqijoin,其中必须明确指定连接的列名(加上范围变量S
并且SP
是必需的):
SELECT * FROM S JOIN SP ON S.SNO = SP.SNO;
The resultset will have seven columns, including two columns for SNO
. The names of the resultset are what the SQL Standard refers to as "implementation dependent" but could look like this:
结果集将有七列,其中两列用于SNO
. 结果集的名称是 SQL 标准所指的“依赖于实现”,但可能如下所示:
SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
SNO, SNAME, STATUS, CITY, SNO, PNO, QTY
or perhaps this
或者这个
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
S.SNO, SNAME, STATUS, CITY, SP.SNO, PNO, QTY
In other words, NATURAL JOIN
in SQL can be considered to remove columns with duplicated names from the resultset (but alas will not remove duplicate rows - you must remember to change SELECT
to SELECT DISTINCT
yourself).
换句话说,NATURAL JOIN
在 SQL 中可以考虑从结果集中删除具有重复名称的列(但唉不会删除重复的行 - 你必须记住更改SELECT
为SELECT DISTINCT
自己)。
** I don't quite know what the result of SELECT * FROM table_expression;
is. I know it is not a relation because, among other reasons, it can have columns with duplicate names or a column with no name. I know it is not a set because, among other reasons, the column order is significant. It's not even a SQL table or SQL table expression. I call it a resultset.
**我不太清楚结果SELECT * FROM table_expression;
是什么。我知道这不是一个关系,因为除其他原因外,它可能包含具有重复名称的列或没有名称的列。我知道它不是一个集合,因为除其他原因外,列顺序很重要。它甚至不是 SQL 表或 SQL 表表达式。我称之为结果集。
回答by heisenberg
Natural is a subset of Equi which is a subset of Theta.
Natural 是 Equi 的一个子集,Equi 是 Theta 的一个子集。
If I use the = sign on a theta join is it exactly the same as just using a natural join???
如果我在 theta 连接上使用 = 符号,它是否与仅使用自然连接完全相同???
Not necessarily, but it would be an Equi. Natural means you are matching on all similarly named columns, Equi just means you are using '=' exclusively (and not 'less than', like, etc)
不一定,但它会是一个 Equi。Natural 意味着您在所有类似名称的列上进行匹配,Equi 仅意味着您仅使用 '='(而不是 'less than'、like 等)
This is pure academia though, you could work with relational databases for years and never hear anyone use these terms.
不过,这是纯粹的学术界,您可以使用关系数据库多年,但从未听过任何人使用这些术语。
回答by Palak Jain
Theta Join:When you make a query for join using any operator,(e.g., =, <, >, >= etc.), then that join query comes under Theta join.
Equi Join:When you make a query for join using equality operator only, then that join query comes under Equi join.
Theta Join:当您使用任何运算符(例如,=、<、>、>= 等)进行连接查询时,该连接查询将位于 Theta join 下。
Equi Join:当您仅使用相等运算符进行联接查询时,该联接查询将归入 Equi join 之下。
Example:
例子:
> SELECT * FROM Emp JOIN Dept ON Emp.DeptID = Dept.DeptID; > SELECT * FROM Emp INNER JOIN Dept USING(DeptID)
This will show: _________________________________________________ | Emp.Name | Emp.DeptID | Dept.Name | Dept.DeptID | | | | | |
Note: Equi join is also a theta join!
Natural Join:a type of Equi Join which occurs implicitly by comparing all the same names columns in both tables.
注意:Equi join 也是 theta join!
Natural Join:一种 Equi Join,它通过比较两个表中所有相同名称的列来隐式发生。
Note: here, the join result has only one column for each pair of same named columns.
注意:这里,每对同名列的连接结果只有一列。
Example
例子
SELECT * FROM Emp NATURAL JOIN Dept
This will show: _______________________________ | DeptID | Emp.Name | Dept.Name | | | | |
回答by himani
Cartesian product of two tables gives all the possible combinations of tuples like the example in mathematics the cross product of two sets . since many a times there are some junk values which occupy unnecessary space in the memory too so here joins comes to rescue which give the combination of only those attribute values which are required and are meaningful.
两个表的笛卡尔积给出了所有可能的元组组合,就像数学中的例子一样,两个集合的叉积。因为很多时候有一些垃圾值也在内存中占据了不必要的空间,所以这里加入拯救,它只给出那些必需且有意义的属性值的组合。
inner join gives the repeated field in the table twice whereas natural join here solves the problem by just filtering the repeated columns and displaying it only once.else, both works the same. natural join is more efficient since it preserves the memory .Also , redundancies are removed in natural join .
内连接给表中的重复字段两次,而这里的自然连接仅通过过滤重复列并仅显示一次来解决问题。其他,两者的工作原理相同。自然连接更有效,因为它保留了内存。此外,自然连接中删除了冗余。
equi join of two tables are such that they display only those tuples which matches the value in other table . for example : let new1 and new2 be two tables . if sql query select * from new1 join new2 on new1.id = new.id (id is the same column in two tables) then start from new2 table and join which matches the id in second table . besides , non equi join do not have equality operator they have <,>,and between operator .
两个表的 equi join 是这样的,它们只显示那些与其他表中的值匹配的元组。例如:让 new1 和 new2 成为两个表。如果 sql 查询 select * from new1 join new2 on new1.id = new.id (id 是两个表中的同一列) 然后从 new2 表开始并加入与第二个表中的 id 匹配的 id 。此外,non equi join 没有相等运算符,它们具有 <,> 和 between 运算符。
theta join consists of all the comparison operator including equality and others < , > comparison operator. when it uses equality(=) operator it is known as equi join .
theta join 包含所有比较运算符,包括相等和其他 < , > 比较运算符。当它使用 equal(=) 运算符时,它被称为 equi join 。
回答by rashedcs
Natural Join: Natural join can be possible when there is at least one common attribute in two relations.
自然连接:当两个关系中至少有一个共同属性时,自然连接是可能的。
Theta Join: Theta join can be possible when two act on particular condition.
Theta Join:当两个人在特定条件下行动时,Theta join 是可能的。
Equi Join: Equi can be possible when two act on equity condition. It is one type of theta join.
Equi Join:当两个人在平等条件下行动时,Equi 是可能的。它是一种θ连接。