Using DISTINCT inner join in SQL
Original question: http://stackoverflow.com/questions/161404/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow (not to the maintainer of this page).
Asked by Mats Fredriksson
I have three tables, A, B, C, where A is many-to-one with B, and B is many-to-one with C. I'd like a list of all C's that are referenced (through B) by A.
My tables are something like this: A[id, valueA, lookupB], B[id, valueB, lookupC], C[id, valueC]. I've written a query with two nested SELECTs, but I'm wondering if it's possible to do INNER JOIN with DISTINCT somehow.
SELECT valueC
FROM C
INNER JOIN
(
    SELECT DISTINCT lookupC
    FROM B
    INNER JOIN
    (
        SELECT DISTINCT lookupB
        FROM A
    ) A2 ON B.id = A2.lookupB
) B2 ON C.id = B2.lookupC
EDIT: The tables are fairly large: A is 500k rows, B is 10k rows and C is 100 rows, so a lot of unnecessary rows are produced if I do a basic inner join and apply DISTINCT at the end, like this:
SELECT DISTINCT valueC
FROM C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB
This is very, very slow (orders of magnitude slower than the nested SELECT above).
Accepted answer by Darrel Miller
I did a test on MS SQL 2005 using the following tables: A 400K rows, B 26K rows and C 450 rows.
The estimated query plan indicated that the basic inner join would be 3 times slower than the nested sub-queries. However, when actually running the queries, the basic inner join was twice as fast as the nested queries: it took 297 ms on very minimal server hardware.
What database are you using, and what times are you seeing? I'm thinking if you are seeing poor performance then it is probably an index problem.
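To make the index suggestion concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MS SQL 2005, so plans and timings will not match the numbers above. The tiny data set and the index names (`ix_A_lookupB`, `ix_B_lookupC`) are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical miniature version of the schema from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER);

    -- Indexes on the lookup columns used by the joins (names are made up).
    CREATE INDEX ix_A_lookupB ON A (lookupB);
    CREATE INDEX ix_B_lookupC ON B (lookupC);
""")

conn.executemany("INSERT INTO C VALUES (?, ?)",
                 [(1, "c1"), (2, "c2"), (3, "c3")])
conn.executemany("INSERT INTO B VALUES (?, ?, ?)",
                 [(10, "b10", 1), (11, "b11", 1), (12, "b12", 2), (13, "b13", 3)])
# No row in A points at B 13, so c3 is not reachable from A.
conn.executemany("INSERT INTO A VALUES (?, ?, ?)",
                 [(100, "a100", 10), (101, "a101", 10),
                  (102, "a102", 11), (103, "a103", 12)])

query = """
    SELECT DISTINCT C.valueC
    FROM C
    INNER JOIN B ON C.id = B.lookupC
    INNER JOIN A ON B.id = A.lookupB
"""

values = sorted(row[0] for row in conn.execute(query))
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)

# Ask SQLite how it intends to run the query; the output format is engine-specific.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan:
    print(step)
print(values)
```

On a real server the equivalent step would be to check that the lookup columns are indexed and then compare the actual (not estimated) execution plans of both query shapes.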
Answer by Jonathan Lonowski
I believe your 1:m relationships should already implicitly create DISTINCT JOINs.
But if your goal is just the C's in each A, it might be easier to use DISTINCT on the outermost query.
SELECT DISTINCT a.valueA, c.valueC
FROM C
INNER JOIN B ON B.lookupC = C.id
INNER JOIN A ON A.lookupB = B.id
ORDER BY a.valueA, c.valueC
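As a small illustration of what this query returns, the sketch below runs it on a made-up three-row data set with Python's sqlite3 module. Note that it yields one row per distinct (valueA, valueC) pair, which is not the same thing as a unique list of C values; the sample rows are assumptions, not data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER);

    -- Made-up sample rows: two A values that both lead to the same C.
    INSERT INTO C VALUES (1, 'c1');
    INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1);
    INSERT INTO A VALUES (100, 'x', 10), (101, 'y', 11), (102, 'x', 10);
""")

# The query from this answer: distinct (valueA, valueC) pairs.
pairs = conn.execute("""
    SELECT DISTINCT A.valueA, C.valueC
    FROM C
    INNER JOIN B ON B.lookupC = C.id
    INNER JOIN A ON A.lookupB = B.id
    ORDER BY A.valueA, C.valueC
""").fetchall()

# Selecting only valueC collapses the result to the unique C values.
unique_c = conn.execute("""
    SELECT DISTINCT C.valueC
    FROM C
    INNER JOIN B ON B.lookupC = C.id
    INNER JOIN A ON A.lookupB = B.id
    ORDER BY C.valueC
""").fetchall()

print(pairs)     # c1 appears once per distinct valueA
print(unique_c)  # c1 appears once
```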
Answer by VVS
SELECT DISTINCT C.valueC
FROM C
LEFT JOIN B ON C.id = B.lookupC
LEFT JOIN A ON B.id = A.lookupB
WHERE C.id IS NOT NULL
I don't see a good reason to limit the result sets of A and B, because what you want is a list of all C's that are referenced by A. I did a DISTINCT on C.valueC because I guessed you wanted a unique list of C's.
EDIT: I agree with your argument. Even if your solution looks a bit nested it seems to be the best and fastest way to use your knowledge of the data and reduce the result sets.
There is no distinct join construct you could use so just stay with what you already have :)
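Although there is no DISTINCT JOIN keyword, the same "does at least one matching row exist" intent can also be written as a semi-join with EXISTS. This is an alternative that nobody in this thread proposed, shown here only as a sketch: it runs on SQLite through Python's sqlite3 module with made-up sample rows, and whether it is any faster than the nested DISTINCT version depends on the engine and its indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (id INTEGER PRIMARY KEY, valueC TEXT);
    CREATE TABLE B (id INTEGER PRIMARY KEY, valueB TEXT, lookupC INTEGER);
    CREATE TABLE A (id INTEGER PRIMARY KEY, valueA TEXT, lookupB INTEGER);

    -- Made-up sample rows; B 13 (and therefore c3) is not referenced by any A.
    INSERT INTO C VALUES (1, 'c1'), (2, 'c2'), (3, 'c3');
    INSERT INTO B VALUES (10, 'b10', 1), (11, 'b11', 1), (12, 'b12', 2), (13, 'b13', 3);
    INSERT INTO A VALUES (100, 'a100', 10), (101, 'a101', 10),
                         (102, 'a102', 11), (103, 'a103', 12);
""")

# The nested DISTINCT query from the question.
nested_sql = """
    SELECT valueC
    FROM C
    INNER JOIN (
        SELECT DISTINCT lookupC
        FROM B
        INNER JOIN (
            SELECT DISTINCT lookupB
            FROM A
        ) A2 ON B.id = A2.lookupB
    ) B2 ON C.id = B2.lookupC
"""

# Semi-join alternative: keep a C row if at least one A reaches it through B.
exists_sql = """
    SELECT C.valueC
    FROM C
    WHERE EXISTS (
        SELECT 1
        FROM B
        INNER JOIN A ON A.lookupB = B.id
        WHERE B.lookupC = C.id
    )
"""

nested = sorted(row[0] for row in conn.execute(nested_sql))
via_exists = sorted(row[0] for row in conn.execute(exists_sql))

print(nested)
print(via_exists)
```

Both forms return one row per referenced C row, so neither needs a final DISTINCT unless valueC itself contains duplicates.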
Answer by kristian
Is this what you mean?
SELECT DISTINCT C.valueC
FROM
C
INNER JOIN B ON C.id = B.lookupC
INNER JOIN A ON B.id = A.lookupB