Original source: http://stackoverflow.com/questions/13943832/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverFlow
How to avoid duplicates in sql query across three joined tables
Asked by Chain
I'm getting duplicates when I do two LEFT JOINs to get to the "event_name" in my example below. I get 112 cases with it set up this way. However, if I get rid of the 2 LEFT JOIN lines and run the query, I get the proper 100 records without duplicates. I tried DISTINCT with the code below, but I still get 112 with duplicates.
SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'
How can I specify that I only want the exact 100 cases from "cases", and that I don't want anything from the tables in the joins to produce any more rows?
Thanks!
Accepted answer by AndreKR
You need to extend your ON clauses to include a condition so that for each entry in cases there is only one entry in middle_table that matches the condition, and that for each entry in middle_table there is only one entry in event:
LEFT JOIN middle_table ON cases.serial = middle_table.m_serial AND some_condition
You can of course use DISTINCT. If that doesn't work, it means that your results all differ in at least one of the fields cases.id, cases.date, cases.name and event.event_name. Examine the results, decide which of the entries you want to throw away, and include that condition in your ON clause.
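As a rough illustration of the advice above, here is a minimal sketch that runs the asker's query against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the is_primary column on middle_table are invented for the example; the real condition depends on which of the matching rows you actually want to keep.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (id INTEGER, serial INTEGER, date TEXT, name TEXT);
    -- is_primary is a hypothetical column, used only to have something to filter on
    CREATE TABLE middle_table (m_serial INTEGER, e_serial INTEGER, is_primary INTEGER);
    CREATE TABLE event (ev_serial INTEGER, event_name TEXT);

    INSERT INTO cases VALUES (1, 10, '2012-12-11', 'case A'),
                             (2, 20, '2012-12-12', 'case B');
    -- case A is linked to two events, which is what duplicates it
    INSERT INTO middle_table VALUES (10, 100, 1), (10, 101, 0), (20, 100, 1);
    INSERT INTO event VALUES (100, 'hearing'), (101, 'appeal');
""")

query = """
    SELECT cases.id, cases.date, cases.name, event.event_name
    FROM cases
    LEFT JOIN middle_table ON cases.serial = middle_table.m_serial {extra}
    LEFT JOIN event ON middle_table.e_serial = event.ev_serial
    WHERE cases.date BETWEEN '2012-12-11' AND '2012-12-13'
    ORDER BY cases.id, event.event_name
"""

# Without an extra condition, case A comes back once per linked event.
plain_rows = conn.execute(query.format(extra="")).fetchall()

# With the extra ON condition, at most one middle_table row matches each case.
filtered_rows = conn.execute(
    query.format(extra="AND middle_table.is_primary = 1")
).fetchall()

print(len(plain_rows), len(filtered_rows))
```

Putting the condition in the ON clause rather than in WHERE keeps cases that have no matching middle_table row at all, which is the point of using a LEFT JOIN here.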
Answer by JohnLBevan
The issue is you have multiple matches in the tables you're left joining with. Effectively your code says:
select *
from parent
left outer join child on parent.id = child.parentId
If a parent has two children, you get both; so the parent appears twice.
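To see the row multiplication concretely, the following sketch builds the parent and child tables in an in-memory SQLite database from Python. The table contents are made up for illustration; only the shape of the join comes from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child (id INTEGER, parentId INTEGER, name TEXT);
    -- Ann has two children, Bob has one, Cy has none
    INSERT INTO parent VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO child VALUES (11, 1, 'Al'), (12, 1, 'Bea'), (21, 2, 'Dee');
""")

rows = conn.execute("""
    SELECT p.id, p.name, c.name
    FROM parent p
    LEFT OUTER JOIN child c ON p.id = c.parentId
    ORDER BY p.id, c.id
""").fetchall()

# Three parents produce four rows: Ann appears once per child,
# and Cy still appears once, with NULL (None) in the child column.
for row in rows:
    print(row)
```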
If you only want to get each parent once, you need to compromise: you can't have both children. Either apply an aggregate function to the columns from the child table and group by the columns from the parent table, or use row_number() over (partition by list, of, parent, columns order by list, of, child, columns) r in an inner statement and where r = 1 in an outer statement, such as below:
select p.id, p.name, max(c.id), max(c.name) --nb: child id and name may come from different records
from parent p
left outer join child c on p.id = c.parentId
group by p.id, p.name
or
select *
from
(
    select p.id, p.name, c.id as child_id, c.name as child_name
    , row_number() over (partition by p.id order by c.id desc) r
    from parent p
    left outer join child c on p.id = c.parentId
) x
where x.r = 1
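The two approaches above can return different child values, which the following sketch tries to show side by side. It assumes an in-memory SQLite database driven from Python (window functions need SQLite 3.25 or later) and invented sample rows; the child_id and child_name aliases are added only so the output columns have distinct names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child (id INTEGER, parentId INTEGER, name TEXT);
    INSERT INTO parent VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann has two children; Bob has none
    INSERT INTO child VALUES (11, 1, 'Zed'), (12, 1, 'Amy');
""")

# Aggregate approach: each MAX is computed independently per column.
grouped = conn.execute("""
    SELECT p.id, p.name, MAX(c.id), MAX(c.name)
    FROM parent p
    LEFT OUTER JOIN child c ON p.id = c.parentId
    GROUP BY p.id, p.name
    ORDER BY p.id
""").fetchall()

# row_number approach: keeps one whole child row per parent.
ranked = conn.execute("""
    SELECT id, name, child_id, child_name
    FROM (
        SELECT p.id, p.name, c.id AS child_id, c.name AS child_name,
               ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY c.id DESC) AS r
        FROM parent p
        LEFT OUTER JOIN child c ON p.id = c.parentId
    ) x
    WHERE x.r = 1
    ORDER BY id
""").fetchall()

print(grouped)
print(ranked)
```

With this data the aggregate version pairs child id 12 with the name 'Zed', a combination that exists in no single child row, while the row_number version returns child 12 with its own name 'Amy'. Parents without children survive in both queries with NULL child columns.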
UPDATE
As mentioned in the comments, if the child data is exactly the same you can do this:
select p.id, p.name, c.name
from parent p
left outer join
(
    select distinct parentId, name
    from child
) c on p.id = c.parentId
or (if a few fields are different but you don't care which you get)
select p.id, p.name, c.id, c.name
from parent p
left outer join
(
    select max(id) id, parentId, name
    from child
    group by parentId, name
) c on p.id = c.parentId
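A small sketch of both derived-table variants, again assuming an in-memory SQLite database and made-up rows in which one parent has two child rows that differ only in id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER, name TEXT);
    CREATE TABLE child (id INTEGER, parentId INTEGER, name TEXT);
    INSERT INTO parent VALUES (1, 'Ann'), (2, 'Bob');
    -- Ann's two child rows carry the same name and differ only in id
    INSERT INTO child VALUES (11, 1, 'Al'), (12, 1, 'Al'), (21, 2, 'Dee');
""")

# Variant 1: de-duplicate the child data before joining.
distinct_rows = conn.execute("""
    SELECT p.id, p.name, c.name
    FROM parent p
    LEFT OUTER JOIN (
        SELECT DISTINCT parentId, name FROM child
    ) c ON p.id = c.parentId
    ORDER BY p.id
""").fetchall()

# Variant 2: collapse rows that differ only in id, keeping the largest id.
max_rows = conn.execute("""
    SELECT p.id, p.name, c.id, c.name
    FROM parent p
    LEFT OUTER JOIN (
        SELECT MAX(id) AS id, parentId, name
        FROM child
        GROUP BY parentId, name
    ) c ON p.id = c.parentId
    ORDER BY p.id
""").fetchall()

print(distinct_rows)
print(max_rows)
```

Both variants still return more than one row for a parent whose children have different names, because the derived table then contains one row per distinct (parentId, name) pair.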
Answer by Michael Durrant
The duplicates are the result of rows in "cases" matching multiple rows in "middle_table" and "event". You can limit the result to unique combinations of values by using the "GROUP BY" keyword (which is usually used with aggregate functions, such as COUNT and SUM), as follows:
SELECT "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
FROM "cases"
LEFT JOIN "middle_table" ON "cases"."serial" = "middle_table"."m_serial"
LEFT JOIN "event" ON "middle_table"."e_serial" = "event"."ev_serial"
WHERE "cases"."date" BETWEEN '2012-12-11' AND '2012-12-13'
GROUP BY "cases"."id", "cases"."date", "cases"."name", "event"."event_name"
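One thing worth checking before relying on this: grouping by every selected column, with no aggregate, returns the same rows as SELECT DISTINCT, which the asker reports still gave 112 rows. The sketch below, on an in-memory SQLite database with invented data, compares the two and then shows one possible way to get a single row per case by aggregating event_name instead of grouping by it. Choosing MIN is an arbitrary assumption; any rule for picking one event would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (id INTEGER, serial INTEGER, date TEXT, name TEXT);
    CREATE TABLE middle_table (m_serial INTEGER, e_serial INTEGER);
    CREATE TABLE event (ev_serial INTEGER, event_name TEXT);

    INSERT INTO cases VALUES (1, 10, '2012-12-11', 'case A'),
                             (2, 20, '2012-12-12', 'case B');
    -- case A links to two different events; case B links twice to the same one
    INSERT INTO middle_table VALUES (10, 100), (10, 101), (20, 100), (20, 100);
    INSERT INTO event VALUES (100, 'hearing'), (101, 'appeal');
""")

cols = "cases.id, cases.date, cases.name"
joins = """
    FROM cases
    LEFT JOIN middle_table ON cases.serial = middle_table.m_serial
    LEFT JOIN event ON middle_table.e_serial = event.ev_serial
    WHERE cases.date BETWEEN '2012-12-11' AND '2012-12-13'
"""

# Grouping by all four selected columns removes only exact duplicates...
grouped = conn.execute(
    f"SELECT {cols}, event.event_name {joins} "
    f"GROUP BY {cols}, event.event_name ORDER BY 1, 4"
).fetchall()

# ...which is the same result DISTINCT gives.
distinct = conn.execute(
    f"SELECT DISTINCT {cols}, event.event_name {joins} ORDER BY 1, 4"
).fetchall()

# Grouping by the cases columns only and aggregating the event name
# yields exactly one row per case.
one_per_case = conn.execute(
    f"SELECT {cols}, MIN(event.event_name) {joins} GROUP BY {cols} ORDER BY 1"
).fetchall()

print(len(grouped), len(distinct), len(one_per_case))
```

Here case B's repeated link collapses under both GROUP BY and DISTINCT, but case A still appears twice because its two rows differ in event_name.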