SQL - 从左连接中删除重复项
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/14615032/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
SQL - remove duplicates from left join
提问by Rick Momsen
I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same; table2 may have multiple records with the same FKID but I only want to return one record (random chosen is ok with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).
我正在创建两个表的连接视图,但是从 table2 中得到了不需要的重复项。
例如:table1 有 9000 条记录,我需要结果视图包含完全相同的;table2 可能有多个具有相同 FKID 的记录,但我只想返回一个记录(随机选择对我的客户来说是可以的)。我有以下代码可以正常工作,但性能比预期慢(超过 14 秒)。
SELECT
OBJECTID
, PKID
,(SELECT TOP (1) SUBDIVISIO
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS ProjectName
,(SELECT TOP (1) ASBUILT1
FROM dbo.table2 AS t2
WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1
Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~.5 seconds), but 'Distinct' only filters out records when all columns are duplicate (rather than just the FKID).
有没有办法对连接做类似的事情来提高性能?
我正在使用 SQL Server 2008 R2。
我接近以下代码(约 .5 秒),但“Distinct”仅在所有列都重复(而不仅仅是 FKID)时过滤掉记录。
SELECT
t1.OBJECTID
,t1.PKID
,t2.ProjectName
,t2.Asbuilt
FROM dbo.table1 AS t1
LEFT JOIN (SELECT
DISTINCT FKID
,ProjectName
,Asbuilt
FROM dbo.table2) t2
ON t1.PKID = t2.FKID
table examples
表格示例
table1 table2
OID, PKID FKID, ProjectName, Asbuilt
1, id1 id1, P1, AB1
2, id2 id1, P5, AB5
3, id4 id2, P10, AB2
5, id5 id5, P4, AB4
In the above example returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5)
在上面的例子中,返回的记录应该是 id5/P4/AB4、id2/P10/AB2 和 (id1/P1/AB1 OR id1/P5/AB5)
My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.
我的搜索提出了类似的问题,但没有解决我的问题。 链接,链接
在此先感谢您的帮助。这是我的第一篇文章,所以如果我违反了任何规则,请告诉我。
回答by RBarryYoung
This will give the results you requested and should have the best performance.
这将提供您要求的结果,并且应该具有最佳性能。
SELECT
OBJECTID
, PKID
, t2.SUBDIVISIO,
, t2.ASBUILT1
FROM dbo.table1 AS t1
OUTER APPLY (
SELECT TOP 1 *
FROM dbo.table2 AS t2
WHERE t1.PKID = t2.FKID
) AS t2
回答by Gordon Linoff
Your original query is producing arbitrary values for the two columns (the use of top
with no order by
). You can get the same effect with this:
您的原始查询为两列生成任意值(使用top
with no order by
)。您可以通过以下方式获得相同的效果:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
FROM dbo.table2
group by fkid
) t2
ON t1.PKID = t2.FKID
This version replaces the distinct
with a group by
.
这个版本替换distinct
了group by
。
To get a truly random row in SQL Server (which your syntax suggests you are using), try this:
要在 SQL Server 中获得真正随机的行(您的语法建议您正在使用),请尝试以下操作:
SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
(SELECT FKID, ProjectName, AsBuilt,
ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
FROM dbo.table2
) t2
ON t1.PKID = t2.FKID and t2.seqnum = 1
This assumes version 2005 or greater.
这假定版本为 2005 或更高版本。
回答by veljasije
If you want described result, you need to use INNER JOIN
and following query will satisfy your need:
如果您想要描述的结果,您需要使用INNER JOIN
以下查询将满足您的需求:
SELECT
t1.OID,
t1.PKID,
MAX(t2.ProjectName) AS ProjectName,
MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
t1.OID,
t1.PKID
If you want to see all rows from left table (table1) whether it has pair in right table or not, then use LEFT JOIN
and same query will gave you desired result.
如果您想查看左表 ( table1) 中的所有行,无论它是否在右表中成对,那么使用LEFT JOIN
相同的查询会给您所需的结果。
EDITED
已编辑
This construction has good performance, and you dont need to use subqueries.
这种构造性能好,不需要使用子查询。