SQL - 从左连接中删除重复项

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14615032/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 13:22:50  来源:igfitidea点击:

SQL - remove duplicates from left join

sqlsql-serverjoin

提问by Rick Momsen

I'm creating a joined view of two tables, but am getting unwanted duplicates from table2.
For example: table1 has 9000 records and I need the resulting view to contain exactly the same; table2 may have multiple records with the same FKID but I only want to return one record (random chosen is ok with my customer). I have the following code that works correctly, but performance is slower than desired (over 14 seconds).

我正在创建两个表的连接视图,但是从 table2 中得到了不需要的重复项。
例如:table1 有 9000 条记录,我需要结果视图包含完全相同的;table2 可能有多个具有相同 FKID 的记录,但我只想返回一个记录(随机选择对我的客户来说是可以的)。我有以下代码可以正常工作,但性能比预期慢(超过 14 秒)。

SELECT     
    OBJECTID
    , PKID
    ,(SELECT TOP (1) SUBDIVISIO
        FROM dbo.table2 AS t2
        WHERE (t1.PKID = t2.FKID)) AS ProjectName
    ,(SELECT TOP (1) ASBUILT1
        FROM dbo.table2 AS t2
        WHERE (t1.PKID = t2.FKID)) AS Asbuilt
FROM dbo.table1 AS t1

Is there a way to do something similar with joins to speed up performance?
I'm using SQL Server 2008 R2.
I got close with the following code (~.5 seconds), but 'Distinct' only filters out records when all columns are duplicate (rather than just the FKID).

有没有办法对连接做类似的事情来提高性能?
我正在使用 SQL Server 2008 R2。
我接近以下代码(约 .5 秒),但“Distinct”仅在所有列都重复(而不仅仅是 FKID)时过滤掉记录。

SELECT
    t1.OBJECTID
    ,t1.PKID
    ,t2.ProjectName
    ,t2.Asbuilt
FROM dbo.table1 AS t1
    LEFT JOIN (SELECT
        DISTINCT FKID
        ,ProjectName
        ,Asbuilt
        FROM dbo.table2) t2
    ON t1.PKID = t2.FKID

table examples

表格示例

table1          table2

OID, PKID       FKID, ProjectName, Asbuilt
1, id1          id1, P1, AB1
2, id2          id1, P5, AB5
3, id4          id2, P10, AB2
5, id5          id5, P4, AB4

In the above example returned records should be id5/P4/AB4, id2/P10/AB2, and (id1/P1/AB1 OR id1/P5/AB5)

在上面的例子中,返回的记录应该是 id5/P4/AB4、id2/P10/AB2 和 (id1/P1/AB1 OR id1/P5/AB5)

My search came up with similar questions, but none that resolved my problem. link, link
Thanks in advance for your help. This is my first post so let me know if I've broken any rules.

我的搜索提出了类似的问题,但没有解决我的问题。 链接,链接
在此先感谢您的帮助。这是我的第一篇文章,所以如果我违反了任何规则,请告诉我。

回答by RBarryYoung

This will give the results you requested and should have the best performance.

这将提供您要求的结果,并且应该具有最佳性能。

SELECT     
    OBJECTID
    , PKID
    , t2.SUBDIVISIO,
    , t2.ASBUILT1

FROM        dbo.table1 AS t1
OUTER APPLY (
    SELECT  TOP 1 *
    FROM    dbo.table2 AS t2
    WHERE   t1.PKID = t2.FKID
    ) AS t2

回答by Gordon Linoff

Your original query is producing arbitrary values for the two columns (the use of topwith no order by). You can get the same effect with this:

您的原始查询为两列生成任意值(使用topwith no order by)。您可以通过以下方式获得相同的效果:

SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
     (SELECT FKID, min(ProjectName) as ProjectName, MIN(asBuilt) as AsBuilt
      FROM dbo.table2
      group by fkid
     ) t2
    ON t1.PKID = t2.FKID

This version replaces the distinctwith a group by.

这个版本替换distinctgroup by

To get a truly random row in SQL Server (which your syntax suggests you are using), try this:

要在 SQL Server 中获得真正随机的行(您的语法建议您正在使用),请尝试以下操作:

SELECT t1.OBJECTID, t1.PKID, t2.ProjectName, t2.Asbuilt
FROM dbo.table1 t1 LEFT JOIN
     (SELECT FKID, ProjectName, AsBuilt,
             ROW_NUMBER() over (PARTITION by fkid order by newid()) as seqnum
      FROM dbo.table2
     ) t2
    ON t1.PKID = t2.FKID and t2.seqnum = 1

This assumes version 2005 or greater.

这假定版本为 2005 或更高版本。

回答by veljasije

If you want described result, you need to use INNER JOINand following query will satisfy your need:

如果您想要描述的结果,您需要使用INNER JOIN以下查询将满足您的需求:

SELECT
  t1.OID,
  t1.PKID,
  MAX(t2.ProjectName) AS ProjectName,
  MAX(t2.Asbuilt) AS Asbuilt
FROM table1 t1
JOIN table2 t2 ON t1.PKID = t2.FKID
GROUP BY
  t1.OID,
  t1.PKID

If you want to see all rows from left table (table1) whether it has pair in right table or not, then use LEFT JOINand same query will gave you desired result.

如果您想查看左表 ( table1) 中的所有行,无论它是否在右表中成对,那么使用LEFT JOIN相同的查询会给您所需的结果。

EDITED

已编辑

This construction has good performance, and you dont need to use subqueries.

这种构造性能好,不需要使用子查询。