Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original address, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3578006/

Time: 2020-09-01 07:20:08  Source: igfitidea

SQL: Need to remove duplicate rows in query containing multiple joins

sql

Asked by ShadowXOR

Note that I'm a complete SQL noob and in the process of learning. Based on Google searches (including searching here) I've tried using SELECT DISTINCT and GROUP BY but neither works, likely due to all of my joins (if anyone knows why they won't work exactly, that would be helpful to learn).

I need data from a variety of tables and below is the only way I know to do it (I just know the basics). The query below works fine but shows duplicates. I need to know how to remove those. The only hint I have right now is perhaps a nested SELECT query but based on research I'm not sure how to implement them. Any help at all would be great, thanks!

USE SQL_Contest
go
SELECT
    CLT.Description AS ClockType,
    CLK.SerialNumber AS JobClockSerial,
    SIT.SiteNumber AS JobID,
    SIT.[Name] AS JobsiteName,
    SIT.Status AS SiteActivityStatus,
    DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
    CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
    SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC

EDIT: I will be reviewing these answers shortly, thank you very much. I'm posting the additional duplicate info per Rob's request:

Everything displays fine until I add:

LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber

Which I need. That's when a duplicate appears:

JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049    
JobClock 2,500248E4,08-107,Brentwood Job,1,2007-05-04 13:36:54.000,2007-05-04 13:47:55.407,3049

I want that info to only display once. Essentially this query is to determine all active jobsites that have a clock assigned to them, and that job only has one clock assigned to it, and it's only one jobsite, but it's appearing twice.

EDIT 2: Based on the help you guys provided I was able to determine they actually are NOT duplicates, and each session is independent, that is the only one that happened to have two sessions. So now I'm going to try to figure out how to only pull in information from the latest session.

Accepted answer by Rob

If everything "works fine" until you add:

LEFT JOIN tb_Session SES
ON SES.ClockSerialNumber = CLK.SerialNumber

Then there must be more than one record in tb_Session for each CLK.SerialNumber.

Run the following query:

SELECT  *
FROM    tb_Session SES
WHERE   ClockSerialNumber = '500248E4' -- the clock's serial number from the duplicated row ('08-107' is the JobID)

There should be two records returned. You need to decide how to handle this (i.e. which record do you want to use?), unless both rows from tb_Session contain identical data, in which case, should they?
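
To find every clock with more than one session, rather than checking one serial number at a time, a count per serial number may help. This is only a sketch; it uses just the table and column names that appear in the question:

```sql
-- List serial numbers that have more than one row in tb_Session.
SELECT   ClockSerialNumber,
         COUNT(*) AS SessionCount
FROM     tb_Session
GROUP BY ClockSerialNumber
HAVING   COUNT(*) > 1
ORDER BY SessionCount DESC;
```

Any serial number listed here will produce extra rows once tb_Session is joined in.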

You could always change your query to:

SELECT
    CLT.Description AS ClockType,
    CLK.SerialNumber AS JobClockSerial,
    SIT.SiteNumber AS JobID,
    SIT.[Name] AS JobsiteName,
    SIT.Status AS SiteActivityStatus,
    DHA.IssuedDate AS DHAIssuedDate, -- Date the clock was assigned to THAT jobsite
    CLK.CreatedDate AS CLKCreatedDate, -- Date clock first was assigned to ANY jobsite
    SES.ClockVoltage
FROM tb_Clock CLK
INNER JOIN tb_ClockType CLT
ON CLK.TypeID = CLT.ClockTypeID
INNER JOIN tb_DeviceHolderActivity DHA
ON CLK.ClockGUID = DHA.DeviceGUID
INNER JOIN tb_Site SIT
ON SIT.SiteGUID = DHA.HolderGUID
LEFT JOIN 
(
    SELECT DISTINCT ClockSerialNumber, ClockVoltage
    FROM tb_Session 
) SES
ON SES.ClockSerialNumber = CLK.SerialNumber
WHERE DHA.ReturnedDate IS NULL
ORDER BY SIT.[Name] ASC

That should ensure that SES contains only one record for each unique combination of ClockSerialNumber and ClockVoltage. Note that if two sessions for the same clock recorded different voltages, both rows will still come back.
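
If the goal is to keep only the latest session per clock (as the asker's EDIT 2 suggests), one common approach is ROW_NUMBER(). The following is a sketch under an assumption: tb_Session is presumed to have a timestamp column, and SessionDate is a hypothetical name that does not appear in the question, so substitute the real column. The select list is shortened for brevity.

```sql
SELECT
    CLK.SerialNumber AS JobClockSerial,
    SES.ClockVoltage
FROM tb_Clock CLK
LEFT JOIN
(
    SELECT ClockSerialNumber,
           ClockVoltage,
           -- rn = 1 marks the newest session for each clock.
           -- SessionDate is an ASSUMED column name, not from the question.
           ROW_NUMBER() OVER (PARTITION BY ClockSerialNumber
                              ORDER BY SessionDate DESC) AS rn
    FROM tb_Session
) SES
ON SES.ClockSerialNumber = CLK.SerialNumber
AND SES.rn = 1
```

Keeping the rn = 1 test in the ON clause, rather than in WHERE, means clocks with no session at all still appear, with a NULL ClockVoltage.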

Answer by Rob

Take this example dataset:

Ingredient

IngredientId IngredientName
============ ==============
1            Apple
2            Orange
3            Pear
4            Tomato

Recipe

RecipeId RecipeName
======== ==========
1        Apple Turnover
2        Apple Pie
3        Poached Pears

Recipe_Ingredient

RecipeId IngredientId Quantity
======== ============ ========
1        1            0.25
1        1            1.00
2        1            2.00
3        3            1.00

Note: Why the Apple Turnover has two lots of apple as ingredients is neither here nor there; it just does.

The following query will return two rows for the "Apple Turnover" recipe, one row for the "Apple Pie" recipe and one row for the "Poached Pears" recipe, because there are two entries in the Recipe_Ingredient table that link RecipeId 1 to IngredientId 1. That's just what happens with a join: every matching row on one side is paired with every matching row on the other.

SELECT  I.IngredientName,
        R.RecipeName
FROM    Ingredient I
JOIN    Recipe_Ingredient RI
        ON I.IngredientId = RI.IngredientId
JOIN    Recipe R
        ON RI.RecipeId = R.RecipeId

You could get this to return only one row per ingredient and recipe pair by changing it to:

SELECT  I.IngredientName,
        R.RecipeName
FROM    Ingredient I
JOIN    Recipe_Ingredient RI
        ON I.IngredientId = RI.IngredientId
JOIN    Recipe R
        ON RI.RecipeId = R.RecipeId
GROUP BY I.IngredientName, R.RecipeName

Without more specifics regarding your data, it's hard to apply this to your specific scenario, but as someone unfamiliar with SQL, you may find that the walkthrough helps you understand where the "duplicates" are coming from.
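
To try the walkthrough yourself, the sample data can be loaded into temp tables. This is a sketch assuming SQL Server 2008 or later (for multi-row VALUES); the column types and the SUM(Quantity) variant at the end are additions for illustration, not part of the original answer.

```sql
-- Temp tables (# prefix) hold the sample data for the current session only.
CREATE TABLE #Ingredient (IngredientId int, IngredientName varchar(50));
CREATE TABLE #Recipe (RecipeId int, RecipeName varchar(50));
CREATE TABLE #Recipe_Ingredient (RecipeId int, IngredientId int, Quantity decimal(5,2));

INSERT INTO #Ingredient VALUES (1, 'Apple'), (2, 'Orange'), (3, 'Pear'), (4, 'Tomato');
INSERT INTO #Recipe VALUES (1, 'Apple Turnover'), (2, 'Apple Pie'), (3, 'Poached Pears');
INSERT INTO #Recipe_Ingredient VALUES (1, 1, 0.25), (1, 1, 1.00), (2, 1, 2.00), (3, 3, 1.00);

-- Plain join: 4 rows, with Apple / Apple Turnover appearing twice.
SELECT  I.IngredientName, R.RecipeName
FROM    #Ingredient I
JOIN    #Recipe_Ingredient RI ON I.IngredientId = RI.IngredientId
JOIN    #Recipe R ON RI.RecipeId = R.RecipeId;

-- GROUP BY: 3 rows. Adding SUM shows what the collapsed rows contained
-- (Apple Turnover totals 1.25).
SELECT  I.IngredientName, R.RecipeName, SUM(RI.Quantity) AS TotalQuantity
FROM    #Ingredient I
JOIN    #Recipe_Ingredient RI ON I.IngredientId = RI.IngredientId
JOIN    #Recipe R ON RI.RecipeId = R.RecipeId
GROUP BY I.IngredientName, R.RecipeName;

DROP TABLE #Recipe_Ingredient;
DROP TABLE #Recipe;
DROP TABLE #Ingredient;
```

Orange and Tomato never appear because no recipe references them and the joins are inner joins.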

Answer by Paul Keister

The joins are not your problem. From your comments I will infer that what you are calling "duplicates" are not actual duplicates. If all column values for the 2 "duplicates" returned from the query matched, then either SELECT DISTINCT or GROUP BY would definitely eliminate them. So you should be able to find a solution by looking at your column definitions.
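
A small self-contained illustration of that point, assuming SQL Server 2008 or later for the VALUES table constructor. The serial number and the 3049 voltage come from the question; the 3051 value is made up for the example:

```sql
-- Three rows: the first two are identical, the third differs in ClockVoltage.
SELECT DISTINCT SerialNumber, ClockVoltage
FROM (VALUES ('500248E4', 3049),
             ('500248E4', 3049),
             ('500248E4', 3051)) AS V (SerialNumber, ClockVoltage);
-- Returns 2 rows: the exact duplicate collapses, the differing row remains.
```

DISTINCT compares whole rows, so a difference in any one selected column is enough to keep both.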

My best guess is that you're getting rows for the same date which aren't really duplicates because the time component of the date doesn't match. To eliminate this problem, you can truncate the date fields to the date only, using this technique:

    DATEADD(DAY, DATEDIFF(DAY, 0, DHA.IssuedDate), 0) AS DHAIssuedDate,
    DATEADD(DAY, DATEDIFF(DAY, 0, CLK.CreatedDate), 0) AS CLKCreatedDate,   
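
To see what the truncation does in isolation, here is a sketch using one of the timestamps from the question. It assumes SQL Server 2008 or later, where inline DECLARE initialization and the date type are available; the CAST alternative is not part of the original answer.

```sql
DECLARE @d datetime = '2007-05-04 13:36:54.000';

SELECT DATEADD(DAY, DATEDIFF(DAY, 0, @d), 0) AS TruncatedDateTime, -- 2007-05-04 00:00:00.000
       CAST(@d AS date)                      AS DateOnly;          -- 2007-05-04
```

The DATEADD/DATEDIFF form keeps the datetime type with the time zeroed out, while CAST returns a value of type date.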

If that doesn't work you might want to take a look at JobClockSerial: does this column belong in the query results?
