Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/4891676/

Time: 2020-09-01 09:11:27  Source: igfitidea

Removing duplicates from a SQL query (not just "use distinct")

sql, duplicates, distinct

Asked by Dave

It's probably simple, here is my query:

SELECT DISTINCT U.NAME, P.PIC_ID
FROM USERS U, PICTURES P, POSTINGS P1
WHERE U.EMAIL_ID = P1.EMAIL_ID AND P1.PIC_ID = P.PIC_ID AND P.CAPTION LIKE '%car%';

But this will only remove duplicates where a row has both the same u.name and p.pic_id. I want it so that if there are any duplicates of the names, it just leaves out the other rows. It's a weird query, but in general, how can I apply the distinct to a single column of the SELECT clause?

Answered by Joe Stefanelli

Arbitrarily choosing to keep the minimum PIC_ID. Also, avoid using the implicit join syntax.

SELECT U.NAME, MIN(P.PIC_ID)
    FROM USERS U
        INNER JOIN POSTINGS P1
            ON U.EMAIL_ID = P1.EMAIL_ID
        INNER JOIN PICTURES P
            ON P1.PIC_ID = P.PIC_ID
    WHERE P.CAPTION LIKE '%car%'
    GROUP BY U.NAME;
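
To see what the GROUP BY does, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. The table and column names come from the question; the sample rows and the added ORDER BY (only there to make the output order predictable) are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data: names follow the question, rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERS    (EMAIL_ID TEXT, NAME TEXT);
    CREATE TABLE PICTURES (PIC_ID INTEGER, CAPTION TEXT);
    CREATE TABLE POSTINGS (EMAIL_ID TEXT, PIC_ID INTEGER);
    INSERT INTO USERS    VALUES ('a@x.test', 'Ann'), ('b@x.test', 'Bob');
    INSERT INTO PICTURES VALUES (1, 'red car'), (2, 'old car'),
                                (3, 'carpet'),  (4, 'boat');
    INSERT INTO POSTINGS VALUES ('a@x.test', 2), ('a@x.test', 1),
                                ('b@x.test', 3), ('b@x.test', 4);
""")

# One row per name; MIN() decides which PIC_ID survives for each name.
rows = conn.execute("""
    SELECT U.NAME, MIN(P.PIC_ID)
    FROM USERS U
        INNER JOIN POSTINGS P1 ON U.EMAIL_ID = P1.EMAIL_ID
        INNER JOIN PICTURES P  ON P1.PIC_ID = P.PIC_ID
    WHERE P.CAPTION LIKE '%car%'
    GROUP BY U.NAME
    ORDER BY U.NAME
""").fetchall()

print(rows)  # [('Ann', 1), ('Bob', 3)]
```

Ann posted two matching pictures but appears once, with the smaller PIC_ID; swapping MIN for MAX would keep the larger one instead.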

Answered by KeithS

Your question is kind of confusing; do you want to show only one row per user, or do you want to show a row per picture but suppress repeating values in the U.NAME field? I think you want the second; if not there are plenty of answers for the first.

Whether to display repeating values is display logic, which SQL wasn't really designed for. You can use a cursor in a loop to process the results row-by-row, but you will lose a lot of performance. If you have a "smart" frontend language like a .NET language or Java, whatever construction you put this data into can be cheaply manipulated to suppress repeating values before finally displaying it in the UI.
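
As a rough illustration of doing this in the frontend, here is a small Python sketch. The function name `suppress_repeats` and the sample rows are made up for the example, and it assumes the rows arrive already sorted by name (for instance via ORDER BY U.NAME, P.PIC_ID).

```python
from typing import Iterable, Iterator, Tuple


def suppress_repeats(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Blank out the name when it equals the name on the previous row.

    Assumes the input is already sorted by name.
    """
    previous_name = None
    for name, pic_id in rows:
        yield ("" if name == previous_name else name, pic_id)
        previous_name = name


# Hypothetical query result: one row per picture, sorted by name.
rows = [("Ann", 1), ("Ann", 2), ("Bob", 3)]
display_rows = list(suppress_repeats(rows))

print(display_rows)  # [('Ann', 1), ('', 2), ('Bob', 3)]
```

This is a single pass over the result set, which is why it is cheap compared with a cursor loop on the database side.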

If you're using Microsoft SQL Server, and the transformation HAS to be done at the data layer, you may consider using a CTE (Common Table Expression) to hold the initial query, then select values from each row of the CTE based on whether the columns in the previous row hold the same data. It'll be more performant than the cursor, but it'll be kinda messy either way. Observe:

-- The CTE numbers the rows; the outer query compares each row with the one before it.
WITH CTE (RowNum, Name, PicID)
AS
(
    -- Ordering is defined by ROW_NUMBER(); T-SQL rejects ORDER BY inside a CTE without TOP.
    SELECT ROW_NUMBER() OVER (ORDER BY U.NAME, P.PIC_ID),
       U.NAME, P.PIC_ID
    FROM USERS U
        INNER JOIN POSTINGS P1
            ON U.EMAIL_ID = P1.EMAIL_ID
        INNER JOIN PICTURES P
            ON P1.PIC_ID = P.PIC_ID
    WHERE P.CAPTION LIKE '%car%'
)
SELECT
    -- Blank out the name when it matches the previous row's name.
    CASE WHEN cur.Name = prev.Name THEN '' ELSE cur.Name END AS Name,
    cur.PicID
FROM CTE cur
LEFT OUTER JOIN CTE prev
   ON cur.RowNum = prev.RowNum + 1
ORDER BY cur.RowNum;

The above sample is TSQL-specific; it is not guaranteed to work in any other DBPL like PL/SQL, but I think most of the enterprise-level SQL engines have something similar.
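
For engines with window functions, the "something similar" can be sketched with LAG() instead of a self-join on row numbers. This is not the query above but a swapped-in variant of the same idea, run here through SQLite from Python. It assumes SQLite 3.25 or newer (the first release with window functions), and `user_pics` is a hypothetical table standing in for the already-joined result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the joined USERS/POSTINGS/PICTURES result.
    CREATE TABLE user_pics (name TEXT, pic_id INTEGER);
    INSERT INTO user_pics VALUES ('Ann', 1), ('Ann', 2), ('Bob', 3);
""")

# LAG(name) fetches the name from the previous row in the given order;
# it is NULL on the first row, so the CASE falls through to ELSE there.
rows = conn.execute("""
    SELECT CASE WHEN name = prev_name THEN '' ELSE name END AS shown_name,
           pic_id
    FROM (
        SELECT name, pic_id,
               LAG(name) OVER (ORDER BY name, pic_id) AS prev_name
        FROM user_pics
    )
    ORDER BY name, pic_id
""").fetchall()

print(rows)  # [('Ann', 1), ('', 2), ('Bob', 3)]
```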

Answered by Xhalent

If I understand you correctly, you want the list to exclude duplicates on one column only; inner join to a sub-select:

select u.* [whatever joined values]
from users u
inner join
(select name from users group by name having count(*)=1) uniquenames
on uniquenames.name = u.name
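
Worth noting how this behaves: HAVING COUNT(*) = 1 keeps only names that occur exactly once, so a name that is duplicated drops out entirely rather than being reduced to one row. A minimal sketch with SQLite from Python, using an invented `users` table and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical data: 'Ann' appears twice, 'Bob' once.
    CREATE TABLE users (email_id TEXT, name TEXT);
    INSERT INTO users VALUES ('a1@x.test', 'Ann'),
                             ('a2@x.test', 'Ann'),
                             ('b@x.test',  'Bob');
""")

# The sub-select returns only names with a single row; the join filters users to those.
rows = conn.execute("""
    SELECT u.email_id, u.name
    FROM users u
    INNER JOIN (
        SELECT name FROM users GROUP BY name HAVING COUNT(*) = 1
    ) uniquenames
        ON uniquenames.name = u.name
""").fetchall()

print(rows)  # [('b@x.test', 'Bob')]
```

Both 'Ann' rows are gone. If the goal is one row per name, the MIN/GROUP BY approach is the closer fit.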

Answered by Brandon Horsley

You need to tell the query what value to pick for the other columns; MIN or MAX seem like suitable choices.

 SELECT
   U.NAME, MIN(P.PIC_ID)
 FROM
   USERS U,
   PICTURES P,
   POSTINGS P1
 WHERE
   U.EMAIL_ID = P1.EMAIL_ID AND
   P1.PIC_ID = P.PIC_ID AND
   P.CAPTION LIKE '%car%'
 GROUP BY
   U.NAME;

Answered by Chris B. Behrens

If I understand you correctly, you want a list of all pictures with the same name (and their different ids) such that their name occurs more than once in the table. I think this will do the trick:

SELECT U.NAME, P.PIC_ID
FROM USERS U, PICTURES P, POSTINGS P1
WHERE U.EMAIL_ID = P1.EMAIL_ID AND P1.PIC_ID = P.PIC_ID AND U.Name IN (
SELECT U.Name 
FROM USERS U, PICTURES P, POSTINGS P1
WHERE U.EMAIL_ID = P1.EMAIL_ID AND P1.PIC_ID = P.PIC_ID AND P.CAPTION LIKE '%car%'
GROUP BY U.Name HAVING COUNT(U.Name) > 1)

I haven't executed it, so there may be a syntax error or two there.
