Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/2178097/

Time: 2020-09-01 05:15:22  Source: igfitidea

Get top first record from duplicate records having no unique identity

sql sql-server-2005 tsql

Asked by Kashif

I need to fetch the first row out of each set of duplicate records from the table given below. I need to use this query in a view.

Please, no temp tables: I have already done it that way by adding an identity column and using MIN with GROUP BY. I need a solution without a temp table or table variable.

This is just sample data. The original table has thousands of records and I need results from only the top 1000, so I can't use DISTINCT.

I am using SQL Server 2005

[image: sample data table]

Thanks.

Accepted answer by Aaronaught

The answer depends on specifically what you mean by the "top 1000 distinct" records.

If you mean that you want to return at most 1000 distinct records, regardless of how many duplicates are in the table, then write this:

SELECT DISTINCT TOP 1000 id, uname, tel
FROM Users
ORDER BY <sort_columns>

If you only want to search the first 1000 rows in the table, and potentially return far fewer than 1000 distinct rows, then you would write it with a subquery or CTE, like this:

SELECT DISTINCT *
FROM
(
    SELECT TOP 1000 id, uname, tel
    FROM Users
    ORDER BY <sort_columns>
) u

The ORDER BY is of course optional if you don't care which records you return.

Answer by HGMamaci

Find all products that have been ordered 1 or more times... (a kind of duplicate records)

SELECT DISTINCT * from [order_items] where productid in 
(SELECT productid 
  FROM [order_items]
  group by productid 
  having COUNT(*)>0)
order by productid 

To select the last inserted of those...

SELECT DISTINCT productid, MAX(id) OVER (PARTITION BY productid) AS LastRowId from [order_items] where productid in 
(SELECT productid 
  FROM [order_items]
  group by productid 
  having COUNT(*)>0)
order by productid 

Answer by Mário Meyrelles

Sometimes you can use the CROSS APPLY operator like this:

select distinct result.* from data d
cross apply (select top 1 * from data where data.Id = d.Id) result

In this query I need to pick only the first of many duplicates that naturally happen to occur in my data. It works on SQL Server 2005+ databases.

Answer by malcom sankoh

You can try the following:

  1. Create a view that selects all the columns from the original table and adds an extra numeric column whose value increases with each record/row. You may need to make this column non-integer (e.g. a decimal incremented by 1.00 for each record) to use it in the RANK() SQL statement.

  2. Add another column (e.g. 'RecordRank') holding a rank calculated with the RANK() OVER clause (see the reference below). RANK lets you partition the records and then order the records within each partition by the ORDER BY column; use the increasing column from step 1 for the ORDER BY. Put the columns that hold identical data in the PARTITION BY clause, so that all the duplicates are grouped together and then ordered by the extra column from step 1.

    http://msdn.microsoft.com/en-us/library/ms189461.aspx

  3. After successfully creating the above view, write another view that selects only the records with 'RecordRank' = 1.

This should select only one of each record from the duplicates or partitions.
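
The steps above could be condensed into a single view along the following lines. This is only a sketch under assumptions: it borrows the Users(id, uname, tel) table from the accepted answer, treats rows sharing an id as duplicates, and swaps RANK() for ROW_NUMBER() so that tied rows cannot both get rank 1. The view name FirstUserPerId and the ORDER BY columns are hypothetical.

```sql
-- Sketch only: Users(id, uname, tel) is assumed, FirstUserPerId is a made-up name.
CREATE VIEW dbo.FirstUserPerId
AS
WITH Ranked AS
(
    SELECT id, uname, tel,
           -- Number the rows inside each group of duplicates (same id).
           -- The ORDER BY decides which duplicate counts as "first".
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY uname, tel) AS RecordRank
    FROM Users
)
SELECT id, uname, tel
FROM Ranked
WHERE RecordRank = 1;
```

ROW_NUMBER() and CTEs are available from SQL Server 2005 onward, so no temp table or extra helper column should be needed with this variant.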

Hope this helps - malcom sankoh

Answer by HLGEM

Your best bet is to fix the database design and add an identity column to the table. Why do you have a table without one in the first place? Especially one with duplicate records! Clearly the database itself needs redesigning.

And why do you have to have this in a view? Why isn't your solution with the temp table a valid solution? Views are not usually a really good thing to do to a perfectly nice database.

Answer by Nikos Steiakakis

Doesn't SELECT DISTINCT help? I suppose it would return the result you want.

Answer by AdaTheDev

Using DISTINCT should do it:

SELECT DISTINCT id, uname, tel
FROM YourTable

Though you could really do with having a primary key on that table, a way to uniquely identify each record. I'd consider sticking an IDENTITY column on the table.
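
As a rough illustration of that suggestion, an IDENTITY column could be added and then used to pick one row per group of duplicates. This is a sketch under assumptions: RowId is a hypothetical column name, rows sharing an id are treated as duplicates, and the identity values given to existing rows come in no guaranteed order, so "first" here only means "lowest RowId".

```sql
-- Sketch only: RowId is a made-up column name on the YourTable used above.
ALTER TABLE YourTable ADD RowId INT IDENTITY(1, 1) NOT NULL;
GO  -- batch separator (SSMS/sqlcmd) so the next batch can see the new column

-- Keep, for each id, the row with the smallest RowId.
SELECT t.id, t.uname, t.tel
FROM YourTable AS t
WHERE t.RowId IN (SELECT MIN(RowId) FROM YourTable GROUP BY id);
```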

Answer by Eng. Aktham Nassar

Here are two solutions (I am using Oracle):

1) Using the OVER clause:

with org_table as
 (select 1 id, 'Ali' uname from dual
  union
  select 1, 'June' from dual
  union
  select 2, 'Jame' from dual
  union
  select 2, 'July' from dual)
select id, uname
  from (select a.id,
               a.uname,
               ROW_NUMBER() OVER(PARTITION BY a.id ORDER BY a.id) AS freq
          from org_table a)
 where freq = 1

2) Using a subquery:

with org_table as
 (select 1 id, 'Ali' uname from dual
  union
  select 1, 'June' from dual
  union
  select 2, 'Jame' from dual
  union
  select 2, 'July' from dual)
select a.id,
       (select b.uname
          from org_table b
         where b.id = a.id
           and rownum = 1)
  from (select distinct id from org_table) a

Answer by DevHash

SELECT TOP 1000 MAX(tel) FROM TableName WHERE Id IN 
(
SELECT Id FROM TableName
GROUP BY Id
HAVING COUNT(*) > 1
) 
GROUP BY Id