仅选择 SQL 中列中重复值的第一行
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/8683547/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Only select first row of repeating value in a column in SQL
提问by RYN
I have table that has a column that may have same values in a burst. Like this:
我有一个表,其中有一列可能在突发中具有相同的值。像这样:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 2 | 6050000 |
+----+---------+
| 3 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 5 | 6060000 |
+----+---------+
| 6 | 6060000 |
+----+---------+
| 7 | 6060000 |
+----+---------+
| 8 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
| 11 | 6000000 |
+----+---------+
Now I want to prune rows where the value of Col1
is repeated and only select the first occurrence.
For the above table the result should be:
现在我想修剪Col1
重复值的行,只选择第一次出现的行。
对于上表,结果应该是:
+----+---------+
| id | Col1 |
+----+---------+
| 1 | 6050000 |
+----+---------+
| 4 | 6060000 |
+----+---------+
| 9 | 6050000 |
+----+---------+
| 10 | 6000000 |
+----+---------+
How can I do this in SQL?
Note that only burst rows should be removed and values can be repeated in non-burst rows!id=1
& id=9
are repeated in sample result.
如何在 SQL 中执行此操作?
请注意,只应删除突发行,并且可以在非突发行中重复值!id=1
&id=9
在样本结果中重复。
EDIT:
I achieved it using this:
编辑:
我用这个实现了它:
select id,col1 from data as d1
where not exists (
Select id from data as d2
where d2.id=d1.id-1 and d1.col1=d2.col1 order by id limit 1)
But this only works when ids are sequential. With gaps between ids (deleted ones) the query breaks. How can I fix this?
但这仅在 id 是连续的时才有效。由于 id(已删除的)之间的差距,查询会中断。我怎样才能解决这个问题?
采纳答案by Erwin Brandstetter
You can use a EXISTS
semi-join to identify candidates:
您可以使用EXISTS
半连接来确定候选人:
Select wanted rows:
选择想要的行:
SELECT * FROM tbl
WHERE NOT EXISTS (
SELECT *
FROM tbl t
WHERE t.col1 = tbl.col1
AND t.id = tbl.id - 1
)
ORDER BY id
Get rid of unwanted rows:
摆脱不需要的行:
DELETE FROM tbl
-- SELECT * FROM tbl
WHERE EXISTS (
SELECT *
FROM tbl t
WHERE t.col1 = tbl.col1
AND t.id = tbl.id - 1
)
This effectively deletes every row, where the preceding row has the same value in col1
, thereby arriving at your set goal: only the first row of every burst survives.
这有效地删除了每一行,其中前一行在 中具有相同的值col1
,从而达到您设定的目标:只有每个突发的第一行幸存下来。
I left the commented SELECT
statement because you should alwayscheck what is going to be deleted before you do the deed.
我留下了评论SELECT
声明,因为您应该始终在执行操作之前检查将要删除的内容。
Solution for non-sequential IDs:
非连续 ID 的解决方案:
If your RDBMS supports the CTEand window functions(like PostgreSQL, Oracle, SQL Server, ... but notSQLite, MS Access or MySQL), there is an elegant way:
如果您的 RDBMS 支持CTE和窗口函数(如 PostgreSQL、Oracle、SQL Server...但不支持SQLite、MS Access 或 MySQL),则有一种优雅的方法:
WITH x AS (
SELECT *, row_number() OVER (ORDER BY id) AS rn
FROM tbl
)
SELECT id, col1
FROM x
WHERE NOT EXISTS (
SELECT *
FROM x x1
WHERE x1.col1 = x.col1
AND x1.rn = x.rn - 1
)
ORDER BY id;
There is also the not-so-elegant way that does the job without those niceties.
Should work for you:
还有一种不太优雅的方式可以在没有这些细节的情况下完成工作。
应该为你工作:
SELECT id, col1
FROM tbl
WHERE (
SELECT t.col1 = tbl.col1
FROM tbl AS t
WHERE t.id < tbl.id
ORDER BY id DESC
LIMIT 1) IS NOT TRUE
ORDER BY id
Tool for test-casing non-sequential IDs
用于测试用例非序列 ID 的工具
(Tested in PostgreSQL)
(在 PostgreSQL 中测试)
CREATE TEMP TABLE tbl (id int, col1 int);
INSERT INTO tbl VALUES
(1,6050000),(2,6050000),(6,6050000)
,(14,6060000),(15,6060000),(16,6060000)
,(17,6060000),(18,6060000),(19,6050000)
,(20,6000000),(111,6000000);
回答by The Pickle
select min(id), Col1 from tableName group by Col1
回答by Rob Paller
If your RDBMS supports Window Aggregate functions and/or LEAD() and LAG() functions you can leverage them to accomplish what you are trying to report. The following SQL will help get you started down the right path:
如果您的 RDBMS 支持 Window Aggregate 函数和/或 LEAD() 和 LAG() 函数,您可以利用它们来完成您要报告的内容。以下 SQL 将帮助您走上正确的道路:
SELECT id
, Col AS CurCol
, MAX(Col)
OVER(ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS PrevCol
, MIN(COL)
OVER(ORDER BY id ROWS BETWEEN 1 FOLLOWING AND 1 FOLLOWING) AS NextCol
FROM MyTable
From there you can put that SQL in a derived table with some CASE logic that if the NextCol
or PrevCol
is the same as CurCol
then set CurCol = NULL
. Then you can collapse eliminate all the id records CurCol IS NULL
.
从那里,您可以将该 SQL 放入带有一些 CASE 逻辑的派生表中,如果NextCol
orPrevCol
与CurCol
then set相同CurCol = NULL
。然后你可以折叠消除所有的 id 记录CurCol IS NULL
。
If you don't have the ability to use window aggregates or LEAD/LAG functions your task is a little more complex.
如果您无法使用窗口聚合或 LEAD/LAG 函数,您的任务会稍微复杂一些。
Hope this helps.
希望这可以帮助。
回答by Andriy M
Since id
is always sequential, with no gaps or repetitions, as per your comment, you could use the following method:
由于id
始终是连续的,没有间隙或重复,根据您的评论,您可以使用以下方法:
SELECT t1.*
FROM atable t1
LEFT JOIN atable t2 ON t1.id = t2.id + 1 AND t1.Col1 = t2.Col1
WHERE t2.id IS NULL
The table is (outer-)joined to itself on the condition that the left side's id
is one greater than the right side's and their Col1
values are identical. In other words, the condition is ‘the previous row contains the same Col1
value as the current row'. If there's no match on the right, then the current record should be selected.
在左边的id
比右边大一并且它们的Col1
值相同的条件下,该表是(外部)连接到自身的。换句话说,条件是'前一行包含与Col1
当前行相同的值'。如果右侧没有匹配项,则应选择当前记录。
UPDATE
更新
To account for non-sequential id
s (which, however, are assumed to be unique and defining the order of changes of Col1
), you could also try the following query:
考虑到非顺序的id
s(但是,假定它是唯一的并定义了 的更改顺序Col1
),您还可以尝试以下查询:
SELECT t1.*
FROM atable t1
LEFT JOIN atable t2 ON t1.id > t2.id
LEFT JOIN atable t3 ON t1.id > t3.id AND t3.id > t2.id
WHERE t3.id IS NULL
AND (t2.id IS NULL OR t2.Col1 <> t1.Col1)
The third self-join is there to ensure that the second one yields the row directly preceding that of t1
. That is, if there's no match for t3
, then either t2
contains the preceding row or it's got no match either, the latter meaning that t1
's current row is the top one.
第三个自联接是为了确保第二个自联接产生直接在 的行之前的行t1
。也就是说,如果没有匹配t3
,则要么t2
包含前一行,要么也没有匹配项,后者意味着t1
的当前行是最前面的行。