MySQL 如何在mysql中找到顺序编号的差距?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4340793/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 17:52:49  来源:igfitidea点击:

How to find gaps in sequential numbering in mysql?

mysqlsqlgaps-and-islands

提问by EmmyS

We have a database with a table whose values were imported from another system. There is an auto-increment column, and there are no duplicate values, but there are missing values. For example, running this query:

我们有一个带有表的数据库,其值是从另一个系统导入的。有自增列,没有重复值,但有缺失值。例如,运行此查询:

select count(id) from arrc_vouchers where id between 1 and 100

should return 100, but it returns 87 instead. Is there any query I can run that will return the values of the missing numbers? For example, the records may exist for id 1-70 and 83-100, but there are no records with id's of 71-82. I want to return 71, 72, 73, etc.

应该返回 100,但它返回 87。我可以运行任何查询来返回缺失数字的值吗?例如,可能存在 id 1-70 和 83-100 的记录,但没有 id 为 71-82 的记录。我想返回 71、72、73 等。

Is this possible?

这可能吗?

回答by matt

Update

更新

ConfexianMJS provided much betteranswerin terms of performance.

ConfexianMJS在性能方面提供了更好的答案

The (not as fast as possible) answer

(不是尽可能快)答案

Here's version that works on table of any size (not just on 100 rows):

这是适用于任何大小的表格(不仅仅是 100 行)的版本:

SELECT (t1.id + 1) as gap_starts_at, 
       (SELECT MIN(t3.id) -1 FROM arrc_vouchers t3 WHERE t3.id > t1.id) as gap_ends_at
FROM arrc_vouchers t1
WHERE NOT EXISTS (SELECT t2.id FROM arrc_vouchers t2 WHERE t2.id = t1.id + 1)
HAVING gap_ends_at IS NOT NULL
  • gap_starts_at- first id in current gap
  • gap_ends_at- last id in current gap
  • gap_starts_at- 当前差距中的第一个 id
  • gap_ends_at- 当前间隙中的最后一个 ID

回答by ConfexianMJS

This just worked for me to find the gaps in a table with more than 80k rows:

这只是为我找到超过 80k 行的表中的空白:

SELECT
 CONCAT(z.expected, IF(z.got-1>z.expected, CONCAT(' thru ',z.got-1), '')) AS missing
FROM (
 SELECT
  @rownum:=@rownum+1 AS expected,
  IF(@rownum=YourCol, 0, @rownum:=YourCol) AS got
 FROM
  (SELECT @rownum:=0) AS a
  JOIN YourTable
  ORDER BY YourCol
 ) AS z
WHERE z.got!=0;

Result:

结果:

+------------------+
| missing          |
+------------------+
| 1 thru 99        |
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
4 rows in set (0.06 sec)

Note that the order of columns expectedand gotis critical.

注意列的顺序expectedgot很关键。

If you know that YourColdoesn't start at 1 and that doesn't matter, you can replace

如果您知道这YourCol不是从 1 开始并且无关紧要,则可以替换

(SELECT @rownum:=0) AS a

with

(SELECT @rownum:=(SELECT MIN(YourCol)-1 FROM YourTable)) AS a

New result:

新结果:

+------------------+
| missing          |
+------------------+
| 666 thru 667     |
| 50000            |
| 66419 thru 66456 |
+------------------+
3 rows in set (0.06 sec)

If you need to perform some kind of shell script task on the missing IDs, you can also use this variant in order to directly produce an expression you can iterate over in bash.

如果您需要对缺失的 ID 执行某种 shell 脚本任务,您也可以使用此变体来直接生成可以在 bash 中迭代的表达式。

SELECT GROUP_CONCAT(IF(z.got-1>z.expected, CONCAT('$(',z.expected,' ',z.got-1,')'), z.expected) SEPARATOR " ") AS missing
FROM (  SELECT   @rownum:=@rownum+1 AS expected,   IF(@rownum=height, 0, @rownum:=height) AS got  FROM   (SELECT @rownum:=0) AS a   JOIN block   ORDER BY height  ) AS z WHERE z.got!=0;

This produces an output like so

这会产生像这样的输出

$(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456)

You can then copy and paste it into a for loop in a bash terminal to execute a command for every ID

然后,您可以将其复制并粘贴到 bash 终端中的 for 循环中,以便为每个 ID 执行命令

for ID in $(seq 1 99) $(seq 666 667) 50000 $(seq 66419 66456); do
  echo $ID
  # fill the gaps
done

It's the same thing as above, only that it's both readable and executable. By changing the "CONCAT" command above, syntax can be generated for other programming languages. Or maybe even SQL.

它与上面的相同,只是它既可读又可执行。通过更改上面的“CONCAT”命令,可以为其他编程语言生成语法。或者甚至 SQL。

回答by Ben

Quick and Dirty query that should do the trick:

应该可以解决问题的快速而肮脏的查询:

SELECT a AS id, b AS next_id, (b - a) -1 AS missing_inbetween
FROM 
 (
SELECT a1.id AS a , MIN(a2.id) AS b 
FROM arrc_vouchers  AS a1
LEFT JOIN arrc_vouchers AS a2 ON a2.id > a1.id
WHERE a1.id <= 100
GROUP BY a1.id
) AS tab

WHERE 
b > a + 1

This will give you a table showing the id that has ids missing above it, and next_id that exists, and how many are missing between...e.g.

这将为您提供一个表格,显示其上方缺少 id 的 id,存在的 next_id,以及之间缺少多少……例如

 
id  next_id  missing_inbetween
 1        4                  2
68       70                  1
75       87                 11

回答by Moshe L

If you are using an MariaDByou have a faster (800%) option using the sequence storage engine:

如果您使用的是使用序列存储引擎MariaDB的更快(800%)选项:

SELECT * FROM seq_1_to_50000 WHERE SEQ NOT IN (SELECT COL FROM TABLE);

回答by amelvin

Create a temporary table with 100 rows and a single column containing the values 1-100.

创建一个包含 100 行和包含值 1-100 的单列的临时表。

Outer Join this table to your arrc_vouchers table and select the single column values where the arrc_vouchers id is null.

外部 将此表连接到您的 arrc_vouchers 表并选择 arrc_vouchers id 为空的单列值。

Coding this blind, but should work.

编码这个盲目,但应该工作。

select tempid from temptable 
left join arrc_vouchers on temptable.tempid = arrc_vouchers.id 
where arrc_vouchers.id is null

回答by mgo1977

An alternative solution that requires a query + some code doing some processing would be:

需要查询 + 一些代码进行一些处理的替代解决方案是:

select l.id lValue, c.id cValue, r.id rValue 
  from 
  arrc_vouchers l 
  right join arrc_vouchers c on l.id=IF(c.id > 0, c.id-1, null)
  left  join arrc_vouchers r on r.id=c.id+1
where 1=1
  and c.id > 0 
  and (l.id is null or r.id is null)
order by c.id asc;

Note that the query does not contain any subselect that we know it's not handled performantly by MySQL's planner.

请注意,查询不包含任何我们知道它没有被 MySQL 的计划程序高效处理的子选择。

That will return one entry per centralValue (cValue) that does not have a smaller value (lValue) or a greater value (rValue), ie:

这将为每个没有较小值 (lValue) 或较大值 (rValue) 的 centralValue (cValue) 返回一个条目,即:

lValue |cValue|rValue
-------+------+-------
{null} | 2    | 3      
8      | 9    | {null} 
{null} | 22   | 23     
23     | 24   | {null} 
{null} | 29   | {null} 
{null} | 33   | {null} 



Without going into further details (we'll see them in next paragraphs) this output means that:

无需深入了解更多细节(我们将在下一段中看到它们),此输出意味着:

  • No values between 0 and 2
  • No values between 9 and 22
  • No values between 24 and 29
  • No values between 29 and 33
  • No values between 33 and MAX VALUE
  • 没有介于 0 和 2 之间的值
  • 9 到 22 之间没有值
  • 24 到 29 之间没有值
  • 29 到 33 之间没有值
  • 没有介于 33 和 MAX VALUE 之间的值

So the basic idea is to do a RIGHT and LEFT joins with the same table seeing if we have adjacents values per value (ie: if central value is '3' then we check for 3-1=2 at left and 3+1 at right), and when a ROW has a NULL value at RIGHT or LEFT then we know there is no adjacent value.

所以基本思想是对同一张表进行右和左连接,看看我们是否有每个值的相邻值(即:如果中心值为“3”,那么我们检查左边的 3-1=2 和 3+1右),并且当 ROW 在 RIGHT 或 LEFT 有一个 NULL 值时,我们就知道没有相邻的值。

The complete raw output of my table is:

我的表的完整原始输出是:

select * from arrc_vouchers order by id asc;

0  
2  
3  
4  
5  
6  
7  
8  
9  
22 
23 
24 
29 
33 

Some notes:

一些注意事项:

  1. The SQL IF statement in the join condition is needed if you define the 'id' field as UNSIGNED, therefore it will not allow you to decrease it under zero. This is not strictly necessary if you keep the c.value > 0 as it's stated in the next note, but I'm including it just as doc.
  2. I'm filtering the zero central value as we are not interested in any previous value and we can derive the post value from the next row.
  1. 如果您将 'id' 字段定义为 UNSIGNED,则连接条件中的 SQL IF 语句是必需的,因此它不允许您将其减少到零以下。如果您将 c.value > 0 保持为下一个注释中所述,则这不是绝对必要的,但我将其作为文档包含在内。
  2. 我正在过滤零中心值,因为我们对任何先前的值都不感兴趣,我们可以从下一行导出后值。

回答by PRAKHAR GUPTA

If there is a sequence having gap of maximum one between two numbers (like 1,3,5,6) then the query that can be used is:

如果两个数字之间存在最大间隔为 1 的序列(例如 1,3,5,6),则可以使用的查询是:

select s.id+1 from source1 s where s.id+1 not in(select id from source1) and s.id+1<(select max(id) from source1);
  • table_name - source1
  • column_name - id
  • 表名 - source1
  • 列名 - id

回答by lynx_74

I triedit in different manners and the best performance that I found was this simple query:

我以不同的方式尝试了它,我发现最好的性能是这个简单的查询:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;

... one left join to check if the next idexists, only if next if is not found, then the subquery find the next id that exists to find the end of gap. I did it because query with equal(=) is better performance than greater than(>) operator.

... 一个左连接检查下一个id 是否存在,只有如果没有找到下一个,则子查询找到下一个存在的 id 以找到间隙的结尾。我这样做是因为使用等于(=) 的查询比大于(>) 运算符的性能更好。

Using the sqlfiddleit not show so different performance of others query but in a real database this query above result 3 times more fast than others.

使用sqlfiddle它不会显示其他查询的不同性能,但在真实数据库中,上述查询的结果比其他查询快 3 倍。

The schema:

架构:

CREATE TABLE arrc_vouchers (id int primary key)
;
INSERT INTO `arrc_vouchers` (`id`) VALUES (1),(4),(5),(7),(8),(9),(10),(11),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29)
;

Follow bellow all query that I made to compare the performance:

遵循我为比较性能所做的所有查询:

select a.id+1 gapIni
    ,(select x.id-1 from arrc_vouchers x where x.id>a.id+1 limit 1) gapEnd
    from arrc_vouchers a
    left join arrc_vouchers b on b.id=a.id+1
    where b.id is null
    order by 1
;
select *, (gapEnd-gapIni) qt
    from (
        select id+1 gapIni
        ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
        from arrc_vouchers a
        order by id
    ) a where gapEnd <> gapIni
;
select id+1 gapIni
    ,(select x.id from arrc_vouchers x where x.id>a.id limit 1) gapEnd
    #,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    where id+1 <> (select x.id from arrc_vouchers x where x.id>a.id limit 1)
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),(select x.id from arrc_vouchers x where x.id>a.id limit 1)) gapEnd
    from arrc_vouchers a
    order by id
;
select id+1 gapIni
    ,coalesce((select id from arrc_vouchers x where x.id=a.id+1),concat('*** GAT *** ',(select x.id from arrc_vouchers x where x.id>a.id limit 1))) gapEnd
    from arrc_vouchers a
    order by id
;

Maybe it helps someone and useful.

也许它对某人有帮助并且很有用。

You can see and test my query using this sqlfiddle:

您可以使用此sqlfiddle查看和测试我的查询:

http://sqlfiddle.com/#!9/6bdca7/1

http://sqlfiddle.com/#!9/6bdca7/1

回答by RamRaider

based on the answer given above by Lucek this stored procedure allows you to specify the table and column names that you wish to test to find non-contiguous records - thus answering the original question and also demonstrating how one could use @var to represent tables &/or columns in a stored procedure.

根据 Lucek 上面给出的答案,此存储过程允许您指定要测试的表名和列名以查找非连续记录 - 从而回答原始问题并演示如何使用 @var 来表示表 & / 或存储过程中的列。

create definer=`root`@`localhost` procedure `spfindnoncontiguous`(in `param_tbl` varchar(64), in `param_col` varchar(64))
language sql
not deterministic
contains sql
sql security definer
comment ''
begin
declare strsql varchar(1000);
declare tbl varchar(64);
declare col varchar(64);

set @tbl=cast(param_tbl as char character set utf8);
set @col=cast(param_col as char character set utf8);

set @strsql=concat("select 
    ( t1.",@col," + 1 ) as starts_at, 
  ( select min(t3.",@col,") -1 from ",@tbl," t3 where t3.",@col," > t1.",@col," ) as ends_at
    from ",@tbl," t1
        where not exists ( select t2.",@col," from ",@tbl," t2 where t2.",@col," = t1.",@col," + 1 )
        having ends_at is not null");

prepare stmt from @strsql;
execute stmt;
deallocate prepare stmt;
end

回答by SScotti

Probably not relevant, but I was looking for something like this to list the gaps in a sequence of numbers and found this post, that has multiple different solutions depending upon exactly what you are looking for. I was looking for the first available gap in the sequence (i.e. next available number), and this seems to work fine.

可能不相关,但我一直在寻找这样的东西来列出数字序列中的差距并找到了这篇文章,它有多种不同的解决方案,具体取决于您正在寻找的内容。我正在寻找序列中的第一个可用间隙(即下一个可用数字),这似乎工作正常。

SELECT MIN(l.number_sequence + 1) as nextavabile from patients as l LEFT OUTER JOIN patients as r on l.number_sequence + 1 = r.number_sequence WHERE r.number_sequence is NULL. Several other scenarios and solutions discussed there, from 2005 !

SELECT MIN(l.number_sequence + 1) as nextavabile from病人为 l LEFT OUTER JOIN 病人为 r on l.number_sequence + 1 = r.number_sequence WHERE r.number_sequence 为 NULL。从 2005 年开始讨论的其他几个场景和解决方案!

How to Find Missing Values in a Sequence With SQL

如何使用 SQL 查找序列中的缺失值