MySQL SQL: selecting rows where column value changed from previous row

Note: this page is based on a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must also follow CC BY-SA, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow, original URL: http://stackoverflow.com/questions/6560000/

Time: 2020-08-31 20:22:30  Source: igfitidea

Tags: mysql, sql

Asked by Jimmy

Let's say I have this (MySQL) table, sorted by increasing timestamp:

Timestamp   System StatusA StatusB 
2011-01-01     A      Ok     Ok      
2011-01-02     B      Ok     Ok     
2011-01-03     A     Fail   Fail     
2011-01-04     B      Ok    Fail     
2011-01-05     A     Fail    Ok      
2011-01-06     A      Ok     Ok      
2011-01-07     B     Fail   Fail    

How do I select the rows where StatusA changed from the previous row for that system? StatusB doesn't matter (I show it only to illustrate that there may be many consecutive rows for each system where StatusA doesn't change). In the example above, the query should return the rows 2011-01-03 (StatusA changed between 2011-01-01 and 2011-01-03 for system A), 2011-01-06 and 2011-01-07.
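
To pin down the expected behaviour, here is a minimal reference sketch in plain Python (no database), assuming the sample rows above. `ROWS` and `changed_rows` are hypothetical names chosen for illustration. The sketch walks the rows in timestamp order and remembers the last StatusA seen for each system:

```python
# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]


def changed_rows(rows):
    """Return the rows whose StatusA differs from the previous row of the same System."""
    last_status = {}  # System -> StatusA seen on that system's previous row
    result = []
    # ISO dates sort chronologically as strings, so sorting the tuples orders by Timestamp
    for ts, system, status_a, status_b in sorted(rows):
        previous = last_status.get(system)
        # The first row of a system has no predecessor, so it is never reported
        if previous is not None and previous != status_a:
            result.append((ts, system, status_a, status_b))
        last_status[system] = status_a
    return result


if __name__ == "__main__":
    for row in changed_rows(ROWS):
        print(row)
```

Run as a script, it prints the 2011-01-03, 2011-01-06 and 2011-01-07 rows, which matches the expected result stated in the question.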

The query should execute quickly with the table having tens of thousands of records.

Thanks

Answer by ypercube

SELECT a.*
FROM tableX AS a
WHERE a.StatusA <>
      ( SELECT b.StatusA
        FROM tableX AS b
        WHERE a.System = b.System
          AND a.Timestamp > b.Timestamp
        ORDER BY b.Timestamp DESC
        LIMIT 1
      ) 
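
One quick way to check this query is to run it on the sample data. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL. This is an assumption: SQLite accepts the query unchanged, but its planner and performance differ from MySQL's. `make_db` and `changed_rows` are hypothetical helper names. The only change to the query is a final ORDER BY, added so the output order is stable.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# ypercube's correlated-subquery query, plus ORDER BY for stable output
CORRELATED_SQL = """
SELECT a.*
FROM tableX AS a
WHERE a.StatusA <>
      ( SELECT b.StatusA
        FROM tableX AS b
        WHERE a.System = b.System
          AND a.Timestamp > b.Timestamp
        ORDER BY b.Timestamp DESC
        LIMIT 1
      )
ORDER BY a.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    # Index on (System, Timestamp), as suggested in the answer
    conn.execute("CREATE INDEX idx_system_ts ON tableX (System, Timestamp)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_rows(conn):
    # For each system's first row the subquery yields NULL, and "x <> NULL"
    # is not true, so first rows are never returned.
    return conn.execute(CORRELATED_SQL).fetchall()


if __name__ == "__main__":
    for row in changed_rows(make_db(ROWS)):
        print(row)
```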

But you can also try this (with an index on (System, Timestamp)):

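SELECT System, Timestamp, StatusA, StatusB
FROM
  ( SELECT (@statusPre <> StatusA AND @systemPre = System) AS statusChanged  -- compare with the previous row
         , System, Timestamp, StatusA, StatusB
         , @statusPre := StatusA        -- remember this row's values for the next row
         , @systemPre := System
    FROM tableX
       , (SELECT @statusPre := NULL, @systemPre := NULL) AS d   -- initialise the variables
    ORDER BY System
           , Timestamp
  ) AS good
WHERE statusChanged ;
-- Caveat: this relies on MySQL evaluating the select-list expressions left to right;
-- the MySQL manual leaves the evaluation order of user-variable expressions undefined,
-- and setting user variables in statements other than SET is deprecated in MySQL 8.0.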
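
The query above depends on the order in which MySQL evaluates the user-variable assignments in the select list. On MySQL 8.0 or later, a window function avoids that dependency: LAG() returns the StatusA of the previous row in the same System partition. This is a swapped-in technique, not part of the original answer. The sketch runs it on SQLite (3.25 or newer, which also supports LAG) through Python's sqlite3 module as a stand-in for MySQL. `make_db` and `changed_rows` are hypothetical helpers.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# LAG() looks back one row within each System, ordered by Timestamp
LAG_SQL = """
SELECT Timestamp, System, StatusA, StatusB
FROM ( SELECT Timestamp, System, StatusA, StatusB,
              LAG(StatusA) OVER (PARTITION BY System ORDER BY Timestamp) AS prevStatusA
       FROM tableX ) AS w
WHERE StatusA <> prevStatusA   -- prevStatusA is NULL on each system's first row, so it is skipped
ORDER BY Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.execute("CREATE INDEX idx_system_ts ON tableX (System, Timestamp)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_rows(conn):
    return conn.execute(LAG_SQL).fetchall()


if __name__ == "__main__":
    for row in changed_rows(make_db(ROWS)):
        print(row)
```

The same SQL should run on MySQL 8.0, where the derived table needs an alias (here `w`), as written.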

Answer by Jiri

select a.Timestamp, a.System, a.StatusA, a.StatusB
from tableX as a
cross join tableX as b
where a.System = b.System
and a.Timestamp > b.Timestamp
and not exists (select * 
    from tableX as c
    where a.System = c.System
    and a.Timestamp > c.Timestamp
    and c.Timestamp > b.Timestamp
)
and a.StatusA <> b.StatusA;

Update addressing a comment: Why not use an inner join instead of a cross join?

The question asks for a MySQL solution. According to the documentation:

In MySQL, CROSS JOIN is a syntactic equivalent to INNER JOIN (they can replace each other). In standard SQL, they are not equivalent. INNER JOIN is used with an ON clause, CROSS JOIN is used otherwise.

This means that either of these joins would work.

The conditional_expr used with ON is any conditional expression of the form that can be used in a WHERE clause. Generally, you should use the ON clause for conditions that specify how to join tables, and the WHERE clause to restrict which rows you want in the result set.

The condition a.System = b.System probably falls under the 'how to join tables' category, so using an INNER JOIN would be nicer in this case.
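
To illustrate that the two spellings return the same rows, the sketch below runs Jiri's CROSS JOIN query and an INNER JOIN ... ON rewrite side by side. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. SQLite also treats CROSS JOIN like an inner join, although it uses it as a hint not to reorder the tables. The helper names are hypothetical, and ORDER BY was added only to make the output order deterministic.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# Jiri's query as posted (CROSS JOIN + WHERE), plus ORDER BY
CROSS_JOIN_SQL = """
select a.Timestamp, a.System, a.StatusA, a.StatusB
from tableX as a
cross join tableX as b
where a.System = b.System
and a.Timestamp > b.Timestamp
and not exists (select * from tableX as c
                where a.System = c.System
                and a.Timestamp > c.Timestamp
                and c.Timestamp > b.Timestamp)
and a.StatusA <> b.StatusA
order by a.Timestamp
"""

# Same logic with the join condition moved into INNER JOIN ... ON
INNER_JOIN_SQL = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
INNER JOIN tableX AS b
        ON a.System = b.System
       AND a.Timestamp > b.Timestamp
WHERE NOT EXISTS (SELECT * FROM tableX AS c       -- no row of this system between b and a
                  WHERE c.System = a.System
                    AND c.Timestamp < a.Timestamp
                    AND c.Timestamp > b.Timestamp)
  AND a.StatusA <> b.StatusA
ORDER BY a.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.execute("CREATE INDEX idx_system_ts ON tableX (System, Timestamp)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    conn = make_db(ROWS)
    print(conn.execute(CROSS_JOIN_SQL).fetchall() == conn.execute(INNER_JOIN_SQL).fetchall())
```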

Since both produce the same results, any difference would be in performance. To say which is faster, I would need to know how the joins are implemented internally: whether they use indexes or hashing to do the joining.

Answer by Egor

Use ROWNUM (an Oracle pseudocolumn; MySQL does not have it).

I got 0.05 seconds on 20,000 rows.

select a1.*
  from (select rownum R_NUM, TIMESTAMP, System, StatusA from TableX) a1 
  join (select rownum R_NUM, TIMESTAMP, SYSTEM, STATUSA from TABLEX) a2 
    on a1.R_NUM = a2.R_NUM+1 
 where a1.system = a2.system 
   and a1.StatusA != a2.StatusA
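
ROWNUM is an Oracle pseudocolumn, so this query will not run on MySQL as written. Neither subquery has an ORDER BY, so the numbering follows whatever order the rows happen to be read in, and the join only compares a row with its numeric neighbour. One possible port needs a database with window functions: MySQL 8.0+, or SQLite 3.25+ as used here through Python's sqlite3 module. It numbers the rows explicitly by System and then Timestamp, so that neighbours are consecutive readings of the same system. The CTE name `numbered` and the helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# Egor's self-join on consecutive row numbers, with ROWNUM replaced by an
# explicitly ordered ROW_NUMBER(). The System check drops the single pair
# that straddles the boundary between two systems.
PORTED_SQL = """
WITH numbered AS (
  SELECT ROW_NUMBER() OVER (ORDER BY System, Timestamp) AS R_NUM,
         Timestamp, System, StatusA, StatusB
  FROM tableX
)
SELECT a1.Timestamp, a1.System, a1.StatusA, a1.StatusB
FROM numbered AS a1
JOIN numbered AS a2
  ON a1.R_NUM = a2.R_NUM + 1
WHERE a1.System = a2.System
  AND a1.StatusA != a2.StatusA
ORDER BY a1.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the real table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_rows(conn):
    return conn.execute(PORTED_SQL).fetchall()


if __name__ == "__main__":
    for row in changed_rows(make_db(ROWS)):
        print(row)
```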

Answer by dkretz

Here's a slightly shorter version with similar logic. I've tested this pattern so often that I'm sure it's efficient, primarily because it eliminates the correlated subquery (WHERE NOT EXISTS).

c is there to make sure that b is the row directly before a: the LEFT JOIN looks for a row c (of the same system) between b and a, and the IS NULL test keeps only the pairs where no such c can be found.

SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
JOIN tableX AS b
    ON a.System = b.System
    AND a.Timestamp > b.Timestamp
LEFT JOIN tableX AS c
    ON c.System = a.System
    AND a.Timestamp > c.Timestamp
    AND b.Timestamp < c.Timestamp
WHERE c.System IS NULL
    AND a.StatusA <> b.StatusA;
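
Here is a sketch for checking this anti-join on the sample data. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL, and the helper names are hypothetical. The LEFT JOIN condition on c is a template parameter. With `c.System = a.System`, c is restricted to the same system, as the explanation above intends. With `a.System = b.System`, the condition is always true at that point, because the JOIN on b already requires it. In that case c can be any system's row, so another system's row lying between b and a hides the pair.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# dkretz's anti-join; {c_system_condition} restricts which rows count as "c"
ANTI_JOIN_SQL = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM tableX AS a
JOIN tableX AS b
    ON a.System = b.System
    AND a.Timestamp > b.Timestamp
LEFT JOIN tableX AS c
    ON {c_system_condition}
    AND a.Timestamp > c.Timestamp
    AND b.Timestamp < c.Timestamp
WHERE c.System IS NULL          -- no row c between b and a, so b is a's predecessor
    AND a.StatusA <> b.StatusA
ORDER BY a.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.execute("CREATE INDEX idx_system_ts ON tableX (System, Timestamp)")
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_dates(conn, c_system_condition):
    sql = ANTI_JOIN_SQL.format(c_system_condition=c_system_condition)
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db(ROWS)
    print(changed_dates(conn, "c.System = a.System"))
    print(changed_dates(conn, "a.System = b.System"))
```

On the sample data, the same-system version returns 2011-01-03, 2011-01-06 and 2011-01-07. The a.System = b.System version returns only 2011-01-06, because the sample interleaves systems A and B.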

Answer by jimmyp

SELECT a.*
FROM (select row_number() over (partition by System order by Timestamp asc) as aRow,
             Timestamp, System, StatusA, StatusB from tableX) as a
left join (select row_number() over (partition by System order by Timestamp asc) as bRow,
                  Timestamp, System, StatusA, StatusB from tableX) as b
       on a.aRow = b.bRow + 1 and a.System = b.System
where (a.StatusA != b.StatusA or b.StatusA is null)

It returns the first row of each system, plus the rows where StatusA differs from the previous row of the same system.
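
Here is a sketch that runs this query on the sample data. It uses SQLite (3.25+, for row_number()) through Python's sqlite3 module as a stand-in for MySQL 8.0+, and the helper names are hypothetical. It selects the four original columns instead of a.*, so the helper aRow column is not returned, and it adds ORDER BY for stable output.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# jimmyp's query: number rows per system, then LEFT JOIN each row to its predecessor
ROW_NUMBER_SQL = """
SELECT a.Timestamp, a.System, a.StatusA, a.StatusB
FROM (select row_number() over (partition by System order by Timestamp asc) as aRow,
             Timestamp, System, StatusA, StatusB from tableX) as a
left join (select row_number() over (partition by System order by Timestamp asc) as bRow,
                  Timestamp, System, StatusA, StatusB from tableX) as b
       on a.aRow = b.bRow + 1 and a.System = b.System
where (a.StatusA != b.StatusA or b.StatusA is null)   -- b is NULL for each system's first row
order by a.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_or_first_rows(conn):
    return conn.execute(ROW_NUMBER_SQL).fetchall()


if __name__ == "__main__":
    for row in changed_or_first_rows(make_db(ROWS)):
        print(row)
```

On the sample data it returns 2011-01-01 and 2011-01-02 (the first row of each system), followed by 2011-01-03, 2011-01-06 and 2011-01-07. Dropping `or b.StatusA is null` leaves only the changed rows.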

Answer by Allen

Egor's answer worked for me in MSSQL with one small change: I had to replace the ROWNUM expressions with:

select row_number () over (order by TIMESTAMP) as R_NUM, ...
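
One caveat is worth checking here. If the numbering is ordered by TIMESTAMP alone, consecutive numbers are only adjacent in time, not adjacent readings of the same system, so with interleaved systems most changes are missed. The sketch below is a hypothetical reconstruction of Egor's query with ROW_NUMBER() as described above, with the elided columns taken from Egor's subqueries. It runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server. The ordering is a parameter, so both variants can be compared; the helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question: (Timestamp, System, StatusA, StatusB)
ROWS = [
    ("2011-01-01", "A", "Ok", "Ok"),
    ("2011-01-02", "B", "Ok", "Ok"),
    ("2011-01-03", "A", "Fail", "Fail"),
    ("2011-01-04", "B", "Ok", "Fail"),
    ("2011-01-05", "A", "Fail", "Ok"),
    ("2011-01-06", "A", "Ok", "Ok"),
    ("2011-01-07", "B", "Fail", "Fail"),
]

# Egor's self-join with ROWNUM replaced by row_number() OVER (ORDER BY {order})
ROW_NUMBER_SQL = """
SELECT a1.Timestamp, a1.System, a1.StatusA
  FROM (SELECT row_number() OVER (ORDER BY {order}) AS R_NUM,
               Timestamp, System, StatusA FROM tableX) a1
  JOIN (SELECT row_number() OVER (ORDER BY {order}) AS R_NUM,
               Timestamp, System, StatusA FROM tableX) a2
    ON a1.R_NUM = a2.R_NUM + 1
 WHERE a1.System = a2.System
   AND a1.StatusA != a2.StatusA
 ORDER BY a1.Timestamp
"""


def make_db(rows):
    """Build an in-memory SQLite copy of tableX (stand-in for the real table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tableX (Timestamp TEXT, System TEXT, StatusA TEXT, StatusB TEXT)"
    )
    conn.executemany("INSERT INTO tableX VALUES (?, ?, ?, ?)", rows)
    return conn


def changed_dates(conn, order):
    """Timestamps returned when rows are numbered by the given ORDER BY list."""
    sql = ROW_NUMBER_SQL.format(order=order)
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    conn = make_db(ROWS)
    print(changed_dates(conn, "Timestamp"))          # numbering by time only
    print(changed_dates(conn, "System, Timestamp"))  # numbering per system, then time
```

On the sample data, ordering by Timestamp alone returns only 2011-01-06, while ordering by System, Timestamp returns 2011-01-03, 2011-01-06 and 2011-01-07.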