How to self-join table in a way that every record is joined with the "previous" record?

Note: this page is adapted from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license, link to the original, and attribute it to the original authors (not me). Original: http://stackoverflow.com/questions/15527423/

Tags: sql, sql-server, performance, sql-server-2008

Asked by Michal B.

I have a MS SQL table that contains stock data with the following columns: Id, Symbol, Date, Open, High, Low, Close.

I would like to self-join the table, so I can get a day-to-day % change for Close.

I must create a query that joins the table with itself so that every record also contains the data from the previous session (note that I cannot simply use yesterday's date).

My idea is to do something like this:

select * from quotes t1
inner join quotes t2
on t1.symbol = t2.symbol and
t2.date = (select max(date) from quotes where symbol = t1.symbol and date < t1.date)

However, I do not know if that's the correct/fastest way. What should I take into account when thinking about performance? (E.g., will putting a UNIQUE index on the (Symbol, Date) pair improve performance?)

There will be around 100,000 new records every year in this table. I am using MS SQL Server 2008.

Answered by sgeddes

One option is to use a recursive CTE (if I'm understanding your requirements correctly):

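WITH RNCTE AS (
  -- Number each symbol's rows by date so "previous" means rn - 1
  SELECT *, ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) rn
  FROM quotes
),
CTE AS (
  -- Anchor: the first row of each symbol has no previous row
  SELECT symbol, date, rn, cast(0 as decimal(10,2)) perc, closed
  FROM RNCTE
  WHERE rn = 1
  UNION ALL
  -- Recursive step: c is the previous row, r is the current row
  SELECT r.symbol, r.date, r.rn, cast(c.closed/r.closed as decimal(10,2)) perc, r.closed
  FROM CTE c
    JOIN RNCTE r on c.symbol = r.symbol AND c.rn+1 = r.rn
)
SELECT * FROM CTE
ORDER BY symbol, date
-- The default recursion limit is 100 levels, too low for a long price history
OPTION (MAXRECURSION 0)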

If you need a running total for each symbol to use as the percentage change, it is easy enough to add an additional column for that amount. I wasn't completely sure what your intentions were, so the above just divides the previous closed amount by the current closed amount.

Answered by esteewhy

Something like this would work in SQLite:

SELECT ..
FROM quotes t1, quotes t2
WHERE t1.symbol = t2.symbol
    AND t1.date < t2.date
GROUP BY t2.ID
    HAVING t2.date = MIN(t2.date)

Given that SQLite is about as simple as SQL engines get, this may also work in MSSQL with minimal changes.
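
To try the idea on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. It is a variant of the query above, not the answer's exact statement: it keeps the self-join and GROUP BY t2.id, but picks the previous row with MAX(t1.date), relying on SQLite's documented rule that bare columns take their values from the row holding the MAX. The lowercase table layout, the sample rows, and the function name previous_session_rows are assumptions made for illustration.

```python
import sqlite3


def previous_session_rows(conn):
    """Pair every quote with the latest earlier quote of the same symbol."""
    sql = """
        SELECT t2.symbol, t2.date, t2.close,
               MAX(t1.date) AS prev_date,   -- latest earlier session
               t1.close     AS prev_close   -- bare column: taken from the MAX row (SQLite-specific)
        FROM quotes t1, quotes t2
        WHERE t1.symbol = t2.symbol
          AND t1.date < t2.date
        GROUP BY t2.id
        ORDER BY t2.symbol, t2.date
    """
    return conn.execute(sql).fetchall()


# Hypothetical toy data: sessions are not always one calendar day apart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quotes (id INTEGER PRIMARY KEY, symbol TEXT, date TEXT, close REAL)"
)
conn.executemany(
    "INSERT INTO quotes (symbol, date, close) VALUES (?, ?, ?)",
    [
        ("AAA", "2013-03-15", 10.0),
        ("AAA", "2013-03-18", 11.0),
        ("AAA", "2013-03-19", 9.9),
        ("BBB", "2013-03-15", 20.0),
        ("BBB", "2013-03-19", 25.0),
    ],
)

for symbol, date, close, prev_date, prev_close in previous_session_rows(conn):
    pct = round(100.0 * (close - prev_close) / prev_close, 2)
    print(symbol, date, prev_date, pct)
```

The first session of each symbol does not appear, because the inner join finds no earlier row for it. The bare-column behaviour is specific to SQLite, so the statement would need rewriting for SQL Server.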

Answered by Anon

Create an index on (symbol, date), then use CROSS APPLY:

SELECT *
FROM quotes q_curr
CROSS APPLY (
  SELECT TOP(1) *
  FROM quotes
  WHERE symbol = q_curr.symbol
    AND date < q_curr.date
  ORDER BY date DESC
) q_prev

Answered by Jeremy Hutchinson

You can do something like this:

with OrderedQuotes as
(
    select
        row_number() over(order by Symbol, Date) RowNum,
        ID,
        Symbol,
        Date,
        [Open],   -- OPEN and CLOSE are reserved words in T-SQL
        High,
        Low,
        [Close]
      from Quotes
)
select
    a.Symbol,
    a.Date,
    a.[Open],
    a.High,
    a.Low,
    a.[Close],
    -- b is the row just before a for the same symbol
    b.Date PrevDate,
    b.[Open] PrevOpen,
    b.High PrevHigh,
    b.Low PrevLow,
    b.[Close] PrevClose,

    -- (current - previous) / previous, as a percentage
    100.0 * (a.[Close] - b.[Close]) / b.[Close] PctChange

  from OrderedQuotes a
  join OrderedQuotes b on a.Symbol = b.Symbol and a.RowNum = b.RowNum + 1

If you change the last join to a left join, you also get a row for the first date of each symbol (with no previous values); I'm not sure whether you need that.
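
The effect of the left join can be seen on toy data with a small sketch using Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the answer's own code: it runs on SQLite (which needs version 3.25 or later for ROW_NUMBER) rather than SQL Server, partitions the row numbers by symbol, and uses a hypothetical three-column table, sample rows, and the function name pct_changes.

```python
import sqlite3

# ROW_NUMBER per symbol, then a LEFT JOIN from each row to the row before it.
QUERY = """
WITH ordered AS (
    SELECT symbol, date, close,
           ROW_NUMBER() OVER (PARTITION BY symbol ORDER BY date) AS rn
    FROM quotes
)
SELECT a.symbol, a.date, a.close,
       b.close AS prev_close,
       100.0 * (a.close - b.close) / b.close AS pct_change
FROM ordered a
LEFT JOIN ordered b ON a.symbol = b.symbol AND a.rn = b.rn + 1
ORDER BY a.symbol, a.date
"""


def pct_changes(rows):
    """Load (symbol, date, close) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE quotes (symbol TEXT, date TEXT, close REAL)")
    conn.executemany("INSERT INTO quotes VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
toy = [
    ("AAA", "2013-03-15", 10.0),
    ("AAA", "2013-03-18", 11.0),
    ("BBB", "2013-03-18", 20.0),
    ("BBB", "2013-03-19", 25.0),
]

for row in pct_changes(toy):
    print(row)
```

With the left join, the first session of each symbol is kept and its previous-close and percentage columns come back as NULL (None in Python); an inner join would drop those rows.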

Answered by Marlin Pierce

What you had is fine. I don't know if translating the sub-query into a join will help. However, you asked for it, so one way to do it is to join the table to itself once more.

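select *
from quotes t1
inner join quotes t2
   on t1.symbol = t2.symbol and t1.date > t2.date
-- t3 looks for any row strictly between t2 and t1
left outer join quotes t3
   on t1.symbol = t3.symbol and t3.date > t2.date and t3.date < t1.date
-- no such row means t2 is the session immediately before t1
where t3.date is null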

Answered by Aleksandr Fedorenko

You can use a CTE with the ROW_NUMBER ranking function:

 ;WITH cte AS
 (
  SELECT symbol, date, [Open], [High], [Low], [Close],
         ROW_NUMBER() OVER(PARTITION BY symbol ORDER BY date) AS Id
  FROM quotes
  )
  SELECT c1.Id, c1.symbol, c1.date, c1.[Open], c1.[High], c1.[Low], c1.[Close], 
         ISNULL(c2.[Close] / c1.[Close], 0) AS perc
  FROM cte c1 LEFT JOIN cte c2 ON c1.symbol = c2.symbol AND c1.Id = c2.Id + 1
  ORDER BY c1.symbol, c1.date

To improve performance (avoiding the sort and RID lookups), use this index:

CREATE INDEX ix_symbol$date_quotes ON quotes(symbol, date) INCLUDE([Open], [High], [Low], [Close])

Answered by Jason Whitish

You could do something like this:

DECLARE @Today DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP))

;WITH today AS
(
    SELECT  Id ,
            Symbol ,
            Date ,
            [OPEN] ,
            High ,
            LOW ,
            [CLOSE],
            DATEADD(DAY, -1, Date) AS yesterday 
    FROM quotes
    WHERE date = @today
)
SELECT *
FROM today
LEFT JOIN quotes yesterday ON today.Symbol = yesterday.Symbol
    AND today.yesterday = yesterday.Date

That way you limit your "today" results, if that's an option.

EDIT: The CTEs in the other answers may work well, but I tend to be hesitant to use ROW_NUMBER when dealing with 100K rows or more. If the previous day may not always be yesterday, I prefer to look up the previous day in its own query and then use it for reference:

DECLARE @Today DATETIME, @PreviousDay DATETIME
SELECT @Today = DATEADD(DAY, 0, DATEDIFF(DAY, 0, CURRENT_TIMESTAMP));
SELECT @PreviousDay = MAX(Date) FROM quotes  WHERE Date < @Today;
WITH today AS
(
    SELECT  Id ,
            Symbol ,
            Date ,
            [OPEN] ,
            High ,
            LOW ,
            [CLOSE]
    FROM quotes 
    WHERE date = @today
)
SELECT *
FROM today
LEFT JOIN quotes AS previousday
    ON today.Symbol = previousday.Symbol
    AND previousday.Date = @PreviousDay