针对日期范围的 SQL 连接?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2306462/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 05:27:11  来源:igfitidea点击:

SQL join against date ranges?

sqlsql-servertsqljoindate-range

提问by Herb Caudill

Consider two tables:

考虑两个表:

Transactions, with amounts in a foreign currency:

交易,金额为外币:

     Date  Amount
========= =======
 1/2/2009    1500
 2/4/2009    2300
3/15/2009     300
4/17/2009    2200
etc.

ExchangeRates, with the value of the primary currency (let's say dollars) in the foreign currency:

ExchangeRates,外币的主要货币(比如美元)的价值:

     Date    Rate
========= =======
 2/1/2009    40.1
 3/1/2009    41.0
 4/1/2009    38.5
 5/1/2009    42.7
etc.

Exchange rates can be entered for arbitrary dates - the user could enter them on a daily basis, weekly basis, monthly basis, or at irregular intervals.

可以输入任意日期的汇率 - 用户可以每天、每周、每月或不定期地输入它们。

In order to translate the foreign amounts to dollars, I need to respect these rules:

为了将外国金额转换为美元,我需要遵守以下规则:

A. If possible, use the most recent previous rate; so the transaction on 2/4/2009 uses the rate for 2/1/2009, and the transaction on 3/15/2009 uses the rate for 3/1/2009.

A. 如果可能,使用最近的先前汇率;所以 2/4/2009 的交易使用 2/1/2009 的汇率,而 3/15/2009 的交易使用 3/1/2009 的汇率。

B. If there isn't a rate defined for a previous date, use the earliest rate available. So the transaction on 1/2/2009 uses the rate for 2/1/2009, since there isn't an earlier rate defined.

B. 如果没有为前一个日期定义费率,请使用可用的最早费率。因此,2009 年 1 月 2 日的交易使用 2009 年 1 月的汇率,因为没有定义更早的汇率。

This works...

这工作...

Select 
    t.Date, 
    t.Amount,
    ConvertedAmount=(   
        Select Top 1 
            t.Amount/ex.Rate
        From ExchangeRates ex
        Where t.Date > ex.Date
        Order by ex.Date desc
    )
From Transactions t

... but (1) it seems like a join would be more efficient & elegant, and (2) it doesn't deal with Rule B above.

...但是 (1) 似乎连接会更高效和优雅,并且 (2) 它不处理上面的规则 B。

Is there an alternative to using the subquery to find the appropriate rate? And is there an elegant way to handle Rule B, without tying myself in knots?

除了使用子查询来查找合适的费率之外,还有其他选择吗?有没有一种优雅的方法来处理规则 B,而不用束缚自己?

回答by Lucero

You could first do a self-join on the exchange rates which are ordered by date so that you have the start and the end date of each exchange rate, without any overlap or gap in the dates (maybe add that as view to your database - in my case I'm just using a common table expression).

您可以首先对按日期排序的汇率进行自联接,以便您拥有每个汇率的开始和结束日期,而日期之间没有任何重叠或间隔(可以将其添加为您的数据库的视图 -在我的情况下,我只是使用一个公用表表达式)。

Now joining those "prepared" rates with the transactions is simple and efficient.

现在将这些“准备好的”费率与交易结合起来既简单又高效。

Something like:

就像是:

WITH IndexedExchangeRates AS (           
            SELECT  Row_Number() OVER (ORDER BY Date) ix,
                    Date,
                    Rate 
            FROM    ExchangeRates 
        ),
        RangedExchangeRates AS (             
            SELECT  CASE WHEN IER.ix=1 THEN CAST('1753-01-01' AS datetime) 
                    ELSE IER.Date 
                    END DateFrom,
                    COALESCE(IER2.Date, GETDATE()) DateTo,
                    IER.Rate 
            FROM    IndexedExchangeRates IER 
            LEFT JOIN IndexedExchangeRates IER2 
            ON IER.ix = IER2.ix-1 
        )
SELECT  T.Date,
        T.Amount,
        RER.Rate,
        T.Amount/RER.Rate ConvertedAmount 
FROM    Transactions T 
LEFT JOIN RangedExchangeRates RER 
ON (T.Date > RER.DateFrom) AND (T.Date <= RER.DateTo)

Notes:

笔记:

  • You could replace GETDATE()with a date in the far future, I'm assuming here that no rates for the future are known.

  • Rule (B) is implemented by setting the date of the first known exchange rate to the minimal date supported by the SQL Server datetime, which should (by definition if it is the type you're using for the Datecolumn) be the smallest value possible.

  • 你可以GETDATE()用一个遥远的未来的日期来替换,我在这里假设未来的利率是未知的。

  • 规则 (B) 是通过将第一个已知汇率的日期设置为 SQL Server 支持的最小日期来实现的datetime,它应该(根据定义,如果它是您用于Date列的类型)是可能的最小值。

回答by Jonathan Leffler

Suppose you had an extended exchange rate table that contained:

假设您有一个扩展汇率表,其中包含:

 Start Date   End Date    Rate
 ========== ========== =======
 0001-01-01 2009-01-31    40.1
 2009-02-01 2009-02-28    40.1
 2009-03-01 2009-03-31    41.0
 2009-04-01 2009-04-30    38.5
 2009-05-01 9999-12-31    42.7

We can discuss the details of whether the first two rows should be combined, but the general idea is that it is trivial to find the exchange rate for a given date. This structure works with the SQL 'BETWEEN' operator which includes the ends of the ranges. Often, a better format for ranges is 'open-closed'; the first date listed is included and the second is excluded. Note that there is a constraint on the data rows - there are (a) no gaps in the coverage of the range of dates and (b) no overlaps in the coverage. Enforcing those constraints is not completely trivial (polite understatement - meiosis).

我们可以讨论前两行是否应该合并的细节,但总体思路是找到给定日期的汇率是微不足道的。此结构与 SQL 'BETWEEN' 运算符一起使用,该运算符包括范围的结尾。通常,更好的范围格式是“开闭”;列出的第一个日期被包括在内,第二个被排除在外。请注意,数据行有一个限制 - (a) 日期范围的覆盖范围没有间隙,(b) 覆盖范围没有重叠。强制执行这些约束并非完全微不足道(礼貌的轻描淡写 - 减数分裂)。

Now the basic query is trivial, and Case B is no longer a special case:

现在基本查询是微不足道的,情况 B 不再是特例:

SELECT T.Date, T.Amount, X.Rate
  FROM Transactions AS T JOIN ExtendedExchangeRates AS X
       ON T.Date BETWEEN X.StartDate AND X.EndDate;

The tricky part is creating the ExtendedExchangeRate table from the given ExchangeRate table on the fly. If it is an option, then revising the structure of the basic ExchangeRate table to match the ExtendedExchangeRate table would be a good idea; you resolve the messy stuff when the data is entered (once a month) instead of every time an exchange rate needs to be determined (many times a day).

棘手的部分是根据给定的 ExchangeRate 表动态创建 ExtendedExchangeRate 表。如果这是一个选项,那么修改基本 ExchangeRate 表的结构以匹配 ExtendedExchangeRate 表将是一个好主意;您可以在输入数据时(每月一次)解决凌乱的问题,而不是每次需要确定汇率时(一天多次)。

How to create the extended exchange rate table? If your system supports adding or subtracting 1 from a date value to obtain the next or previous day (and has a single row table called 'Dual'), then a variation on this will work (without using any OLAP functions):

如何创建扩展汇率表?如果您的系统支持从日期值中加或减 1 以获得第二天或前一天(并且有一个名为“Dual”的单行表),那么对此的变体将起作用(不使用任何 OLAP 函数):

CREATE TABLE ExchangeRate
(
    Date    DATE NOT NULL,
    Rate    DECIMAL(10,5) NOT NULL
);
INSERT INTO ExchangeRate VALUES('2009-02-01', 40.1);
INSERT INTO ExchangeRate VALUES('2009-03-01', 41.0);
INSERT INTO ExchangeRate VALUES('2009-04-01', 38.5);
INSERT INTO ExchangeRate VALUES('2009-05-01', 42.7);

First row:

第一排:

SELECT '0001-01-01' AS StartDate,
       (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

Result:

结果:

0001-01-01  2009-01-31      40.10000

Last row:

最后一行:

SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
       '9999-12-31' AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

Result:

结果:

2009-05-01  9999-12-31      42.70000

Middle rows:

中间行:

SELECT X1.Date     AS StartDate,
       X2.Date - 1 AS EndDate,
       X1.Rate     AS Rate
  FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
       ON X1.Date < X2.Date
 WHERE NOT EXISTS
       (SELECT *
          FROM ExchangeRate AS X3
         WHERE X3.Date > X1.Date AND X3.Date < X2.Date
        );

Result:

结果:

2009-02-01  2009-02-28      40.10000
2009-03-01  2009-03-31      41.00000
2009-04-01  2009-04-30      38.50000

Note that the NOT EXISTS sub-query is rather crucial. Without it, the 'middle rows' result is:

请注意, NOT EXISTS 子查询相当重要。没有它,“中间行”结果是:

2009-02-01  2009-02-28      40.10000
2009-02-01  2009-03-31      40.10000    # Unwanted
2009-02-01  2009-04-30      40.10000    # Unwanted
2009-03-01  2009-03-31      41.00000
2009-03-01  2009-04-30      41.00000    # Unwanted
2009-04-01  2009-04-30      38.50000

The number of unwanted rows increases dramatically as the table increases in size (for N > 2 rows, there are (N-2) * (N - 3) / 2 unwanted rows, I believe).

随着表格大小的增加,不需要的行数急剧增加(对于 N > 2 行,我相信有 (N-2) * (N - 3) / 2 个不需要的行)。

The result for ExtendedExchangeRate is the (disjoint) UNION of the three queries:

ExtendedExchangeRate 的结果是三个查询的(不相交的)UNION:

SELECT DATE '0001-01-01' AS StartDate,
       (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
FROM Dual
UNION
SELECT X1.Date     AS StartDate,
       X2.Date - 1 AS EndDate,
       X1.Rate     AS Rate
  FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
       ON X1.Date < X2.Date
 WHERE NOT EXISTS
       (SELECT *
          FROM ExchangeRate AS X3
         WHERE X3.Date > X1.Date AND X3.Date < X2.Date
        )
UNION
SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
       DATE '9999-12-31' AS EndDate,
       (SELECT Rate FROM ExchangeRate
         WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
FROM Dual;

On the test DBMS (IBM Informix Dynamic Server 11.50.FC6 on MacOS X 10.6.2), I was able to convert the query into a view but I had to stop cheating with the data types - by coercing the strings into dates:

在测试 DBMS(MacOS X 10.6.2 上的 IBM Informix Dynamic Server 11.50.FC6)上,我能够将查询转换为视图,但我不得不停止欺骗数据类型 - 通过将字符串强制转换为日期:

CREATE VIEW ExtendedExchangeRate(StartDate, EndDate, Rate) AS
    SELECT DATE('0001-01-01')  AS StartDate,
           (SELECT MIN(Date) - 1 FROM ExchangeRate) AS EndDate,
           (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MIN(Date) FROM ExchangeRate)) AS Rate
    FROM Dual
    UNION
    SELECT X1.Date     AS StartDate,
           X2.Date - 1 AS EndDate,
           X1.Rate     AS Rate
      FROM ExchangeRate AS X1 JOIN ExchangeRate AS X2
           ON X1.Date < X2.Date
     WHERE NOT EXISTS
           (SELECT *
              FROM ExchangeRate AS X3
             WHERE X3.Date > X1.Date AND X3.Date < X2.Date
            )
    UNION 
    SELECT (SELECT MAX(Date) FROM ExchangeRate) AS StartDate,
           DATE('9999-12-31') AS EndDate,
           (SELECT Rate FROM ExchangeRate WHERE Date = (SELECT MAX(Date) FROM ExchangeRate)) AS Rate
    FROM Dual;

回答by Ray

I can't test this, but I think it would work. It uses coalesce with two sub-queries to pick the rate by rule A or rule B.

我无法测试这个,但我认为它会起作用。它使用带有两个子查询的合并来通过规则 A 或规则 B 选择速率。

Select t.Date, t.Amount, 
  ConvertedAmount = t.Amount/coalesce(    
    (Select Top 1 ex.Rate 
        From ExchangeRates ex 
        Where t.Date > ex.Date 
        Order by ex.Date desc )
     ,
     (select top 1 ex.Rate 
        From ExchangeRates  
        Order by ex.Date asc)
    ) 
From Transactions t

回答by Paul Creasey

SELECT 
    a.tranDate, 
    a.Amount,
    a.Amount/a.Rate as convertedRate
FROM
    (

    SELECT 
        t.date tranDate,
        e.date as rateDate,
        t.Amount,
        e.rate,
        RANK() OVER (Partition BY t.date ORDER BY
                         CASE WHEN DATEDIFF(day,e.date,t.date) < 0 THEN
                                   DATEDIFF(day,e.date,t.date) * -100000
                              ELSE DATEDIFF(day,e.date,t.date)
                         END ) AS diff
    FROM 
        ExchangeRates e
    CROSS JOIN 
        Transactions t
         ) a
WHERE a.diff = 1

The difference between tran and rate date is calculated, then negative values ( condition b) are multiplied by -10000 so that they can still be ranked but positive values (condition a always take priority. we then select the minimum date difference for each tran date using the rank over clause.

计算 tran 和 rate date 之间的差异,然后将负值(条件 b)乘以 -10000,以便它们仍然可以排名,但正值(条件 a 始终优先。然后我们选择每个 tran 日期的最小日期差异使用 rank over 子句。

回答by van

Many solutions will work. You should really find the one that works best (fastest) for your workload: do you search usually for one Transaction, list of them, all of them?

许多解决方案都会起作用。您真的应该找到最适合您的工作负载(最快)的一个:您通常搜索一个事务、它们的列表,还是所有这些?

The tie-breaker solution given your schema is:

给定您的架构的决胜局解决方案是:

SELECT      t.Date,
            t.Amount,
            r.Rate
            --//add your multiplication/division here

FROM        "Transactions" t

INNER JOIN  "ExchangeRates" r
        ON  r."ExchangeRateID" = (
                        SELECT TOP 1 x."ExchangeRateID"
                        FROM        "ExchangeRates" x
                        WHERE       x."SourceCurrencyISO" = t."SourceCurrencyISO" --//these are currency-related filters for your tables
                                AND x."TargetCurrencyISO" = t."TargetCurrencyISO" --//,which you should also JOIN on
                                AND x."Date" <= t."Date"
                        ORDER BY    x."Date" DESC)

You need to have the right indices for this query to be fast. Also ideally you should not have a JOINon "Date", but on "ID"-like field (INTEGER). Give me more schema info, I will create an example for you.

您需要有正确的索引才能使此查询更快。同样理想情况下,您不应该有JOINon "Date",而是 on类似"ID"的字段 ( INTEGER)。给我更多架构信息,我会为你创建一个例子。

回答by ErikE

There's nothing about a join that will be more elegant than the TOP 1correlated subquery in your original post. However, as you say, it doesn't satisfy requirement B.

没有什么比TOP 1原始帖子中的相关子查询更优雅的连接了。但是,正如您所说,它不满足要求 B。

These queries do work (SQL Server 2005 or later required). See the SqlFiddle for these.

这些查询确实有效(需要 SQL Server 2005 或更高版本)。有关这些,请参阅SqlFiddle

SELECT
   T.*,
   ExchangeRate = E.Rate
FROM
  dbo.Transactions T
  CROSS APPLY (
    SELECT TOP 1 Rate
    FROM dbo.ExchangeRate E
    WHERE E.RateDate <= T.TranDate
    ORDER BY
      CASE WHEN E.RateDate <= T.TranDate THEN 0 ELSE 1 END,
      E.RateDate DESC
  ) E;

Note that the CROSS APPLY with a single column value is functionally equivalent to the correlated subquery in the SELECTclause as you showed. I just prefer CROSS APPLY now because it is much more flexible and lets you reuse the value in multiple places, have multiple rows in it (for custom unpivoting) and lets you have multiple columns.

请注意,具有单个列值的 CROSS APPLY 在功能上等同于SELECT您展示的子句中的相关子查询。我现在更喜欢 CROSS APPLY,因为它更灵活,让您可以在多个位置重用该值,其中有多个行(用于自定义反透视)并允许您有多个列。

SELECT
   T.*,
   ExchangeRate = Coalesce(E.Rate, E2.Rate)
FROM
  dbo.Transactions T
  OUTER APPLY (
    SELECT TOP 1 Rate
    FROM dbo.ExchangeRate E
    WHERE E.RateDate <= T.TranDate
    ORDER BY E.RateDate DESC
  ) E
  OUTER APPLY (
    SELECT TOP 1 Rate
    FROM dbo.ExchangeRate E2
    WHERE E.Rate IS NULL
    ORDER BY E2.RateDate
  ) E2;

I don't know which one might perform better, or if either will perform better than other answers on the page. With a proper index on the Date columns, they should zing pretty well--definitely better than any Row_Number()solution.

我不知道哪个可能表现更好,或者哪个比页面上的其他答案表现更好。使用日期列上的适当索引,它们应该会非常好 - 绝对比任何Row_Number()解决方案都好。