SQL 如何有效地选择小于和大于给定值的最接近值?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/8938234/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 14:04:51  来源:igfitidea点击:

How can I select the nearest value less-than and greater-than a given value efficiently?

sqlsql-serveroptimizationquery-optimizationsql-server-2008-r2

提问by brianestey

I have two tables, one for values one for location and am trying to interpolate location. The tables have been simplified to the following:

我有两个表,一个用于值,一个用于位置,我正在尝试插入位置。这些表已简化为以下内容:

CREATE TABLE value(
    Timestamp DATETIME2,
    Value float NOT NULL,
    PRIMARY KEY(Timestamp)
);

CREATE TABLE location(
    Timestamp DATETIME2,
    Position INT NOT NULL,
    PRIMARY KEY(Timestamp)
); 

INSERT INTO value VALUES 
    ('2011/12/1 16:55:01', 1),
    ('2011/12/1 16:55:02', 5),
    ('2011/12/1 16:55:05', 10),
    ('2011/12/1 16:55:08', 6);

INSERT INTO location VALUES 
    ('2011/12/1 16:55:00', 0),
    ('2011/12/1 16:55:05', 10),
    ('2011/12/1 16:55:10', 5)

The expected results would be

预期的结果是

TimeStamp, Value, LowerTime, LowerLocation, UpperTime, UpperLocation
2011-12-01 16:55:01,  1, 2011-12-01 16:55:00,  0, 2011-12-01 16:55:05, 10
2011-12-01 16:55:02,  5, 2011-12-01 16:55:00,  0, 2011-12-01 16:55:05, 10
2011-12-01 16:55:05, 10, 2011-12-01 16:55:05, 10, 2011-12-01 16:55:05, 10
2011-12-01 16:55:08,  6, 2011-12-01 16:55:05, 10, 2011-12-01 16:55:10,  5

(Keep in mind this is simplified sample data to get the idea of the query I am trying to perform across.)

(请记住,这是简化的示例数据,以了解我正在尝试执行的查询。)

To do the interpolation, I need to figure out the time and locations before and after a given values time. I am currently doing this with a query that looks like:

要进行插值,我需要找出给定值时间前后的时间和位置。我目前正在使用如下查询执行此操作:

SELECT 
    V.Timestamp, 
    V.Value, 
    (SELECT MAX(Timestamp) FROM dbo.location WHERE Timestamp <= V.Timestamp) as LowerTime,
    (SELECT TOP 1 Position FROM dbo.location WHERE Timestamp <= V.Timestamp ORDER BY timestamp DESC) as LowerLocation,
    (SELECT MIN(Timestamp) FROM dbo.location WHERE Timestamp >= V.Timestamp) as UpperTime,
    (SELECT TOP 1 Position FROM dbo.location WHERE Timestamp >= V.Timestamp ORDER BY timestamp ASC) as UpperLocation
 FROM 
    dbo.value V 

Now this works, but this obviously is doing a lot of work. I'm thinking there must be a query simplification that I'm missing but I've been playing with it all morning and haven't come up with anything concrete. Hoping someone here has a better idea.

现在这可行,但这显然做了很多工作。我想一定有一个我遗漏的查询简化,但我整个上午都在玩它并且没有想出任何具体的东西。希望这里有人有更好的主意。

I am currently exploring if there is a way to figure out the LowerTime and UpperTime and use those in determining the Locations. Something like:

我目前正在探索是否有办法找出 LowerTime 和 UpperTime 并使用它们来确定位置。就像是:

SELECT 
    V.Timestamp, 
    V.Value, 
    (SELECT MAX(Timestamp) FROM dbo.location WHERE Timestamp <= V.Timestamp) as LowerTime,
    (SELECT Position FROM dbo.location WHERE Timestamp = LowerTime) as LowerLocation,
    (SELECT MIN(Timestamp) FROM dbo.location WHERE Timestamp >= V.Timestamp) as UpperTime,
    (SELECT Position FROM dbo.location WHERE Timestamp = UpperTime) as UpperLocation
 FROM 
    dbo.value V 

but this doesn't work.

但这不起作用。

EDIT1: Updated query as suggested. However no visible change in execution time.

EDIT1:按照建议更新查询。但是,执行时间没有明显变化。

EDIT2: Added my thoughts of the approach I am currently trying.

EDIT2:添加了我对我目前正在尝试的方法的想法。

回答by Sergey Kudriavtsev

For simplicity you may at least use MAX()and MIN()functions for querying timestampfield instead of TOP 1and ORDER BY.

为简单起见,您至少可以使用MAX()andMIN()函数来查询timestamp字段而不是TOP 1and ORDER BY

Full query will be

完整的查询将是

SELECT 
    V.Timestamp, 
    V.Value, 
    (SELECT MAX(Timestamp) FROM dbo.location WHERE Timestamp <= V.Timestamp) as LowerTime,
    (SELECT TOP 1 Position FROM dbo.location WHERE Timestamp <= V.Timestamp ORDER BY timestamp DESC) as LowerLocation,
    (SELECT MIN(Timestamp) FROM dbo.location WHERE Timestamp >= V.Timestamp) as UpperTime,
    (SELECT TOP 1 Position FROM dbo.location WHERE Timestamp >= V.Timestamp ORDER BY timestamp ASC) as UpperLocation
 FROM 
    dbo.value V 

回答by Damien_The_Unbeliever

This might do the trick (although I think the join looks quite ugly):

这可能会奏效(虽然我认为连接看起来很丑陋):

;with OrderedLocations as (
    select
        v.Timestamp,
        v.Value,
        l.Timestamp as tsl,
        l.Position,
        ROW_NUMBER() OVER (PARTITION BY v.Timestamp ORDER BY CASE WHEN l.Timestamp <= v.Timestamp THEN l.Timestamp ELSE '00010101' END desc) as PrevRN,
        ROW_NUMBER() OVER (PARTITION BY v.Timestamp ORDER BY CASE WHEN l.Timestamp >= v.Timestamp THEN l.Timestamp ELSE '99991231' END asc) as NextRN
    from
        value v
            cross join
        location l
)
select
    ol1.Timestamp,
    ol1.Value,
    ol1.tsl,
    ol1.Position,
    ol2.tsl,
    ol2.Position
from
    OrderedLocations ol1
        inner join
    OrderedLocations ol2
        on
            ol1.Timestamp = ol2.Timestamp and
            ol1.Value = ol2.Value
where
    ol1.PrevRN = 1 and
    ol2.NextRN = 1

Unfortunately, as with most efficiency/performance questions, the answer tends to be try lots of different combinations with your actualtables and data, and measure how each one performs.

不幸的是,与大多数效率/性能问题一样,答案往往是尝试使用实际表和数据进行多种不同组合,并衡量每个组合的表现。



An alternative (avoiding the join) using the same CTE as above would be:

使用与上述相同的 CTE 的替代方案(避免连接)是:

SELECT Timestamp,Value,
    MAX(CASE WHEN PrevRN=1 THEN tsl END),MAX(CASE WHEN PrevRN=1 then Position END),
    MAX(CASE WHEN NextRN=1 THEN tsl END),MAX(CASE WHEN NextRN=1 then Position END)
FROM
    OrderedLocations
where PrevRN=1 or NextRN=1
group by Timestamp,Value


The CTE (OrderedLocations) is trying to construct a rowset where every row from location is matched to every row in value. For each resulting row, we calculate two ROW_NUMBERs - the row number where we number all rows with a lower or equal timestamp (PrevRN) in descending order, and another where we number all rows with a greater or equal timestamp (NextRN) ascending. We then construct our final result by just considering those rows where one of those row numbers is 1.

CTE( OrderedLocations) 试图构造一个行集,其中 location 中的每一行都与value. 对于每个结果行,我们计算两个ROW_NUMBERs - 行号,其中我们PrevRN以降序对具有较低或相等时间戳 ( ) 的所有行进行编号,另一个是我们对具有大于或等于时间戳 ( NextRN) 的所有行进行升序编号。然后我们只考虑那些行号之一为 1 的行来构建我们的最终结果。