Identifying trend with SQL query
Note: this page reproduces a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/20883195/
Asked by Dan Markhasin
I have a table (let's call it Data) with a set of object IDs, numeric values and dates. I would like to identify the objects whose values had a positive trend over the last X minutes (say, an hour).
Example data:
entity_id | value | date
1234 | 15 | 2014-01-02 11:30:00
5689 | 21 | 2014-01-02 11:31:00
1234 | 16 | 2014-01-02 11:31:00
I looked at similar questions, but unfortunately didn't find anything that helps.
Answered by John Chrysostom
You inspired me to go and implement linear regression in SQL Server. This could be adapted for MySQL, Oracle, or another database without too much trouble. It fits a least-squares line to each entity_id's values over the hour and selects only the entities whose slope is positive, which is a mathematically sound way of determining the trend.
It implements the formula for calculating B1hat listed here: https://en.wikipedia.org/wiki/Regression_analysis#Linear_regression
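For reference, the slope estimate that the query computes can be written as follows. The symbols are a notational assumption here: x_i is the number of seconds from the start of the window to each row's date, y_i is the row's value, and the bars denote per-entity averages.

```latex
\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}
```

A positive \hat{\beta}_1 means the fitted line rises over the window. The denominator is zero when an entity has only one distinct timestamp in the window, so the slope is undefined in that case.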
create table #temp
(
    entity_id int,
    value int,
    [date] datetime
)
-- Sample data: entity 1 rises, entity 2 falls, entity 3 stays flat
insert into #temp (entity_id, value, [date])
values
    (1, 10, '20140102 07:00:00 AM'),
    (1, 20, '20140102 07:15:00 AM'),
    (1, 30, '20140102 07:30:00 AM'),
    (2, 50, '20140102 07:00:00 AM'),
    (2, 20, '20140102 07:47:00 AM'),
    (3, 40, '20140102 07:00:00 AM'),
    (3, 40, '20140102 07:52:00 AM')
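-- Beta is the slope of the least-squares line of value against time, per entity_id
select entity_id,
       1.0 * sum((x - xbar) * (y - ybar)) / nullif(sum((x - xbar) * (x - xbar)), 0) as Beta
from
(
    select entity_id,
           -- 1.0 * forces decimal arithmetic; avg() of an int expression truncates to int
           avg(1.0 * value) over (partition by entity_id) as ybar,
           value as y,
           avg(1.0 * datediff(second, '20140102 07:00:00 AM', [date])) over (partition by entity_id) as xbar,
           -- x is the number of seconds since the start of the window
           datediff(second, '20140102 07:00:00 AM', [date]) as x
    from #temp
    where [date] >= '20140102 07:00:00 AM' and [date] < '20140102 08:00:00 AM'
) as Calcs
group by entity_id
-- nullif() turns a zero denominator (a single distinct timestamp) into NULL,
-- so that entity is filtered out instead of raising a divide-by-zero error
having 1.0 * sum((x - xbar) * (y - ybar)) / nullif(sum((x - xbar) * (x - xbar)), 0) > 0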
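To sanity-check the query outside the database, the same slope can be computed by hand on the sample rows. The following is only a sketch in Python, not part of the original answer; the names `slopes` and `positive_trends` and the row layout are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Same sample rows as the insert statement above (hypothetical in-memory copy).
ROWS = [
    (1, 10, "2014-01-02 07:00:00"),
    (1, 20, "2014-01-02 07:15:00"),
    (1, 30, "2014-01-02 07:30:00"),
    (2, 50, "2014-01-02 07:00:00"),
    (2, 20, "2014-01-02 07:47:00"),
    (3, 40, "2014-01-02 07:00:00"),
    (3, 40, "2014-01-02 07:52:00"),
]
WINDOW_START = datetime(2014, 1, 2, 7, 0, 0)
WINDOW_END = datetime(2014, 1, 2, 8, 0, 0)


def slopes(rows, start, end):
    """Return {entity_id: least-squares slope of value per second} for rows in [start, end)."""
    points = defaultdict(list)
    for entity_id, value, stamp in rows:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if start <= when < end:
            # x = seconds since the start of the window, y = value
            points[entity_id].append(((when - start).total_seconds(), value))

    result = {}
    for entity_id, pts in points.items():
        n = len(pts)
        xbar = sum(x for x, _ in pts) / n
        ybar = sum(y for _, y in pts) / n
        sxx = sum((x - xbar) ** 2 for x, _ in pts)
        if sxx == 0:
            # Only one distinct timestamp: the slope is undefined, so skip the entity.
            continue
        sxy = sum((x - xbar) * (y - ybar) for x, y in pts)
        result[entity_id] = sxy / sxx
    return result


def positive_trends(rows, start, end):
    """Return the sorted entity ids whose slope is strictly positive."""
    return sorted(e for e, beta in slopes(rows, start, end).items() if beta > 0)


print(positive_trends(ROWS, WINDOW_START, WINDOW_END))  # -> [1]
```

On this data, entity 1 gains 10 units every 900 seconds (slope 1/90 per second), entity 2 has a negative slope, and entity 3 is flat, so only entity 1 should come back from the SQL query as well.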