Weighted average in T-SQL (like Excel's SUMPRODUCT)
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/1698408/
Asked by ProfK
I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):
((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)
The first part reflects the same functionality as Excel's SUMPRODUCT() function.
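As a quick check of the arithmetic, here is a small worked example. The three weight/value pairs are invented for illustration; A holds the weights and B the values, matching the formula above. It assumes SQL Server 2008 or later for the VALUES table constructor.

```sql
-- A = weight, B = value (made-up numbers, not from the question)
SELECT
    SUM(A * B)          AS SumProduct,   -- 1*10 + 2*20 + 3*30 = 140
    SUM(A)              AS TotalWeight,  -- 1 + 2 + 3 = 6
    SUM(A * B) / SUM(A) AS WeightedAvg   -- 140 / 6, roughly 23.33
FROM (VALUES (1.0, 10.0),
             (2.0, 20.0),
             (3.0, 30.0)) AS t(A, B);
```

The literals are written as decimals on purpose: with integer columns, T-SQL would perform integer division and truncate the result.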
My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.
EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:
select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate, Actual
    from CalcProductionRecords
    where KPI = 'Weighty'
) baseSeries inner join (
    select RecordDate, Actual
    from CalcProductionRecords
    where KPI = 'Tons Milled'
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate
Answered by MatBailie
Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...
SELECT
    SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
    [tbl]
WHERE
    [tbl].date >= '2009 Jan 01'
    AND [tbl].date < '2010 Jan 01'
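Since the question also asks for a date range, the bounds can come from variables rather than literals. This is a minimal sketch, assuming the same placeholder table [tbl] with data, weight and date columns, and SQL Server 2008 or later (for the date type and inline DECLARE initialisation). The variable names @StartDate and @EndDate are made up for illustration. The NULLIF guard is an addition, not part of the original answer: it returns NULL instead of raising a divide-by-zero error when the weights in the range sum to zero.

```sql
-- Hypothetical parameters; in practice these could be stored procedure arguments.
DECLARE @StartDate date = '20090101';  -- inclusive lower bound (yyyymmdd is language-neutral)
DECLARE @EndDate   date = '20100101';  -- exclusive upper bound

SELECT
    SUM([tbl].data * [tbl].weight) / NULLIF(SUM([tbl].weight), 0) AS WeightedAverage
FROM
    [tbl]
WHERE
    [tbl].[date] >= @StartDate
    AND [tbl].[date] <  @EndDate;
```

If data and weight are integer columns, multiplying by 1.0 inside the first SUM avoids integer division.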
The more complex part is where you want to "dynamically specify" which field is [data] and which field is [weight]. The short answer is that, realistically, you would have to use dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
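The four steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table is the placeholder tbl (unqualified, in the default schema) with a [date] column, the variable names are hypothetical, and the template uses <data>/<weight> placeholder tokens instead of literally replacing [tbl].data. The column check and QUOTENAME are additions aimed at the SQL injection concern; they are not part of the original answer.

```sql
-- Hypothetical inputs: the names to plug into the template.
DECLARE @TableName    sysname = N'tbl';
DECLARE @DataColumn   sysname = N'data';
DECLARE @WeightColumn sysname = N'weight';
DECLARE @StartDate    date    = '20090101';
DECLARE @EndDate      date    = '20100101';

-- Only accept names that really are columns of the table.
-- COL_LENGTH returns NULL when the column does not exist.
IF COL_LENGTH(@TableName, @DataColumn) IS NULL
   OR COL_LENGTH(@TableName, @WeightColumn) IS NULL
BEGIN
    RAISERROR(N'Unknown data or weight column.', 16, 1);
    RETURN;
END;

-- 1. Create a string template.
DECLARE @sql nvarchar(max) = N'
SELECT SUM(<data> * <weight>) / NULLIF(SUM(<weight>), 0) AS WeightedAverage
FROM <table>
WHERE [date] >= @StartDate AND [date] < @EndDate;';

-- 2 and 3. Replace the placeholders; QUOTENAME wraps each identifier in brackets.
SET @sql = REPLACE(@sql, N'<data>',   QUOTENAME(@DataColumn));
SET @sql = REPLACE(@sql, N'<weight>', QUOTENAME(@WeightColumn));
SET @sql = REPLACE(@sql, N'<table>',  QUOTENAME(@TableName));

-- 4. Execute the string; the dates are passed as real parameters, not spliced in as text.
EXEC sys.sp_executesql
    @sql,
    N'@StartDate date, @EndDate date',
    @StartDate = @StartDate,
    @EndDate   = @EndDate;
```

Only identifiers need string substitution here; anything that can be a value (such as the date bounds) is better passed through sp_executesql parameters.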
Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic SQL introduces a noticeable overhead. (Not to mention the need to guard against SQL injection attacks, etc.)
EDIT:
In your latest example you highlight three fields:
- RecordDate
- KPI
- Actual
When the [KPI] is "Weighty", then [Actual] is the weighting factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the data you want to aggregate.
Some questions I have are:
- Are there any other fields?
- Is there only ever ONE actual per date per KPI?
The reason I ask is that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resulting records.)
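One way to answer the second question against real data is to look for duplicates directly. A minimal sketch, assuming the CalcProductionRecords table and the two KPI names from the question; the RowsPerDate alias is made up for illustration:

```sql
-- Lists every (RecordDate, KPI) pair that occurs more than once.
-- An empty result means a join on RecordDate alone cannot fan out
-- into many-to-many matches for these two KPIs.
SELECT
    RecordDate,
    KPI,
    COUNT(*) AS RowsPerDate
FROM
    CalcProductionRecords
WHERE
    KPI IN ('Tons Milled', 'Weighty')
GROUP BY
    RecordDate,
    KPI
HAVING
    COUNT(*) > 1;
```

If this returns rows, the join needs an extra predicate or the values need to be aggregated per date first.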
Regardless, a slight simplification of your query is certainly possible...
SELECT
    SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
    CalcProductionRecords AS [baseSeries]
INNER JOIN
    CalcProductionRecords AS [weightSeries]
        ON [weightSeries].RecordDate = [baseSeries].RecordDate
--      AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
    [baseSeries].KPI = 'Tons Milled'
    AND [weightSeries].KPI = 'Weighty'
The commented-out line is only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly...
SELECT
    SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
        SUM(Actual) AS Actual   -- derived-table columns must be named
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
    AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual) AS Actual   -- one weight per date
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
    AS [weightSeries]
        ON [weightSeries].RecordDate = [baseSeries].RecordDate
This assumes the AVG of the weight is valid if there are multiple weights for the same day.
EDIT: Someone just voted for this, so I thought I'd improve the final answer :)
SELECT
    SUM(Actual * Weight) / SUM(Weight)
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END) AS Actual,
        AVG(CASE WHEN KPI = 'Weighty' THEN Actual ELSE NULL END) AS Weight
    FROM
        CalcProductionRecords
    WHERE
        KPI IN ('Tons Milled', 'Weighty')
    GROUP BY
        RecordDate
)
    AS pivotAggregate
This avoids the JOIN and also only scans the table once.
It relies on the fact that NULL values are ignored when calculating the AVG().
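A tiny demonstration of that NULL behaviour, using made-up values in place of the CASE output (the column name v and the aliases are hypothetical; assumes SQL Server 2008 or later for the VALUES constructor):

```sql
-- Three rows, one of them NULL, as the CASE expression would yield
-- for a row whose KPI does not match.
SELECT
    AVG(v)   AS AvgOfNonNulls,  -- (10 + 20) / 2 = 15; the NULL row is skipped
    COUNT(v) AS NonNullRows,    -- 2
    COUNT(*) AS AllRows         -- 3
FROM (VALUES (10.0), (NULL), (20.0)) AS t(v);
```

SUM() skips NULLs in the same way, which is why the 'Tons Milled' column is unaffected by the 'Weighty' rows. SQL Server may also emit the warning "Null value is eliminated by an aggregate or other SET operation", which is harmless here.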
Answered by Quassnoi
SELECT SUM(A * B) / SUM(A)
FROM mytable
Answered by priyanka.sarkar
If I have understood the problem, then try this:
SET DATEFORMAT dmy

declare @tbl table (A int, B int, RecordDate datetime, KPI varchar(50))

insert into @tbl
select 1, 10, '21/01/2009', 'Weighty' union all
select 2, 20, '10/01/2009', 'Tons Milled' union all
select 3, 30, '03/02/2009', 'xyz' union all
select 4, 40, '10/01/2009', 'Weighty' union all
select 5, 50, '05/01/2009', 'Tons Milled' union all
select 6, 60, '04/01/2009', 'abc' union all
select 7, 70, '05/01/2009', 'Weighty' union all
select 8, 80, '09/01/2009', 'xyz' union all
select 9, 90, '05/01/2009', 'kws' union all
select 10, 100, '05/01/2009', 'Tons Milled'

-- 1.0 * forces decimal arithmetic; with int columns the division would truncate
select SUM(1.0 * t1.A * t2.A) / SUM(t2.A) as Result
from (select RecordDate, A, B, KPI from @tbl) t1
inner join (select RecordDate, A, B, KPI from @tbl) t2
    on t1.RecordDate = t2.RecordDate
    and t1.KPI = t2.KPI