Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1698408/

Time: 2020-09-01 04:19:21  Source: igfitidea

Weighted average in T-SQL (like Excel's SUMPRODUCT)

Tags: sql, sql-server, tsql, statistics

Asked by ProfK

I am looking for a way to derive a weighted average from two rows of data with the same number of columns, where the average is as follows (borrowing Excel notation):

((A1*B1)+(A2*B2)+...+(An*Bn))/SUM(A1:An)

The first part reflects the same functionality as Excel's SUMPRODUCT() function.
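
As a quick sanity check of the formula, here is a minimal sketch with made-up numbers. The values 10, 20, 30, the weights 1, 2, 3, and the names `demo`, `val` and `wt` are purely illustrative and not from the original question; the VALUES constructor assumes SQL Server 2008 or later. The sum of products is 10 + 40 + 90 = 140 and the sum of weights is 6, so the result should be roughly 23.33.

```sql
-- Hypothetical sample data: three values with weights 1, 2 and 3.
-- Decimal literals are used so the division is not integer division.
SELECT SUM(val * wt) / SUM(wt) AS WeightedAverage   -- 140.0 / 6.0, roughly 23.33
FROM (VALUES (10.0, 1.0),
             (20.0, 2.0),
             (30.0, 3.0)) AS demo(val, wt);
```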

My catch is that I need to dynamically specify which row gets averaged with weights, and which row the weights come from, and a date range.

EDIT: This is easier than I thought, because Excel was making me think I required some kind of pivot. My solution so far is thus:

select sum(baseSeries.Actual * weightSeries.Actual) / sum(weightSeries.Actual)
from (
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Weighty'
) baseSeries inner join (       
    select RecordDate , Actual 
    from CalcProductionRecords 
    where KPI = 'Tons Milled'   
) weightSeries on baseSeries.RecordDate = weightSeries.RecordDate

Answered by MatBailie

Quassnoi's answer shows how to do the SumProduct, and using a WHERE clause would allow you to restrict by a Date field...

SELECT
   SUM([tbl].data * [tbl].weight) / SUM([tbl].weight)
FROM
   [tbl]
WHERE
   [tbl].date >= '2009 Jan 01'
   AND [tbl].date < '2010 Jan 01'

The more complex part is where you want to "dynamically specify" which field is [data] and which field is [weight]. The short answer is that, realistically, you'd have to make use of dynamic SQL. Something along the lines of:
- Create a string template
- Replace all instances of [tbl].data with the appropriate data field
- Replace all instances of [tbl].weight with the appropriate weight field
- Execute the string
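
The four steps above could be sketched roughly as follows. This is only an illustration under assumptions: the variable names and the `{data}` / `{weight}` placeholder syntax are invented here, the table and column names follow the [tbl] example above, and the DECLARE initialisers and the `date` type assume SQL Server 2008 or later. QUOTENAME and sp_executesql parameters are used so that neither the column names nor the dates are concatenated into the string as raw text.

```sql
-- Hypothetical inputs: in practice these would be procedure parameters.
DECLARE @dataColumn   sysname = N'data';
DECLARE @weightColumn sysname = N'weight';
DECLARE @fromDate     date    = '20090101';   -- inclusive
DECLARE @toDate       date    = '20100101';   -- exclusive

-- 1. Create a string template with placeholders for the two column names.
DECLARE @sql nvarchar(max) = N'
SELECT SUM({data} * {weight}) / SUM({weight}) AS WeightedAverage
FROM   [tbl]
WHERE  [date] >= @fromDate
  AND  [date] <  @toDate;';

-- 2. and 3. Substitute the column names. QUOTENAME wraps each name in
--    brackets, so an unexpected value is treated as an identifier
--    rather than as extra SQL.
SET @sql = REPLACE(@sql, N'{data}',   QUOTENAME(@dataColumn));
SET @sql = REPLACE(@sql, N'{weight}', QUOTENAME(@weightColumn));

-- 4. Execute the string, passing the dates as real parameters.
EXEC sys.sp_executesql
     @sql,
     N'@fromDate date, @toDate date',
     @fromDate = @fromDate,
     @toDate   = @toDate;
```

Passing the dates through sp_executesql instead of pasting them into the string also lets SQL Server reuse the plan for each distinct pair of column names.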

Dynamic SQL, however, carries its own overhead. If the queries are relatively infrequent, or the execution time of the query itself is relatively long, this may not matter. If they are common and short, however, you may notice that using dynamic SQL introduces a noticeable overhead. (Not to mention the need to be careful about SQL injection attacks, etc.)

EDIT:

In your latest example you highlight three fields:

  • RecordDate
  • KPI
  • Actual

When the [KPI] is "Weighty", then [Actual] is the weighting factor to use.
When the [KPI] is "Tons Milled", then [Actual] is the data you want to aggregate.

Some questions I have are:

  • Are there any other fields?
  • Is there only ever ONE actual per date per KPI?

The reason I ask is that you want to ensure the JOIN you do is only ever 1:1. (You don't want 5 Actuals joining with 5 Weights, giving 25 resulting records.)
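
One way to check the 1:1 assumption before relying on the join is to look for dates that carry more than one row for the same KPI. A minimal sketch, assuming the table and column names from the question and that no other column distinguishes the rows (the `RowsPerDay` alias is made up here):

```sql
-- Lists every (RecordDate, KPI) pair that has more than one row.
-- An empty result means each KPI has at most one Actual per date,
-- so joining the two series on RecordDate cannot multiply rows.
SELECT   RecordDate, KPI, COUNT(*) AS RowsPerDay
FROM     CalcProductionRecords
WHERE    KPI IN ('Tons Milled', 'Weighty')
GROUP BY RecordDate, KPI
HAVING   COUNT(*) > 1;
```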

Regardless, a slight simplification of your query is certainly possible...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
--    AND [weightSeries].someOtherID = [baseSeries].someOtherID
WHERE
   [baseSeries].KPI = 'Tons Milled'
   AND [weightSeries].KPI = 'Weighty'

The commented-out line is only needed if you need additional predicates to ensure a 1:1 relationship between your data and the weights.
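
Because both series live in the same table and differ only in their KPI value, the "dynamic" part of the question (which series is averaged, which supplies the weights, and the date range) can probably be handled with ordinary variables rather than dynamic SQL. A minimal sketch: the variable names, the varchar(50) length and the half-open date range are assumptions, not part of the original schema.

```sql
-- Hypothetical parameters; in practice these would come from the caller.
DECLARE @baseKpi   varchar(50) = 'Tons Milled';
DECLARE @weightKpi varchar(50) = 'Weighty';
DECLARE @fromDate  datetime    = '20090101';   -- inclusive
DECLARE @toDate    datetime    = '20100101';   -- exclusive

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual)
      / NULLIF(SUM([weightSeries].Actual), 0) AS WeightedAverage
FROM
   CalcProductionRecords AS [baseSeries]
INNER JOIN
   CalcProductionRecords AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate
WHERE
   [baseSeries].KPI = @baseKpi
   AND [weightSeries].KPI = @weightKpi
   AND [baseSeries].RecordDate >= @fromDate
   AND [baseSeries].RecordDate <  @toDate;
```

The NULLIF guard makes the query return NULL instead of raising a divide-by-zero error when the weights in the range sum to zero.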

If you can't guarantee just one value per date, and don't have any other fields to join on, you can modify your subquery-based version slightly...

SELECT
   SUM([baseSeries].Actual * [weightSeries].Actual) / SUM([weightSeries].Actual)
FROM
(
    SELECT
        RecordDate,
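        SUM(Actual) AS Actual   -- alias needed so the outer query can reference it
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Tons Milled'
    GROUP BY
        RecordDate
)
   AS [baseSeries]
INNER JOIN
(
    SELECT
        RecordDate,
        AVG(Actual) AS Actual   -- alias needed so the outer query can reference it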
    FROM
        CalcProductionRecords
    WHERE
        KPI = 'Weighty'
    GROUP BY
        RecordDate
)
   AS [weightSeries]
      ON [weightSeries].RecordDate = [baseSeries].RecordDate

This assumes the AVG of the weight is valid if there are multiple weights for the same day.

EDIT: Someone just voted for this so I thought I'd improve the final answer :)

SELECT
   SUM(Actual * Weight) / SUM(Weight)
FROM
(
    SELECT
        RecordDate,
        SUM(CASE WHEN KPI = 'Tons Milled' THEN Actual ELSE NULL END)   AS Actual,
        AVG(CASE WHEN KPI = 'Weighty'     THEN Actual ELSE NULL END)   AS Weight
    FROM
        CalcProductionRecords
    WHERE
        KPI IN ('Tons Milled', 'Weighty')
    GROUP BY
        RecordDate
)
   AS pivotAggregate

This avoids the JOIN and also only scans the table once.

It relies on the fact that NULL values are ignored when calculating the AVG().
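
The NULL behaviour can be seen in isolation with a tiny example. The three values and the names `demo` and `x` are made up for illustration; the VALUES constructor assumes SQL Server 2008 or later.

```sql
-- Hypothetical values: two real numbers and one NULL.
SELECT AVG(x)   AS AvgIgnoringNull,   -- (2 + 4) / 2 = 3, the NULL row is skipped
       COUNT(x) AS NonNullRows,       -- 2
       COUNT(*) AS AllRows            -- 3
FROM (VALUES (2.0), (NULL), (4.0)) AS demo(x);
```

This is why the CASE expressions in the pivot query can return NULL for rows belonging to the other KPI without distorting the average.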

Answered by Quassnoi

SELECT  SUM(A * B) / SUM(A)
FROM    mytable
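
One caveat worth keeping in mind with this form in T-SQL: if A and B are integer columns, the division is integer division and the fractional part is dropped. The sketch below uses two made-up rows (weights 1 and 2, values 10 and 15) to show the difference; the inline `mytable` and the column aliases are illustrative only.

```sql
-- Sum of products = 1*10 + 2*15 = 40, sum of weights = 3.
SELECT SUM(A * B) / SUM(A)                  AS IntegerResult,   -- 40 / 3 = 13
       SUM(A * B) * 1.0 / NULLIF(SUM(A), 0) AS DecimalResult    -- roughly 13.33
FROM (VALUES (1, 10), (2, 15)) AS mytable(A, B);
```

Multiplying by 1.0 forces decimal arithmetic, and NULLIF returns NULL rather than an error if the weights sum to zero.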

Answered by priyanka.sarkar

If I have understood the problem, then try this:

SET DATEFORMAT dmy

    -- Sample data: A holds the value used in the calculation
    declare @tbl table (A int, B int, RecordDate datetime, KPI varchar(50))

    insert into @tbl
        select  1,  10, '21/01/2009', 'Weighty'     union all
        select  2,  20, '10/01/2009', 'Tons Milled' union all
        select  3,  30, '03/02/2009', 'xyz'         union all
        select  4,  40, '10/01/2009', 'Weighty'     union all
        select  5,  50, '05/01/2009', 'Tons Milled' union all
        select  6,  60, '04/01/2009', 'abc'         union all
        select  7,  70, '05/01/2009', 'Weighty'     union all
        select  8,  80, '09/01/2009', 'xyz'         union all
        select  9,  90, '05/01/2009', 'kws'         union all
        select 10, 100, '05/01/2009', 'Tons Milled'

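    -- Pair each 'Weighty' row (t1) with the 'Tons Milled' rows (t2) on the same
    -- date, mirroring the query in the question. Joining on t1.KPI = t2.KPI would
    -- pair each series with itself instead of with its weights.
    -- Multiplying by 1.0 avoids integer division on the int columns.
    select SUM(t1.A * t2.A) * 1.0 / SUM(t2.A) as Result
    from @tbl t1
    inner join @tbl t2
        on t1.RecordDate = t2.RecordDate
    where t1.KPI = 'Weighty'
      and t2.KPI = 'Tons Milled'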