vb.net 四分位距 - 下限、上限和中位数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/29945796/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-17 19:04:38  来源:igfitidea点击:

Interquartile Range - Lower, Upper and Median

sql-servervb.netexcel

提问by ca8msm

I'm trying to work out the interquartile range based on an array of numbers which can be any length e.g.

我正在尝试根据可以是任意长度的数字数组来计算四分位距,例如

1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9

The values that I need to work out from this interquartile range are:

我需要从这个四分位范围计算出的值是:

  • Upper Quartile
  • Median
  • Lower Quartile
  • 上四分位数
  • 中位数
  • 下四分位数

If I dump the above array of numbers into Microsoft Excel (columns A:M), then I can use the following formulas:

如果我将上述数字数组转储到 Microsoft Excel(A:M 列)中,那么我可以使用以下公式:

  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)
  • =QUARTILE.INC(A1:M1,1)
  • =QUARTILE.INC(A1:M1,2)
  • =QUARTILE.INC(A1:M1,3)

To get my answers of:

为了得到我的答案:

  • 4
  • 7
  • 9
  • 4
  • 7
  • 9

I now need to work out these 3 values in either SQL Server or VB.NET. I can get the array values in any format or object in either of these languages, but I can't find any functions that exist like the QUARTILE.INCfunction that Excel has.

我现在需要在 SQL Server 或 VB.NET 中计算出这 3 个值。我可以使用这些语言中的任何一种获取任何格式或对象的数组值,但我找不到任何存在的QUARTILE.INC函数,如 Excel 具有的函数。

Does anyone know how this could be achieved in either SQL Server or VB.NET?

有谁知道如何在 SQL Server 或 VB.NET 中实现这一点?

采纳答案by Tanner

There might be an easier way, but to get Quartiles, you can use NTILE (Transact-SQL)

可能有更简单的方法,但要获得四分位数,您可以使用NTILE (Transact-SQL)

Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.

将有序分区中的行分布到指定数量的组中。组被编号,从一开始。对于每一行,NTILE 返回该行所属组的编号。

So for your data:

所以对于你的数据:

SELECT  1 Val
INTO    #temp
UNION ALL
SELECT  1
UNION ALL
SELECT  5
UNION ALL
SELECT  6
UNION ALL
SELECT  7
UNION ALL
SELECT  8
UNION ALL
SELECT  2
UNION ALL
SELECT  4
UNION ALL
SELECT  7
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9
UNION ALL
SELECT  9

-- NTILE(4) specifies you require 4 partitions (quartiles)
SELECT  NTILE(4) OVER ( ORDER BY Val ) AS Quartile ,
        Val
INTO #tempQuartiles
FROM    #temp

SELECT * 
FROM #tempQuartiles

DROP TABLE #temp
DROP TABLE #tempQuartiles

This would produce:

这将产生:

Quartile    Val
1           1
1           1
1           2
1           4
2           5
2           6
2           7
3           7
3           8
3           9
4           9
4           9
4           9

From this you can work out what you're after.

从中你可以找出你所追求的。

So modifying the SELECTyou can do this:

所以修改SELECT你可以这样做:

SELECT Quartile, MAX(Val) MaxVal
FROM #tempQuartiles
WHERE Quartile <= 3
GROUP BY Quartile

To produce:

生产:

Quartile    MaxVal
1           4
2           7
3           9

回答by Staeff

We have created a User-Defined-Type to use it as a function parameter and afterwards used it this way.

我们创建了一个 User-Defined-Type 将其用作函数参数,然后以这种方式使用它。

Our implementation uses the same calculations as the Excel Percentile function.

我们的实现使用与 Excel Percentile 函数相同的计算。

CREATE TYPE [dbo].[floatListType] AS TABLE (
    [value] FLOAT NOT NULL
);

GO

CREATE FUNCTION [dbo].[getPercentile]
(
    @data floatListType readonly,
    @percentile float
)
RETURNS float
AS
BEGIN
    declare @values table
    (
        value float,
        idx   int
    );

    insert into @values
    select value, ROW_NUMBER() OVER (order by value) - 1 as idx
    from @data;

    declare @cnt int = (select count(*) from @values)
        , @n float = (@cnt - 1) * @percentile + 1
        , @k int = FLOOR(@n)
        , @d float = @n - @k;

    if (@k = 0)
        return (select value from @values where idx = 0)
    if (@k = @cnt)
        return (select value from @values where idx = @cnt - 1)
    if (@k > 0 AND @k < @cnt)
        return (select value from @values where idx = @k - 1)
            + @d * ((select value from @values where idx = @k)
            - (select value from @values where idx = @k - 1))

    return null;
END

You can use it like this to get the median and the quartiles (as Q1 is simply a 0.25 percentile) for example:

您可以像这样使用它来获得中位数和四分位数(因为 Q1 只是一个 0.25 个百分位数),例如:

declare @values floatListType;

insert into @values
select value from #mytable

select getPercentile(@values, 0.25) as Q1,
    getPercentile(@values, 0.5) as median,
    getPercentile(@values, 0.75) as Q3

回答by SQLServerSteve

If you want a SQL Server solution, a couple of years ago I posted an Interquartile Range procedure on my blog. It's based on dynamic SQL, so you can plug any columns you have access to into it. It's not well-tested, I was still learning the ropes back then and the code is a little old now, but it could meet your needs out-of-the-box, or at least provide a starting point to code your own solution. Here's the gist of the code - follow the link to my blog for an in-depth discussion.

如果您需要 SQL Server 解决方案,几年前我在我的博客上发布了一个四分位距过程。它基于动态 SQL,因此您可以将您有权访问的任何列插入其中。它没有经过很好的测试,当时我还在学习绳索,现在代码有点旧,但它可以满足您开箱即用的需求,或者至少提供一个起点来编写您自己的解决方案。这是代码的要点 - 按照链接到我的博客进行深入讨论。

CREATE PROCEDURE [Calculations].[InterquartileRangeSP]
@DatabaseName as nvarchar(128) = NULL, @SchemaName as nvarchar(128), @TableName as nvarchar(128),@ColumnName AS nvarchar(128), @PrimaryKeyName as nvarchar(400), @OrderByCode as tinyint = 1, @DecimalPrecision AS nvarchar(50)
AS
SET @DatabaseName = @DatabaseName + ‘.'
DECLARE @SchemaAndTableName nvarchar(400)
SET @SchemaAndTableName = ISNull(@DatabaseName, ”) + @SchemaName + ‘.' + @TableName
DECLARE @SQLString nvarchar(max)

SET @SQLString = ‘DECLARE @OrderByCode tinyint,
@Count bigint,
@LowerPoint bigint,
@UpperPoint bigint,
@LowerRemainder decimal(38,37), — use the maximum precision and scale for these two variables to make the
 procedure flexible enough to handle large datasets; I suppose I could use a float
@UpperRemainder decimal(38,37),
@LowerQuartile decimal(‘ + @DecimalPrecision + ‘),
@UpperQuartile decimal(‘ + @DecimalPrecision + ‘),
@InterquartileRange decimal(‘ + @DecimalPrecision + ‘),
@LowerInnerFence decimal(‘ + @DecimalPrecision + ‘),
@UpperInnerFence decimal(‘ + @DecimalPrecision + ‘),
@LowerOuterFence decimal(‘ + @DecimalPrecision + ‘),
@UpperOuterFence decimal(‘ + @DecimalPrecision + ‘) 

SET @OrderByCode = ‘ + CAST(@OrderByCode AS nvarchar(50)) + ‘ SELECT @Count=Count(‘ + @ColumnName + ‘)
FROM ‘ + @SchemaAndTableName +
‘ WHERE ‘ + @ColumnName + ‘ IS NOT NULL

SELECT @LowerPoint = (@Count + 1) / 4, @LowerRemainder =  ((CAST(@Count AS decimal(‘ + @DecimalPrecision + ‘)) + 1) % 4) /4,
@UpperPoint = ((@Count + 1) *3) / 4, @UpperRemainder =  (((CAST(@Count AS decimal(‘ + @DecimalPrecision + ‘)) + 1) *3) % 4) / 4; –multiply by 3 for the left s' + @PrimaryKeyName + ‘e on the upper point to get 75 percent

WITH TempCTE
(‘ + @PrimaryKeyName + ‘, RN, ‘ + @ColumnName + ‘)
AS (SELECT ‘ + @PrimaryKeyName + ‘, ROW_NUMBER() OVER (PARTITION BY 1 ORDER BY ‘ + @ColumnName + ‘ ASC) AS RN, ‘ + @ColumnName + ‘
FROM ‘ + @SchemaAndTableName + ‘
WHERE ‘ + @ColumnName + ‘ IS NOT NULL),
TempCTE2 (QuartileValue)
AS (SELECT TOP 1 ‘ + @ColumnName + ‘ + ((Lead(‘ + @ColumnName + ‘, 1) OVER (ORDER BY ‘ + @ColumnName + ‘) – ‘ + @ColumnName + ‘) * @LowerRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @LowerPoint AND @LowerPoint + 1 

UNION


SELECT TOP 1 ‘ + @ColumnName + ‘ + ((Lead(‘ + @ColumnName + ‘, 1) OVER (ORDER BY ‘ + @ColumnName + ‘) – ‘ + @ColumnName + ‘) * @UpperRemainder) AS QuartileValue
FROM TempCTE
WHERE RN BETWEEN @UpperPoint AND @UpperPoint + 1)

SELECT @LowerQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue ASC), @UpperQuartile = (SELECT TOP 1 QuartileValue
 FROM TempCTE2 ORDER BY QuartileValue DESC)

SELECT @InterquartileRange = @UpperQuartile – @LowerQuartile
SELECT @LowerInnerFence = @LowerQuartile – (1.5 * @InterquartileRange), @UpperInnerFence = @UpperQuartile + (1.5 * @InterquartileRange), @LowerOuterFence = @LowerQuartile – (3 * @InterquartileRange), @UpperOuterFence = @UpperQuartile + (3 * @InterquartileRange)

–SELECT @LowerPoint AS LowerPoint, @LowerRemainder AS LowerRemainder, @UpperPoint AS UpperPoint, @UpperRemainder AS UpperRemainder
— uncomment this line to debug the inner calculations

SELECT @LowerQuartile AS LowerQuartile, @UpperQuartile AS UpperQuartile, @InterquartileRange AS InterQuartileRange,@LowerInnerFence AS LowerInnerFence, @UpperInnerFence AS UpperInnerFence,@LowerOuterFence AS LowerOuterFence, @UpperOuterFence AS UpperOuterFence


SELECT ‘ + @PrimaryKeyName + ‘, ‘ + @ColumnName + ‘, OutlierDegree
FROM  (SELECT ‘ + @PrimaryKeyName + ‘, ‘ + @ColumnName + ‘,
       ”OutlierDegree” =  CASE WHEN (‘ + @ColumnName + ‘ < @LowerInnerFence AND ‘ + @ColumnName + ‘ >= @LowerOuterFence) OR (‘ +
@ColumnName + ‘ > @UpperInnerFence
 AND ‘ + @ColumnName + ‘ <= @UpperOuterFence) THEN 1
       WHEN ‘ + @ColumnName + ‘ < @LowerOuterFence OR ‘ + @ColumnName + ‘ > @UpperOuterFence THEN 2
       ELSE 0 END
       FROM ‘ + @SchemaAndTableName + ‘
       WHERE ‘ + @ColumnName + ‘ IS NOT NULL) AS T1
      ORDER BY CASE WHEN @OrderByCode = 1 THEN ‘ + @PrimaryKeyName + ‘ END ASC,
CASE WHEN @OrderByCode = 2 THEN ‘ + @PrimaryKeyName + ‘ END DESC,
CASE WHEN @OrderByCode = 3 THEN ‘ + @ColumnName + ‘ END ASC,
CASE WHEN @OrderByCode = 4 THEN ‘ + @ColumnName + ‘ END DESC,
CASE WHEN @OrderByCode = 5 THEN OutlierDegree END ASC,
CASE WHEN @OrderByCode = 6 THEN OutlierDegree END DESC‘

–SELECT @SQLString — uncomment this to debug string errors
EXEC (@SQLString)

回答by Evaldas Buinauskas

Apologies if I misunderstood you, but this could be done using NTILE()and later on ROW_NUMBER()

如果我误解了你,抱歉,但这可以使用NTILE()和稍后完成ROW_NUMBER()

SQL Code:

SQL 代码:

;WITH FirstStep (NT, N)
AS (
    SELECT NTILE(3) OVER (ORDER BY T.column1), T.column1
    FROM dbo.GetTableFromList_Int('1,  1,  5,  6,  7,  8,  2,  4,  7,  9,  9,  9,  9', ',') AS T
),
SecondStep (RN, NT, N)
AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY T.NT ORDER BY T.N DESC), NT, T.N
    FROM FirstStep AS T
)
SELECT N
FROM SecondStep
WHERE RN = 1

Explanation:

解释:

  • dbo.GetTableFromList_Int() TVF splits my string into rows (DISTINCT ones)
  • We use NTILE(3)to split this into three categories, ordered by your list (IIRC you need to order your list to get correct values)
  • Then use ROW_NUMBER()to get correct value in each of the group.
  • dbo.GetTableFromList_Int() TVF 将我的字符串拆分为行(DISTINCT 行)
  • 我们NTILE(3)习惯将其分为三类,按您的列表排序(IIRC 您需要对列表进行排序以获得正确的值)
  • 然后用于ROW_NUMBER()在每个组中获得正确的值。

In your scenario it returns expected results.

在您的场景中,它返回预期结果。

If this is not what you needed, then it can be modified to get correct output.

如果这不是您所需要的,则可以对其进行修改以获得正确的输出。