Is there a better way to calculate the median (not average) in PostgreSQL?

Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow. Original: http://stackoverflow.com/questions/3735252/

Date: 2020-09-20 00:17:39  Source: igfitidea

Tags: sql, postgresql, aggregate-functions

Asked by Ghislain Leveque

Suppose I have the following table definition:

CREATE TABLE x (i serial primary key, value integer not null);

I want to calculate the MEDIAN of value (not the AVG). The median is a value that divides the set into two subsets containing the same number of elements. If the number of elements is even, the median is the average of the largest value in the lower half and the smallest value in the upper half. (See Wikipedia for more details.)
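
To make that definition concrete, here is a small Python sketch (Python rather than SQL so it runs without a database; the function name median_by_definition is illustrative only). It sorts the values, then takes either the middle element or the average of the two middle elements:

```python
def median_by_definition(values):
    """Median per the definition above: the middle element of the sorted values,
    or the mean of the two middle elements when the count is even."""
    s = sorted(values)
    n = len(s)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    # even count: largest value of the lower half, smallest value of the upper half
    return (s[mid - 1] + s[mid]) / 2


print(median_by_definition([4, 1, 3, 2]))  # 2.5
```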

Here is how I currently calculate the MEDIAN, but I guess there must be a better way:

SELECT AVG(values_around_median) AS median
  FROM (
    SELECT
       DISTINCT(CASE WHEN FIRST_VALUE(above) OVER w2 THEN MIN(value) OVER w3 ELSE MAX(value) OVER w2 END)
        AS values_around_median
      FROM (
        SELECT LAST_VALUE(value) OVER w AS value,
               SUM(COUNT(*)) OVER w > (SELECT count(*)/2 FROM x) AS above
          FROM x
          GROUP BY value
          WINDOW w AS (ORDER BY value)
          ORDER BY value
        ) AS find_if_values_are_above_or_below_median
      WINDOW w2 AS (PARTITION BY above ORDER BY value DESC),
             w3 AS (PARTITION BY above ORDER BY value ASC)
    ) AS find_values_around_median

Any ideas?

Answer by Lukas Eder

Yes, with PostgreSQL 9.4, you can use the newly introduced inverse distribution function PERCENTILE_CONT(), an ordered-set aggregate function that is specified in the SQL standard as well.

WITH t(value) AS (
  SELECT 1   UNION ALL
  SELECT 2   UNION ALL
  SELECT 100 
)
SELECT
  percentile_cont(0.5) WITHIN GROUP (ORDER BY value)
FROM
  t;

This emulation of MEDIAN() via PERCENTILE_CONT() is also documented here.
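
PERCENTILE_CONT() is a continuous percentile: it interpolates between adjacent input items when the requested fraction falls between two rows. The Python sketch below models that behaviour under the assumption that the position is computed as fraction * (n - 1) over the sorted, non-NULL inputs; the function name is hypothetical and this is not PostgreSQL's own code:

```python
import math


def percentile_cont(values, fraction):
    """Continuous-percentile sketch: drop NULLs (None), sort, then interpolate
    linearly at the 0-based position fraction * (n - 1)."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    s = sorted(v for v in values if v is not None)
    if not s:
        return None  # no input rows: the SQL aggregate yields NULL
    pos = fraction * (len(s) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


print(percentile_cont([1, 2, 100], 0.5))   # 2.0
print(percentile_cont([1, 2, 3, 4], 0.5))  # 2.5
```

With fraction 0.5 this reproduces the median definition from the question: an odd count lands exactly on the middle row, an even count averages the two middle rows.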

Answer by Scott Bailey

Indeed there IS an easier way. In Postgres you can define your own aggregate functions. I posted functions to do median as well as mode and range to the PostgreSQL snippets library a while back.

http://wiki.postgresql.org/wiki/Aggregate_Median
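
A PostgreSQL user-defined aggregate is assembled from a state-transition function called once per input row and an optional final function that turns the accumulated state into the result (the SFUNC, STYPE and FINALFUNC options of CREATE AGGREGATE). The Python sketch below only models that shape for a median; the function names are hypothetical and it is not the code from the wiki page:

```python
from functools import reduce


def median_sfunc(state, value):
    """Transition step, called once per row: append non-NULL values to the state."""
    return state if value is None else state + [value]


def median_finalfunc(state):
    """Final step: sort the collected values and return the median (None if empty)."""
    if not state:
        return None
    s = sorted(state)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def run_aggregate(sfunc, finalfunc, initcond, rows):
    """Fold every row through sfunc starting from initcond, then apply finalfunc."""
    return finalfunc(reduce(sfunc, rows, initcond))


print(run_aggregate(median_sfunc, median_finalfunc, [], [3, None, 1, 2]))  # 2
```

Collecting every value into the state is simple but keeps the whole group in memory, which is the usual trade-off for an exact median aggregate.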

Answer by Erwin Brandstetter

A simpler query for that:

WITH y AS (
   SELECT value, row_number() OVER (ORDER BY value) AS rn
   FROM   x
   WHERE  value IS NOT NULL
   )
, c AS (SELECT count(*) AS ct FROM y) 
SELECT CASE WHEN c.ct%2 = 0 THEN
          round((SELECT avg(value) FROM y WHERE y.rn IN (c.ct/2, c.ct/2+1)), 3)
       ELSE
                (SELECT     value  FROM y WHERE y.rn = (c.ct+1)/2)
       END AS median
FROM   c;

Major points

  • Ignores NULL values.
  • The core feature is the row_number() window function, which has been available since version 8.4.
  • The final SELECT returns one row for an odd count and the avg() of two rows for an even count. The result is numeric; the even case is rounded to 3 decimal places.
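
The same logic, modelled in Python so it can be checked without a database (the function name median_row_number is hypothetical): drop NULLs, number the sorted rows from 1 like row_number(), then pick row (ct+1)/2 for an odd count or average rows ct/2 and ct/2+1 for an even count:

```python
def median_row_number(values, places=3):
    """Python mirror of the row_number() query: skip NULLs, number the sorted
    rows from 1, then take row (ct+1)/2 for an odd count or round the average
    of rows ct/2 and ct/2+1 for an even count."""
    rn = {i: v for i, v in enumerate(sorted(x for x in values if x is not None), start=1)}
    ct = len(rn)
    if ct % 2 == 0:
        if ct == 0:
            return None  # avg() over no rows is NULL in SQL
        return round((rn[ct // 2] + rn[ct // 2 + 1]) / 2, places)
    return rn[(ct + 1) // 2]


# same data as the test in this answer: 1..10000, three NULLs and an extra 3
data = list(range(1, 10001)) + [None, None, None, 3]
print(median_row_number(data))  # 5000
```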

Testing shows that the new version is 4x faster than the query in the question and, unlike it, yields correct results:

CREATE TEMP TABLE x (value int);
INSERT INTO x SELECT generate_series(1,10000);
INSERT INTO x VALUES (NULL),(NULL),(NULL),(3);

Answer by Chris B

For googlers: there is also the quantile extension (http://pgxn.org/dist/quantile). Once it is installed, the median can be calculated in one line.

Answer by Ghost

Simple SQL using only native Postgres functions:

select 
    case count(*)%2
        when 1 then (array_agg(num order by num))[count(*)/2+1]
        else ((array_agg(num order by num))[count(*)/2]::double precision + (array_agg(num order by num))[count(*)/2+1])/2
    end as median
from unnest(array[5,17,83,27,28]) num;

Sure, you can add coalesce() or something similar if you want to handle NULLs.
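
The only subtle part of that query is the indexing: PostgreSQL arrays are 1-based and count(*)/2 is integer division. A Python model of the same arithmetic (the function and helper names are hypothetical; like the query, it does not handle NULLs):

```python
def median_array_agg(nums):
    """Mirror of the array_agg() query: sort the values into an array, then index
    it with SQL's 1-based subscripts and integer division of count(*) / 2."""
    arr = sorted(nums)  # array_agg(num ORDER BY num)
    n = len(arr)        # count(*)
    if n == 0:
        return None     # array_agg over no rows is NULL, so the subscript is NULL too

    def at(i):
        """1-based subscript, like arr[i] in PostgreSQL."""
        return arr[i - 1]

    if n % 2 == 1:
        return at(n // 2 + 1)
    return (float(at(n // 2)) + at(n // 2 + 1)) / 2


print(median_array_agg([5, 17, 83, 27, 28]))  # 27
```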

Answer by Siddharth Tayade

CREATE TABLE array_table (id integer, values integer[]) ;

INSERT INTO array_table VALUES ( 1,'{1,2,3}');
INSERT INTO array_table VALUES ( 2,'{4,5,6,7}');

-- assumes each array is already sorted in ascending order
select id, values, cardinality(values) as array_length,
(case when cardinality(values)%2=0 and cardinality(values)>1 then (values[(cardinality(values)/2)]+ values[((cardinality(values)/2)+1)])/2::float 
 else values[(cardinality(values)+1)/2]::float end) as median  
 from array_table;

Or you can wrap it in a function and use it anywhere in later queries.

CREATE OR REPLACE FUNCTION median (a integer[]) 
RETURNS float AS    $median$ 
DECLARE     
    abc float; 
BEGIN    
    -- sort first: the index arithmetic below only works on ascending input
    a := ARRAY(SELECT e FROM unnest(a) AS e ORDER BY e);
    SELECT (case when cardinality(a)%2=0 and cardinality(a)>1 then 
           (a[(cardinality(a)/2)] + a[((cardinality(a)/2)+1)])/2::float   
           else a[(cardinality(a)+1)/2]::float end) into abc;    
    RETURN abc; 
END;    
$median$ 
LANGUAGE plpgsql;

select id, values, median(values) from array_table;
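
The index arithmetic in this answer only yields the median when the array is in ascending order. A Python translation of the same 1-based CASE expression (function name hypothetical, deliberately without a sort) makes that easy to see:

```python
def median_by_index(arr):
    """Same 1-based index arithmetic as the CASE expression above, translated to
    Python's 0-based lists. No sorting is done here on purpose."""
    n = len(arr)  # cardinality(a)
    if n % 2 == 0 and n > 1:
        # (a[n/2] + a[n/2 + 1]) / 2
        return (arr[n // 2 - 1] + arr[n // 2]) / 2
    if n == 0:
        return None  # a[0] is out of bounds, which PostgreSQL returns as NULL
    # a[(n + 1) / 2]
    return float(arr[(n + 1) // 2 - 1])


print(median_by_index([1, 2, 3]), median_by_index([4, 5, 6, 7]))        # 2.0 5.5
print(median_by_index([3, 1, 2]), median_by_index(sorted([3, 1, 2])))  # 1.0 2.0
```

The sample rows happen to be sorted, so they give correct medians; an unsorted array such as [3, 1, 2] returns its middle position, not its median.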

Answer by Sowmiya Raja Radhakrishnan

Use the function below to find the nth percentile:

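CREATE or REPLACE FUNCTION nth_percentil(anyarray, int)
    RETURNS 
        anyelement as 
    $$
        -- $1 = sorted array, $2 = percentile (0-100); nearest-rank index ceil($2/100 * n), at least 1
        -- (for an even number of elements the 50th percentile is the lower middle value)
        SELECT $1[greatest(1, ceil($2 / 100.0 * array_upper($1, 1)))::int];
    $$ 
LANGUAGE SQL IMMUTABLE STRICT;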

In your case it's the 50th percentile.

Use the query below to get the median:

SELECT nth_percentil(ARRAY (SELECT Field_name FROM table_name ORDER BY 1),50)

This gives you the 50th percentile, which is basically the median.
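
One standard way to turn a percentile into an array position is the nearest-rank rule, rank = ceil(p/100 * n) with 1-based ranks. A Python sketch of that rule (function name hypothetical; it assumes an ascending input and an integer percentage) shows why it only approximates the median as defined in the question: an odd-length array gives exactly the middle element, but an even-length array gives the lower of the two middle values rather than their average:

```python
def nth_percentile_nearest_rank(sorted_values, p):
    """Nearest-rank percentile: the element at 1-based rank ceil(p/100 * n),
    with rank 0 clamped to 1. sorted_values must already be ascending and
    p is an integer percentage from 0 to 100."""
    if not 0 <= p <= 100:
        raise ValueError("p must be between 0 and 100")
    n = len(sorted_values)
    if n == 0:
        return None
    rank = max(1, (p * n + 99) // 100)  # integer ceil(p * n / 100), no float rounding
    return sorted_values[rank - 1]


values = sorted([5, 17, 83, 27, 28])
print(nth_percentile_nearest_rank(values, 50))  # 27
```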

Hope this is helpful.
