Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/13168066/

Date: 2020-10-21 00:30:43  Source: igfitidea

Transpose rows and columns (a.k.a. pivot) only with a minimum COUNT()?

Tags: sql, postgresql, pivot, crosstab

Asked by user1626730

Here's my table 'tab_test':

year    animal  price
2000    kittens 79
2000    kittens 93
2000    kittens 100
2000    puppies 15
2000    puppies 32
2001    kittens 31
2001    kittens 17
2001    puppies 65
2001    puppies 48
2002    kittens 84
2002    kittens 86
2002    puppies 15
2002    puppies 95
2003    kittens 62
2003    kittens 24
2003    puppies 36
2003    puppies 41
2004    kittens 65
2004    kittens 85
2004    puppies 58
2004    puppies 95
2005    kittens 45
2005    kittens 25
2005    puppies 15
2005    puppies 35
2006    kittens 50
2006    kittens 80
2006    puppies 95
2006    puppies 49
2007    kittens 40
2007    kittens 19
2007    puppies 81
2007    puppies 38
2008    kittens 37
2008    kittens 51
2008    puppies 29
2008    puppies 72
2009    kittens 84
2009    kittens 26
2009    puppies 49
2009    puppies 34
2010    kittens 75
2010    kittens 96
2010    puppies 18
2010    puppies 26
2011    kittens 35
2011    kittens 21
2011    puppies 90
2011    puppies 18
2012    kittens 12
2012    kittens 23
2012    puppies 74
2012    puppies 79

Here's some code that transposes the rows and columns so I get an average for 'kittens' and 'puppies':

SELECT
    year,
    AVG(CASE WHEN animal = 'kittens' THEN price END) AS "kittens",
    AVG(CASE WHEN animal = 'puppies' THEN price END) AS "puppies"
FROM tab_test
GROUP BY year
ORDER BY year;

The output for the code above is:

    year    kittens puppies
    2000    90.6666666666667    23.5
    2001    24.0    56.5
    2002    85.0    55.0
    2003    43.0    38.5
    2004    75.0    76.5
    2005    35.0    25.0
    2006    65.0    72.0
    2007    29.5    59.5
    2008    44.0    50.5
    2009    55.0    41.5
    2010    85.5    22.0
    2011    28.0    54.0
    2012    17.5    76.5

What I'd like is a table like the second one, but containing only the items that had a COUNT() of at least 3 in the first table. In other words, the goal is to have this as output:

year    kittens
2000    90.6666666666667

Only 'kittens' in 2000 has at least 3 rows in the first table.
Is this possible in PostgreSQL?

Accepted answer by Andriy M

Here's an alternative to @bluefeet's suggestion, which is somewhat similar but avoids the join (instead, the upper level grouping is applied to the already grouped result set):

SELECT
  year,
  MAX(CASE animal WHEN 'kittens' THEN avg_price END) AS "kittens",
  MAX(CASE animal WHEN 'puppies' THEN avg_price END) AS "puppies"
FROM (
  SELECT
    animal,
    year,
    COUNT(*) AS cnt,          -- rows per (animal, year)
    AVG(price) AS avg_price
  FROM tab_test
  GROUP BY
    animal,
    year
) s
WHERE cnt >= 3                -- keep only groups with at least 3 rows
GROUP BY
  year
ORDER BY
  year
;

Answer by Erwin Brandstetter

CASE

If your case is as simple as demonstrated, a CASE expression will do:

SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
     , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) > 2
   ) t
GROUP  BY year
ORDER  BY year;

It doesn't matter whether you use sum(), max() or min() as the aggregate function in the outer query. They all return the same value in this case, because the subquery leaves at most one row per (year, animal).
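
As a side note, the same conditional aggregation can be sketched with the aggregate FILTER clause. This is not part of the original answer and assumes PostgreSQL 9.4 or later, where FILTER is available. The inline tab_test CTE is only a small subset of the question's data, added so the statement runs on its own:

```sql
-- Sketch only: FILTER requires PostgreSQL 9.4+.
-- The CTE shadows tab_test with a few rows from the question so the query is self-contained.
WITH tab_test(year, animal, price) AS (
   VALUES (2000, 'kittens', 79), (2000, 'kittens', 93), (2000, 'kittens', 100)
        , (2000, 'puppies', 15), (2000, 'puppies', 32)
        , (2001, 'kittens', 31), (2001, 'kittens', 17)
)
SELECT year
     , max(price) FILTER (WHERE animal = 'kittens') AS kittens
     , max(price) FILTER (WHERE animal = 'puppies') AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) >= 3          -- drop (year, animal) groups with fewer than 3 rows
   ) t
GROUP  BY year
ORDER  BY year;
-- Returns a single row for 2000: kittens is about 90.67, puppies is NULL.
```

Only (2000, kittens) survives the HAVING filter, so the outer aggregate sees one value per column at most. That is why max(), min() and sum() are interchangeable here.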

SQL Fiddle

crosstab()

With more categories, a crosstab() query is simpler. It should also be faster for bigger tables.

You need to install the additional module tablefunc (once per database). Since Postgres 9.1 that's as simple as:

CREATE EXTENSION tablefunc;
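
If the statement may be run more than once, a slightly more defensive variant could look like the sketch below. The IF NOT EXISTS form and the pg_extension lookup are additions for illustration, not part of the original answer:

```sql
-- IF NOT EXISTS makes the statement safe to re-run in the same database.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Check that the module is registered in the current database.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```

The second query should return one row once the extension is installed. The version shown depends on the server.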

Details in this related answer:

SELECT * FROM crosstab(
      'SELECT year, animal, avg(price) AS price
       FROM   tab_test
       GROUP  BY animal, year
       HAVING count(*) > 2
       ORDER  BY 1,2'

      ,$$VALUES ('kittens'::text), ('puppies')$$)
AS ct ("year" text, "kittens" numeric, "puppies" numeric);

No sqlfiddle for this one because the site doesn't allow additional modules.

Benchmark

To verify my claims I ran a quick benchmark with close-to-real data in my small test database, on PostgreSQL 9.1.6. Tested with EXPLAIN ANALYZE, best of 10:

Test setup with 10020 rows:

CREATE TABLE tab_test (year int, animal text, price numeric);

-- years with lots of rows
INSERT INTO tab_test
SELECT 2000 + ((g + random() * 300))::int/1000 
     , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
     , (random() * 200)::numeric
FROM   generate_series(1,10000) g;

-- .. and some years with only few rows to include cases with count < 3
INSERT INTO tab_test
SELECT 2010 + ((g + random() * 10))::int/2
     , CASE WHEN (g + (random() * 1.5)::int) %2 = 0 THEN 'kittens' ELSE 'puppies' END
     , (random() * 200)::numeric
FROM   generate_series(1,20) g;
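
To reproduce a timing, each candidate query can be prefixed with EXPLAIN ANALYZE. The sketch below uses the CASE variant from this answer and assumes tab_test was created and filled by the setup statements above. The row-count check is an addition for illustration:

```sql
-- Sanity check: the setup inserts 10000 + 20 rows.
SELECT count(*) FROM tab_test;

-- EXPLAIN ANALYZE executes the query and reports the actual run time.
EXPLAIN ANALYZE
SELECT year
     , sum(CASE WHEN animal = 'kittens' THEN price END) AS kittens
     , sum(CASE WHEN animal = 'puppies' THEN price END) AS puppies
FROM  (
   SELECT year, animal, avg(price) AS price
   FROM   tab_test
   GROUP  BY year, animal
   HAVING count(*) > 2
   ) t
GROUP  BY year
ORDER  BY year;
```

PostgreSQL 9.1 reports the figure as "Total runtime"; later releases label it as execution time instead. Because the setup uses random(), absolute numbers will differ from run to run and from machine to machine.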

Results:

@bluefeet
Total runtime: 95.401 ms

@wildplasser (different results, includes rows with count < 3)
Total runtime: 64.497 ms

@Andriy (+ ORDER BY)
& @Erwin1 - CASE (both perform about the same)
Total runtime: 39.105 ms

@Erwin2 - crosstab()
Total runtime: 17.644 ms

Results are largely proportional (but irrelevant) with only 20 rows. Only @wildplasser's CTE has more overhead and spikes a little.

With more than a handful of rows, crosstab() quickly takes the lead. @Andriy's query performs about the same as my simplified version; the aggregate function in the outer SELECT (min(), max(), sum()) makes no measurable difference (just two rows per group).

Everything as expected, no surprises, take my setup and try it @home.

Answer by Taryn

Is this what you are looking for?

SELECT t1.year,
    AVG(CASE WHEN t1.animal = 'kittens' THEN t1.price END) AS "kittens",
    AVG(CASE WHEN t1.animal = 'puppies' THEN t1.price END) AS "puppies"
FROM tab_test t1
inner join 
(
  select animal, count(*) YearCount, year
  from tab_test
  group by animal, year
) t2
  on t1.animal = t2.animal 
  and t1.year = t2.year
where t2.YearCount >= 3
group by t1.year

See SQL Fiddle with Demo

Answer by wildplasser

CREATE TABLE pussyriot(year INTEGER NOT NULL
        , animal varchar
        , price integer
        );

INSERT INTO pussyriot(year , animal , price ) VALUES
 (2000, 'kittens', 79)
, (2000, 'kittens', 93)
...
, (2007, 'puppies', 81)
, (2007, 'puppies', 38)
        ;

-- a self join is a poor man's pivot:
WITH cal AS ( -- generate calendar file
        SELECT generate_series(MIN(pr.year) , MAX(pr.year)) AS year
        FROM pussyriot pr
        )
, fur AS (
        SELECT distinct year, animal, AVG(price) AS price
        FROM pussyriot
        GROUP BY year, animal
        -- UPDATE: added next line
        HAVING COUNT(*) >= 3
        )
SELECT cal.year
        , pussy.price AS price_of_the_pussy
        , puppy.price AS price_of_the_puppy
FROM cal
LEFT JOIN fur pussy ON pussy.year=cal.year AND pussy.animal='kittens'
LEFT JOIN fur puppy ON puppy.year=cal.year AND puppy.animal='puppies'
        ;