PostgreSQL: How to use a SQL window function to calculate a percentage of an aggregate
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/8515152/
Asked by EvilPuppetMaster
I need to calculate percentages of various dimensions in a table. I'd like to simplify things by using window functions to calculate the denominator, but I am running into an issue because the numerator has to be an aggregate as well.
As a simple example, take the following table:
create temp table test (d1 text, d2 text, v numeric);
insert into test values ('a','x',5), ('a','y',5), ('a','y',10), ('b','x',20);
If I just want to calculate the share of each individual row within its d1 group, then window functions work fine:
select d1, d2, v/sum(v) over (partition by d1)
from test;
"b";"x";1.00
"a";"x";0.25
"a";"y";0.25
"a";"y";0.50
However, what I actually need is the share of each d2 group's sum within its d1 total. The output I am looking for is this:
"b";"x";1.00
"a";"x";0.25
"a";"y";0.75
So I try this:
select d1, d2, sum(v)/sum(v) over (partition by d1)
from test
group by d1, d2;
However, now I get an error:
ERROR: column "test.v" must appear in the GROUP BY clause or be used in an aggregate function
I assume it is complaining that the window function is not accounted for in the GROUP BY clause; however, window functions cannot be put in the GROUP BY clause anyway.
This is using Greenplum 4.1, which is a fork of PostgreSQL 8.4 and shares the same window functions. Note that Greenplum cannot do correlated subqueries.
Answered by Erwin Brandstetter
I think what you are actually looking for is this:
SELECT d1, d2, sum(v)/sum(sum(v)) OVER (PARTITION BY d1) AS share
FROM test
GROUP BY d1, d2;
This produces the requested result.
Window functions are applied after aggregate functions. In this example, the outer sum() in sum(sum(v)) is a window function attached to the OVER ... clause, while the inner sum() is an aggregate.
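To make the evaluation order visible, here is a minimal sketch that exposes the intermediate values next to the final share. It rests on assumptions that are not part of the original post: it runs on SQLite (3.25 or newer) through Python's sqlite3 module as a stand-in for PostgreSQL/Greenplum, it declares v as REAL so that SQLite does not fall back to integer division, and the column aliases group_total and d1_total are made up for illustration.

```python
import sqlite3

# Assumption: SQLite (3.25+, needed for window functions) stands in for
# PostgreSQL/Greenplum; the table mirrors the test data from the question.
conn = sqlite3.connect(":memory:")
# v is REAL here (the question uses numeric) to avoid SQLite integer division.
conn.execute("CREATE TABLE test (d1 TEXT, d2 TEXT, v REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)],
)

# Inner sum(v): aggregate, computed once per (d1, d2) group.
# Outer sum(...) OVER: window function, computed over the grouped rows.
query = """
    SELECT d1, d2,
           sum(v)                                      AS group_total,
           sum(sum(v)) OVER (PARTITION BY d1)          AS d1_total,
           sum(v) / sum(sum(v)) OVER (PARTITION BY d1) AS share
    FROM test
    GROUP BY d1, d2
    ORDER BY d1, d2
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row carries the group's own sum, the total of its d1 partition, and their ratio, which is what the single expression sum(v)/sum(sum(v)) OVER (...) collapses into one column.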
Effectively the same as:
WITH x AS (
SELECT d1, d2, sum(v) AS sv
FROM test
GROUP BY d1, d2
)
SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM x;
Or (without CTE):
SELECT d1, d2, sv/sum(sv) OVER (PARTITION BY d1) AS share
FROM (
SELECT d1, d2, sum(v) AS sv
FROM test
GROUP BY d1, d2
) x;
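As a quick sanity check of the claim that these forms are equivalent, the sketch below runs the nested form, the CTE form, and the derived-table form against the question's data and compares the results. The same assumptions apply as for any non-PostgreSQL run: SQLite (3.25 or newer) via Python's sqlite3 is only a stand-in, v is declared REAL to avoid SQLite integer division, and the dictionary keys are hypothetical labels.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL/Greenplum; v is REAL, not numeric.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (d1 TEXT, d2 TEXT, v REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)],
)

queries = {
    # Aggregate nested directly inside the window function.
    "nested": """
        SELECT d1, d2, sum(v) / sum(sum(v)) OVER (PARTITION BY d1) AS share
        FROM test
        GROUP BY d1, d2
        ORDER BY d1, d2
    """,
    # Aggregate first in a CTE, then apply the window function.
    "cte": """
        WITH x AS (
            SELECT d1, d2, sum(v) AS sv
            FROM test
            GROUP BY d1, d2
        )
        SELECT d1, d2, sv / sum(sv) OVER (PARTITION BY d1) AS share
        FROM x
        ORDER BY d1, d2
    """,
    # Same idea with a derived table instead of a CTE.
    "derived": """
        SELECT d1, d2, sv / sum(sv) OVER (PARTITION BY d1) AS share
        FROM (
            SELECT d1, d2, sum(v) AS sv
            FROM test
            GROUP BY d1, d2
        ) x
        ORDER BY d1, d2
    """,
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, result in results.items():
    print(name, result)
```

The ORDER BY clauses are added only so that the three result lists can be compared directly; they are not needed for the calculation itself.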
Or @Mu's variant.
Aside: Greenplum introduced correlated subqueries with version 4.2. See release notes.
Answered by mu is too short
Do you need to do it all with window functions? It sounds like you just need to group the result you have by d1 and d2 and then sum the sums:
select d1, d2, sum(p)
from (
select d1, d2, v/sum(v) over (partition by d1) as p
from test
) as dt
group by d1, d2
That gives me this:
d1 | d2 | sum
----+----+------------------------
a | x | 0.25000000000000000000
a | y | 0.75000000000000000000
b | x | 1.00000000000000000000
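For anyone who wants to see the two steps of this approach separately, here is a small sketch that first lists the per-row shares and then sums them per (d1, d2). It is an illustration under assumptions, not a Greenplum run: SQLite (3.25 or newer) via Python's sqlite3 stands in for the real database, v is declared REAL to avoid SQLite integer division, and the variable names per_row and grouped are made up.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL/Greenplum; v is REAL, not numeric.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (d1 TEXT, d2 TEXT, v REAL)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [("a", "x", 5), ("a", "y", 5), ("a", "y", 10), ("b", "x", 20)],
)

# Step 1: window function over the raw rows gives each row's share of d1.
per_row = conn.execute("""
    SELECT d1, d2, v, v / sum(v) OVER (PARTITION BY d1) AS p
    FROM test
    ORDER BY d1, d2, v
""").fetchall()

# Step 2: group the per-row shares by (d1, d2) and add them up.
grouped = conn.execute("""
    SELECT d1, d2, sum(p) AS share
    FROM (
        SELECT d1, d2, v / sum(v) OVER (PARTITION BY d1) AS p
        FROM test
    ) AS dt
    GROUP BY d1, d2
    ORDER BY d1, d2
""").fetchall()

print(per_row)
print(grouped)
```

One design difference worth keeping in mind: this variant divides every row before summing, whereas dividing the group sums divides once per group. With the values here both give identical results, but with inexact types the summed shares could differ from the single division by a small rounding error.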