Original question: http://stackoverflow.com/questions/9677321/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
SQL frequency distribution query to count ranges with group-by and include 0 counts
Asked by Rob Bednark
Given:
table 'thing':
age
---
3.4
3.4
10.1
40
45
49
I want to count the number of things for each 10-year range, e.g.,
 age_range | count
-----------+-------
         0 |     2
        10 |     1
        20 |     0
        30 |     0
        40 |     3
This query comes close:
SELECT FLOOR(age / 10) as age_range, COUNT(*)
FROM thing
GROUP BY FLOOR(age / 10) ORDER BY FLOOR(age / 10);
Output:
 age_range | count
-----------+-------
         0 |     2
         1 |     1
         4 |     3
However, it doesn't show me the ranges which have 0 counts. How can I modify the query so that it also shows the ranges in between with 0 counts?
I found similar Stack Overflow questions about counting ranges, some of which include 0 counts, but they all require specifying each range (either hard-coding the ranges into the query or putting the ranges in a table). I would prefer a generic query like the one above, where I do not have to explicitly specify each range (e.g., 0-10, 10-20, 20-30, ...). I'm using PostgreSQL 9.1.3.
Is there a way to modify the simple query above to include 0 counts?
Similar:
Oracle: how to "group by" over a range?
Get frequency distribution of a decimal range in MySQL
Answered by mu is too short
generate_series to the rescue:
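-- generate_series supplies every bucket index; the left join keeps the empty buckets
select 10 * s.d as age_range, count(t.age) as count
from generate_series(0, 10) s(d)
left outer join thing t on s.d = floor(t.age / 10)
group by s.d
order by s.d;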
Figuring out the upper bound for generate_series should be trivial with a separate query; I just used 10 as a placeholder.
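As a minimal sketch of that idea (not part of the original answer), the upper bound can be computed inline with a scalar subquery rather than a separate query. This assumes the table and column names from the question (thing, age) and a bucket width of 10:

```sql
-- Sketch: derive the last bucket index from the data instead of hard-coding 10.
-- floor(max(age) / 10) is the index of the highest non-empty bucket.
select 10 * s.d as age_range, count(t.age) as count
from generate_series(0, (select floor(max(age) / 10)::int from thing)) as s(d)
left outer join thing t on s.d = floor(t.age / 10)
group by s.d
order by s.d;
```

With the sample data from the question, this should return the buckets 0, 10, 20, 30 and 40 with counts 2, 1, 0, 0 and 3. If thing is empty, max(age) is NULL and generate_series produces no rows at all.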
This:
generate_series(0, 10) s(d)
essentially generates an inline table called s with a single column d, which contains the values from 0 to 10 (inclusive).
You could wrap the two queries (one to figure out the range, one to compute the counts) into a function if necessary.
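One way such a wrapper might look is sketched below. The function name age_histogram, the output column name thing_count, and the bucket-width parameter are all hypothetical and not from the original answer. The parameter is referenced positionally as $1 because, as far as I know, SQL-language functions on PostgreSQL 9.1 cannot refer to parameters by name.

```sql
-- Hypothetical wrapper; the name and the parameter are assumptions for illustration.
create or replace function age_histogram(int)   -- $1 = bucket width, e.g. 10
returns table (age_range int, thing_count bigint)
language sql stable
as $$
    -- The scalar subquery finds the last bucket; the outer query counts per bucket.
    select $1 * s.d, count(t.age)
    from generate_series(0, (select floor(max(age) / $1)::int from thing)) as s(d)
    left outer join thing t on s.d = floor(t.age / $1)
    group by s.d
    order by s.d;
$$;

-- Usage: 10-year buckets, including the empty ones.
select * from age_histogram(10);
```

Folding the range lookup into a subquery keeps the function to a single statement, so it can stay a plain SQL function rather than PL/pgSQL.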
Answered by Glenn
You need some way to invent the table of age ranges. Row number usually works nicely. Do a cartesian product against a big table to get lots of numbers.
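WITH ranges AS (
    -- Number the rows of any sufficiently large table to get 1, 2, 3, ...
    SELECT (n.rownum - 1) * 10 AS age_range
    FROM ( SELECT row_number() OVER () AS rownum
           FROM pg_tables
         ) n
    CROSS JOIN
         ( -- floor(...) + 1 buckets, so a max age that is a multiple of 10 still gets its own bucket
           SELECT floor(max(age) / 10) + 1 AS range_end
           FROM thing
         ) m
    WHERE n.rownum <= m.range_end
)
SELECT r.age_range, COUNT(t.age) AS count
FROM ranges r
LEFT JOIN thing t ON r.age_range = FLOOR(t.age / 10) * 10
GROUP BY r.age_range
ORDER BY r.age_range;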
EDIT: mu is too short has a much more elegant answer, but if you didn't have a generate_series function on the db, ... :)
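If generate_series is unavailable but recursive common table expressions are supported, the bucket list could also be built with WITH RECURSIVE. This is a different technique from the row_number() approach above, sketched here only as an alternative. It assumes the same thing table and a bucket width of 10:

```sql
-- Sketch: generate 0, 10, 20, ... up to the bucket containing max(age).
WITH RECURSIVE ranges(age_range) AS (
    SELECT 0
    UNION ALL
    SELECT r.age_range + 10
    FROM ranges r
    WHERE r.age_range + 10 <= (SELECT max(age) FROM thing)
)
SELECT r.age_range, COUNT(t.age) AS count
FROM ranges r
LEFT JOIN thing t ON r.age_range = FLOOR(t.age / 10) * 10
GROUP BY r.age_range
ORDER BY r.age_range;
```

Unlike numbering the rows of pg_tables, this does not depend on a helper table having enough rows to cover every bucket.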