SQL Oracle:如何在一个范围内“分组”?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/2483140/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Oracle: how to "group by" over a range?
提问by Mark Harrison
If I have a table like this:
如果我有一张这样的表:
pkey age
---- ---
1 8
2 5
3 12
4 12
5 22
I can "group by" to get a count of each age.
我可以“分组”来计算每个年龄。
select age,count(*) n from tbl group by age;
age n
--- -
5 1
8 1
12 2
22 1
What query can I use to group by age ranges?
我可以使用什么查询来按年龄范围分组?
age n
----- -
1-10 2
11-20 2
20+ 1
I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.
我在使用 10gR2,但我也对任何特定于 11g 的方法感兴趣。
回答by Einstein
SELECT CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END AS age,
COUNT(*) AS n
FROM age
GROUP BY CASE
WHEN age <= 10 THEN '1-10'
WHEN age <= 20 THEN '11-20'
ELSE '21+'
END
回答by Matthew Flaschen
Try:
尝试:
select to_char(floor(age/10) * 10) || '-'
|| to_char(ceil(age/10) * 10 - 1)) as age,
count(*) as n from tbl group by floor(age/10);
回答by Wintermute
What you are looking for, is basically the data for a histogram.
您正在寻找的基本上是直方图的数据。
You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.
您将在 x 轴上有年龄(或年龄范围),在 y 轴上有计数 n(或频率)。
In the simplest form, one could simply count the number of each distinct age value like you already described:
以最简单的形式,您可以像您已经描述的那样简单地计算每个不同年龄值的数量:
SELECT age, count(*)
FROM tbl
GROUP BY age
When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.
然而,当 x 轴有太多不同的值时,人们可能想要创建组(或集群或桶)。在您的情况下,您按 10 的恒定范围进行分组。
We can avoid writing a WHEN ... THEN
line for each range - there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.
我们可以避免WHEN ... THEN
为每个范围写一行——如果不是关于年龄,可能会有数百条。相反,由于@NitinMidha 提到的原因,@MatthewFlaschen 的方法更可取。
Now let's build the SQL...
现在让我们构建 SQL...
First, we need to split the ages into range-groups of 10 like so:
首先,我们需要将年龄分成 10 个范围组,如下所示:
- 0-9
- 10-19
- 20 - 29
- etc.
- 0-9
- 10-19
- 20 - 29
- 等等。
This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:
这可以通过将年龄列除以 10 然后计算结果的 FLOOR 来实现:
FLOOR(age/10)
"FLOOR returns the largest integer equal to or less than n" http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
“FLOOR 返回等于或小于 n 的最大整数” http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643
Then we take the original SQL and replace agewith that expression:
然后我们采用原始 SQL 并用该表达式替换age:
SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)
This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are 0, 1, 2 ... n
.
这没问题,但我们还看不到范围。相反,我们只看到计算出的楼层值为0, 1, 2 ... n
。
To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n
:
为了得到实际的下限,我们需要再次乘以 10,所以我们得到0, 10, 20 ... n
:
FLOOR(age/10) * 10
We also need the upper bound of each range which is lower bound + 10 - 1or
我们还需要每个范围的上限,即下限+ 10 - 1或
FLOOR(age/10) * 10 + 10 - 1
Finally, we concatenate both into a string like this:
最后,我们将两者连接成这样的字符串:
TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)
This creates '0-9', '10-19', '20-29'
etc.
这会创建'0-9', '10-19', '20-29'
等。
Now our SQL looks like this:
现在我们的 SQL 看起来像这样:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)
Finally, apply an order and nice column aliases:
最后,应用订单和漂亮的列别名:
SELECT
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)
However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering. Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
然而,在更复杂的场景中,这些范围可能不会被分组到大小为 10 的恒定块中,而是需要动态聚类。Oracle 包含更高级的直方图函数,参见http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366
Credits to @MatthewFlaschen for his approach; I only explained the details.
感谢@MatthewFlaschen 的方法;我只解释了细节。
回答by Dan
Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:
这是一个解决方案,它在子查询中创建一个“范围”表,然后使用它从主表中分区数据:
SELECT DISTINCT descr
, COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
select '1-10' descr, 1 rng_start, 10 rng_stop from dual
union (
select '11-20', 11, 20 from dual
) union (
select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;
回答by Clarkey
I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:
我必须按照一小时内出现的交易数量对数据进行分组。我通过从时间戳中提取小时来做到这一点:
select extract(hour from transaction_time) as hour
,count(*)
from table
where transaction_date='01-jan-2000'
group by
extract(hour from transaction_time)
order by
extract(hour from transaction_time) asc
;
Giving output:
给出输出:
HOUR COUNT(*)
---- --------
1 9199
2 9167
3 9997
4 7218
As you can see this gives a nice easy way of grouping the number of records per hour.
正如您所看到的,这提供了一种对每小时记录数进行分组的简单方法。
回答by Kieron Hardy
I had to get a count of samples by day. Inspired by @Clarkey I used TO_CHAR to extract the date of sample from the timestamp to an ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)
我必须每天计算样本数量。受@Clarkey 的启发,我使用 TO_CHAR 从时间戳中提取样本日期为 ISO-8601 日期格式,并在 GROUP BY 和 ORDER BY 子句中使用它。(进一步启发,我也把它贴在这里,以防对其他人有用。)
SELECT
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY,
COUNT(*)
FROM
TABLE X
GROUP BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/
回答by Jon Black
add an age_range table and an age_range_id field to your table and group by that instead.
将 age_range 表和 age_range_id 字段添加到您的表中,并按其分组。
// excuse the DDL but you should get the idea
// 原谅 DDL 但你应该明白
create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);
insert into age_range values
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');
// again excuse the DML but you should get the idea
// 再次原谅 DML 但你应该明白
select
count(*) as counter, p.age_range_id, ar.name
from
person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
p.age_range_id, ar.name order by counter desc;
You can refine this idea if you like - add from_age to_age columns in the age_range table etc - but i'll leave that to you.
如果你愿意,你可以改进这个想法 - 在 age_range 表中添加 from_age to_age 列等 - 但我会把它留给你。
hope this helps :)
希望这可以帮助 :)
回答by OMG Ponies
If using Oracle 9i+, you mightbe able to use the NTILE
analytic function:
如果使用 Oracle 9i+,您或许可以使用NTILE
分析函数:
WITH tiles AS (
SELECT t.age,
NTILE(3) OVER (ORDER BY t.age) AS tile
FROM TABLE t)
SELECT MIN(t.age) AS min_age,
MAX(t.age) AS max_age,
COUNT(t.tile) As n
FROM tiles t
GROUP BY t.tile
The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows, NTILE(4)
will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:
对 NTILE 的警告是您只能指定分区数,而不是断点本身。所以你需要指定一个合适的数字。IE:有 100 行,NTILE(4)
将为四个桶/分区中的每一个分配 25 行。您不能嵌套分析函数,因此您必须使用子查询/子查询分解对它们进行分层以获得所需的粒度。否则,请使用:
SELECT CASE t.age
WHEN BETWEEN 1 AND 10 THEN '1-10'
WHEN BETWEEN 11 AND 20 THEN '11-20'
ELSE '21+'
END AS age,
COUNT(*) AS n
FROM TABLE t
GROUP BY CASE t.age
WHEN BETWEEN 1 AND 10 THEN '1-10'
WHEN BETWEEN 11 AND 20 THEN '11-20'
ELSE '21+'
END
回答by sudarshan vp
Can you try the below solution:
您可以尝试以下解决方案吗:
SELECT count (1), '1-10' where age between 1 and 10
union all
SELECT count (1), '11-20' where age between 11 and 20
union all
select count (1), '21+' where age >20
from age
回答by Ananth N
My approach:
我的做法:
select range, count(1) from (
select case
when age < 5 then '0-4'
when age < 10 then '5-9'
when age < 15 then '10-14'
when age < 20 then '15-20'
when age < 30 then '21-30'
when age < 40 then '31-40'
when age < 50 then '41-50'
else '51+'
end
as range from
(select round(extract(day from feedback_update_time - feedback_time), 1) as age
from txn_history
) ) group by range
- I have flexibility in defining the ranges
- I do not repeat the ranges in select and group clauses
- but some one please tell me, how to order them by magnitude!
- 我可以灵活地定义范围
- 我不会重复 select 和 group 子句中的范围
- 但有人请告诉我,如何按大小排序!