SQL Oracle：如何在一个范围内“分组”？

Question

提问by Mark Harrison

If I have a table like this:

如果我有一张这样的表：

I can "group by" to get a count of each age.

我可以“分组”来计算每个年龄。

select age,count(*) n from tbl group by age;
age  n
---  -
  5  1
  8  1
 12  2
 22  1

What query can I use to group by age ranges?

我可以使用什么查询来按年龄范围分组？

  age  n
-----  -
 1-10  2
11-20  2
20+    1

I'm on 10gR2, but I'd be interested in any 11g-specific approaches as well.

我在使用 10gR2，但我也对任何特定于 11g 的方法感兴趣。

Answer 1

回答by Einstein

SELECT CASE 
         WHEN age <= 10 THEN '1-10' 
         WHEN age <= 20 THEN '11-20' 
         ELSE '21+' 
       END AS age, 
       COUNT(*) AS n
FROM age
GROUP BY CASE 
           WHEN age <= 10 THEN '1-10' 
           WHEN age <= 20 THEN '11-20' 
           ELSE '21+' 
         END

Answer 2

回答by Matthew Flaschen

Try:

尝试：

select to_char(floor(age/10) * 10) || '-' 
|| to_char(ceil(age/10) * 10 - 1)) as age, 
count(*) as n from tbl group by floor(age/10);

Answer 3

回答by Wintermute

What you are looking for, is basically the data for a histogram.

您正在寻找的基本上是直方图的数据。

You would have the age (or age-range) on the x-axis and the count n (or frequency) on the y-axis.

您将在 x 轴上有年龄（或年龄范围），在 y 轴上有计数 n（或频率）。

In the simplest form, one could simply count the number of each distinct age value like you already described:

以最简单的形式，您可以像您已经描述的那样简单地计算每个不同年龄值的数量：

SELECT age, count(*)
FROM tbl
GROUP BY age

When there are too many different values for the x-axis however, one may want to create groups (or clusters or buckets). In your case, you group by a constant range of 10.

然而，当 x 轴有太多不同的值时，人们可能想要创建组（或集群或桶）。在您的情况下，您按 10 的恒定范围进行分组。

We can avoid writing a WHEN ... THENline for each range - there could be hundreds if it were not about age. Instead, the approach by @MatthewFlaschen is preferable for the reasons mentioned by @NitinMidha.

我们可以避免WHEN ... THEN为每个范围写一行——如果不是关于年龄，可能会有数百条。相反，由于@NitinMidha 提到的原因，@MatthewFlaschen 的方法更可取。

Now let's build the SQL...

现在让我们构建 SQL...

First, we need to split the ages into range-groups of 10 like so:

首先，我们需要将年龄分成 10 个范围组，如下所示：

0-9
10-19
20 - 29
etc.

0-9
10-19
20 - 29
等等。

This can be achieved by dividing the age column by 10 and then calculating the result's FLOOR:

这可以通过将年龄列除以 10 然后计算结果的 FLOOR 来实现：

FLOOR(age/10)

"FLOOR returns the largest integer equal to or less than n" http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643

“FLOOR 返回等于或小于 n 的最大整数” http://docs.oracle.com/cd/E11882_01/server.112/e26088/functions067.htm#SQLRF00643

Then we take the original SQL and replace agewith that expression:

然后我们采用原始 SQL 并用该表达式替换age：

SELECT FLOOR(age/10), count(*)
FROM tbl
GROUP BY FLOOR(age/10)

This is OK, but we cannot see the range, yet. Instead we only see the calculated floor values which are 0, 1, 2 ... n.

这没问题，但我们还看不到范围。相反，我们只看到计算出的楼层值为0, 1, 2 ... n。

To get the actual lower bound, we need to multiply it with 10 again so we get 0, 10, 20 ... n:

为了得到实际的下限，我们需要再次乘以 10，所以我们得到0, 10, 20 ... n：

FLOOR(age/10) * 10

We also need the upper bound of each range which is lower bound + 10 - 1or

我们还需要每个范围的上限，即下限+ 10 - 1或

FLOOR(age/10) * 10 + 10 - 1

Finally, we concatenate both into a string like this:

最后，我们将两者连接成这样的字符串：

TO_CHAR(FLOOR(age/10) * 10) || '-' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1)

This creates '0-9', '10-19', '20-29'etc.

这会创建'0-9', '10-19', '20-29'等。

Now our SQL looks like this:

现在我们的 SQL 看起来像这样：

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1),
COUNT(*)
FROM tbl
GROUP BY FLOOR(age/10)

Finally, apply an order and nice column aliases:

最后，应用订单和漂亮的列别名：

SELECT 
TO_CHAR(FLOOR(age/10) * 10) || ' - ' || TO_CHAR(FLOOR(age/10) * 10 + 10 - 1) AS range,
COUNT(*) AS frequency
FROM tbl
GROUP BY FLOOR(age/10)
ORDER BY FLOOR(age/10)

However, in more complex scenarios, these ranges might not be grouped into constant chunks of size 10, but need dynamical clustering. Oracle has more advanced histogram functions included, see http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366

然而，在更复杂的场景中，这些范围可能不会被分组到大小为 10 的恒定块中，而是需要动态聚类。Oracle 包含更高级的直方图函数，参见http://docs.oracle.com/cd/E16655_01/server.121/e15858/tgsql_histo.htm#TGSQL366

Credits to @MatthewFlaschen for his approach; I only explained the details.

感谢@MatthewFlaschen 的方法；我只解释了细节。

Answer 4

回答by Dan

Here is a solution which creates a "range" table in a sub-query and then uses this to partition the data from the main table:

这是一个解决方案，它在子查询中创建一个“范围”表，然后使用它从主表中分区数据：

SELECT DISTINCT descr
  , COUNT(*) OVER (PARTITION BY descr) n
FROM age_table INNER JOIN (
  select '1-10' descr, 1 rng_start, 10 rng_stop from dual
  union (
  select '11-20', 11, 20 from dual
  ) union (
  select '20+', 21, null from dual
)) ON age BETWEEN nvl(rng_start, age) AND nvl(rng_stop, age)
ORDER BY descr;

Answer 5

回答by Clarkey

I had to group data by how many transactions appeared in an hour. I did this by extracting the hour from the timestamp:

我必须按照一小时内出现的交易数量对数据进行分组。我通过从时间戳中提取小时来做到这一点：

select extract(hour from transaction_time) as hour
      ,count(*)
from   table
where  transaction_date='01-jan-2000'
group by
       extract(hour from transaction_time)
order by
       extract(hour from transaction_time) asc
;

Giving output:

给出输出：

HOUR COUNT(*)
---- --------
   1     9199 
   2     9167 
   3     9997 
   4     7218

As you can see this gives a nice easy way of grouping the number of records per hour.

正如您所看到的，这提供了一种对每小时记录数进行分组的简单方法。

Answer 6

回答by Kieron Hardy

I had to get a count of samples by day. Inspired by @Clarkey I used TO_CHAR to extract the date of sample from the timestamp to an ISO-8601 date format and used that in the GROUP BY and ORDER BY clauses. (Further inspired, I also post it here in case it is useful to others.)

我必须每天计算样本数量。受@Clarkey 的启发，我使用 TO_CHAR 从时间戳中提取样本日期为 ISO-8601 日期格式，并在 GROUP BY 和 ORDER BY 子句中使用它。（进一步启发，我也把它贴在这里，以防对其他人有用。）

SELECT 
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') AS TS_DAY, 
  COUNT(*) 
FROM   
  TABLE X
GROUP BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD')
ORDER BY
  TO_CHAR(X.TS_TIMESTAMP, 'YYYY-MM-DD') ASC
/

Answer 7

回答by Jon Black

add an age_range table and an age_range_id field to your table and group by that instead.

将 age_range 表和 age_range_id 字段添加到您的表中，并按其分组。

// excuse the DDL but you should get the idea

// 原谅 DDL 但你应该明白

create table age_range(
age_range_id tinyint unsigned not null primary key,
name varchar(255) not null);

insert into age_range values 
(1, '18-24'),(2, '25-34'),(3, '35-44'),(4, '45-54'),(5, '55-64');

// again excuse the DML but you should get the idea

// 再次原谅 DML 但你应该明白

select
 count(*) as counter, p.age_range_id, ar.name
from
  person p
inner join age_range ar on p.age_range_id = ar.age_range_id
group by
  p.age_range_id, ar.name order by counter desc;

You can refine this idea if you like - add from_age to_age columns in the age_range table etc - but i'll leave that to you.

如果你愿意，你可以改进这个想法 - 在 age_range 表中添加 from_age to_age 列等 - 但我会把它留给你。

hope this helps :)

希望这可以帮助：）

Answer 8

回答by OMG Ponies

If using Oracle 9i+, you mightbe able to use the NTILEanalytic function:

如果使用 Oracle 9i+，您或许可以使用NTILE分析函数：

WITH tiles AS (
  SELECT t.age,
         NTILE(3) OVER (ORDER BY t.age) AS tile
    FROM TABLE t)
  SELECT MIN(t.age) AS min_age,
         MAX(t.age) AS max_age,
         COUNT(t.tile) As n
    FROM tiles t
GROUP BY t.tile

The caveat to NTILE is that you can only specify the number of partitions, not the break points themselves. So you need to specify a number that is appropriate. IE: With 100 rows, NTILE(4)will allot 25 rows to each of the four buckets/partitions. You can not nest analytic functions, so you'd have to layer them using subqueries/subquery factoring to get desired granularity. Otherwise, use:

对 NTILE 的警告是您只能指定分区数，而不是断点本身。所以你需要指定一个合适的数字。IE：有 100 行，NTILE(4)将为四个桶/分区中的每一个分配 25 行。您不能嵌套分析函数，因此您必须使用子查询/子查询分解对它们进行分层以获得所需的粒度。否则，请使用：

  SELECT CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END AS age, 
         COUNT(*) AS n
    FROM TABLE t
GROUP BY CASE t.age
           WHEN BETWEEN 1 AND 10 THEN '1-10' 
           WHEN BETWEEN 11 AND 20 THEN '11-20' 
           ELSE '21+' 
         END

Answer 9

回答by sudarshan vp

Can you try the below solution:

您可以尝试以下解决方案吗：

SELECT count (1), '1-10'  where age between 1 and 10
union all 
SELECT count (1), '11-20'  where age between 11 and 20
union all
select count (1), '21+' where age >20
from age

Answer 10

回答by Ananth N

My approach:

我的做法：

select range, count(1) from (
select case 
  when age < 5 then '0-4' 
  when age < 10 then '5-9' 
  when age < 15 then '10-14' 
  when age < 20 then '15-20' 
  when age < 30 then '21-30' 
  when age < 40 then '31-40' 
  when age < 50 then '41-50' 
  else                '51+' 
end 
as range from
(select round(extract(day from feedback_update_time - feedback_time), 1) as age
from txn_history
) ) group by range

I have flexibility in defining the ranges
I do not repeat the ranges in select and group clauses
but some one please tell me, how to order them by magnitude!

我可以灵活地定义范围
我不会重复 select 和 group 子句中的范围
但有人请告诉我，如何按大小排序！

SQL Oracle：如何在一个范围内“分组”？

提问by Mark Harrison

回答by Einstein

回答by Matthew Flaschen

回答by Wintermute

回答by Dan

回答by Clarkey

回答by Kieron Hardy

回答by Jon Black

回答by OMG Ponies

回答by sudarshan vp

回答by Ananth N

相关推荐

最近更新

标签

SQL Oracle：如何在一个范围内“分组”？

提问by Mark Harrison

回答by Einstein

回答by Matthew Flaschen

回答by Wintermute

回答by Dan

回答by Clarkey

回答by Kieron Hardy

回答by Jon Black

回答by OMG Ponies

回答by sudarshan vp

回答by Ananth N

相关推荐

SQL AdventureWorks db 有教程/练习吗？

我可以为 SQL 中的特定列设置公式吗？

SQL Server 查询以获取表中的列列表以及数据类型、NOT NULL 和 PRIMARY KEY 约束

SQL Server 2005 将 VARCHAR 转换为 INT 但默认为无效类型

相关推荐

最近更新

标签