Oracle:如何在 SQL 查询中实现“自然”排序?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/219982/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Oracle: How can I implement a "natural" order-by in a SQL query?
提问by Mark Harrison
e.g,
例如,
foo1
foo2
foo10
foo100
rather than
而不是
foo1
foo10
foo100
foo2
Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.
更新:对自己编码排序不感兴趣(尽管这本身很有趣),但是让数据库为我进行排序。
回答by Mark Harrison
You can use functions in your order-by clause. In this case, you can split the non-numeric and numeric portions of the field and use them as two of the ordering criteria.
您可以在 order-by 子句中使用函数。在这种情况下,您可以拆分字段的非数字部分和数字部分,并将它们用作两个排序标准。
select * from t
order by to_number(regexp_substr(a,'^[0-9]+')),
to_number(regexp_substr(a,'[0-9]+$')),
a;
You can also create a function-based index to support this:
您还可以创建基于函数的索引来支持这一点:
create index t_ix1
on t (to_number(regexp_substr(a, '^[0-9]+')),
to_number(regexp_substr(a, '[0-9]+$')),
a);
回答by Coder
I use the following function to 0-pad all sequences of digits shorter than 10 that could be found in the value, so that the total length of each to become 10 digits. It is compatible even with mixed sets of values that have one, many or none sequences of digits in them.
我使用以下函数对值中可以找到的所有小于 10 的数字序列进行 0 填充,使每个序列的总长度变为 10 位。它甚至与包含一个、多个或没有数字序列的混合值集兼容。
CREATE OR replace function NATURAL_ORDER(
P_STR varchar2
) return varchar2
IS
/** --------------------------------------------------------------------
Replaces all sequences of numbers shorter than 10 digits by 0-padded
numbers that exactly 10 digits in length. Usefull for ordering-by
using NATURAL ORDER algorithm.
*/
l_result varchar2( 32700 );
l_len integer;
l_ix integer;
l_end integer;
begin
l_result := P_STR;
l_len := LENGTH( l_result );
l_ix := 1;
while l_len > 0 loop
l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
EXIT when l_ix = 0;
l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
if ( l_end - l_ix >= 10 ) then
l_ix := l_end;
else
l_result := substr( l_result, 1, l_ix - 1 )
|| LPAD( SUBSTR( l_result, l_ix, l_end-l_ix ), 10, '0' )
|| substr( l_result, l_end )
;
l_ix := l_ix + 10;
end if;
end loop;
return l_result;
end;
/
For example:
例如:
select 'ABC' || LVL || 'DEF' as STR
from (
select LEVEL as LVL
from DUAL
start with 1=1
connect by LEVEL <= 35
)
order by NATURAL_ORDER( STR )
回答by Vladimir Sitnikov
For short strings, small number of numerics
对于短字符串,少量数字
If number of "numerics" and the maximum length are limited, there is a regexp-based solution.
如果“数字”的数量和最大长度受到限制,则有一个基于正则表达式的解决方案。
The idea is:
这个想法是:
- Pad all numerics with 20 zeroes
- Remove excessive zeroes using another regexp. This might be slow due to regexp backtracking.
- 用 20 个零填充所有数字
- 使用另一个正则表达式删除过多的零。由于正则表达式回溯,这可能会很慢。
Assumptions:
假设:
- Maximum length of numerics is known beforehand (e.g. 20)
- All the numerics can be padded (in other words,
lpad('1 ', 3000, '1 ')
will fail due do unable to fit padded numerics intovarchar2(4000)
)
- 预先知道数字的最大长度(例如 20)
- 可以填充所有数字(换句话说,
lpad('1 ', 3000, '1 ')
由于无法将填充的数字放入 中,将失败varchar2(4000)
)
The following query is optimized for "short numerics" case (see *?
) and it takes 0.4 seconds. However, when using such approach, you need to predefine padding length.
以下查询针对“短数字”情况进行了优化(请参阅 参考资料*?
),它需要 0.4 秒。但是,在使用这种方法时,您需要预先定义填充长度。
select * from (
select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'')
, '0*?(\d{21}(\D|$))', '');
"Clever" approach
“聪明”的做法
Even though separate natural_sort
function can be handy, there is a little-known trick to do that in pure SQL.
尽管单独的natural_sort
函数可能很方便,但在纯 SQL 中还有一个鲜为人知的技巧。
Key ideas:
关键思想:
- Strip leading zeroes from all the numerics so
02
is ordered between1
and3
:regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
. Note: this might result in "unexpected" sorting of10.02
>10.1
(since02
is converted to2
), however there is no single answer how things like10.02.03
should be sorted - Convert
"
to""
so text with quotes works properly - Convert input string to comma delimited format:
'"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
- Convert csv to the list of items via
xmltable
- Augment numeric-like items so string sort works properly
- Use
length(length(num))||length(num)||num
instead oflpad(num, 10, '0')
as the latter is less compact and does not support 11+ digit numbers. Note:
- 从所有数字中去除前导零,因此
02
在1
和之间排序3
:regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
。注:这可能会导致“意外”的排序10.02
>10.1
(因为02
被转化成2
),但没有一个统一的答案如何之类的东西10.02.03
应该进行排序 - 转换
"
为""
带引号的文本可以正常工作 - 将输入字符串转换为逗号分隔格式:
'"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
- 通过以下方式将 csv 转换为项目列表
xmltable
- 增加类似数字的项目,以便字符串排序正常工作
- 使用
length(length(num))||length(num)||num
代替,lpad(num, 10, '0')
因为后者不太紧凑并且不支持 11+ 位数字。笔记:
Response time is something like 3-4 seconds for sorting list of 1000 random strings of length 30 (the generation of the random strings takes 0.2 sec itself).
The main time consumer is xmltable
that splits text into rows.
If using PL/SQL instead of xmltable
to split string into rows the response time reduces to 0.4sec for the same 1000 rows.
对于长度为 30 的 1000 个随机字符串的排序列表,响应时间大约为 3-4 秒(随机字符串的生成本身需要 0.2 秒)。主要的时间消费者是xmltable
将文本拆分成行。如果使用 PL/SQL 而不是xmltable
将字符串拆分为行,对于相同的 1000 行,响应时间减少到 0.4 秒。
The following query performs natural sort of 100 random alpha-numeric strings (note: it produces wrong results in Oracle 11.2.0.4 and it works in 12.1.0.2):
以下查询对 100 个随机字母数字字符串执行自然排序(注意:它在 Oracle 11.2.0.4 中产生错误结果,在 12.1.0.2 中有效):
select *
from (
select (select listagg(case when regexp_like(w, '^[0-9]')
then length(length(w))||length(w)||w else w
end
) within group (order by ord)
from xmltable(t.csv columns w varchar2(4000) path '.'
, ord for ordinality) q
) order_by
, t.*
from (
select '"'||regexp_replace(replace(
regexp_replace(val, '(^|\D)0+(\d+)', '')
, '"', '""')
, '([^0-9]+)', '","","')||'"' csv
, t.*
from (
select dbms_random.string('X', 30) val from xmltable('1 to 100')
) t
) t
) t
order by order_by;
The fun part is this order by
can be expressed without subqueries, so it is a handy tool to make your reviewer crazy (it works in both 11.2.0.4 and 12.1.0.2):
有趣的是这order by
可以在没有子查询的情况下表达,所以它是一个让你的评论者疯狂的方便工具(它在 11.2.0.4 和 12.1.0.2 中都有效):
select *
from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
order by (
select listagg(case when regexp_like(w, '^[0-9]')
then length(length(w))||length(w)||w else w
end
) within group (order by ord)
from xmltable('$X'
passing xmlquery(('"'||regexp_replace(replace(
regexp_replace(t.val, '(^|\D)0+(\d+)', '')
, '"', '""')
, '([^0-9]+)', '","","')||'"')
returning sequence
) as X
columns w varchar2(4000) path '.', ord for ordinality) q
);