Oracle:如何在 SQL 查询中实现“自然”排序?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/219982/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-18 17:23:36  来源:igfitidea点击:

Oracle: How can I implement a "natural" order-by in a SQL query?

sqloracle

提问by Mark Harrison

e.g,

例如,

foo1
foo2
foo10
foo100

rather than

而不是

foo1
foo10
foo100
foo2

Update: not interested in coding the sort myself (although that's interesting in its own right), but having the database to do the sort for me.

更新:对自己编码排序不感兴趣(尽管这本身很有趣),但是让数据库为我进行排序。

回答by Mark Harrison

You can use functions in your order-by clause. In this case, you can split the non-numeric and numeric portions of the field and use them as two of the ordering criteria.

您可以在 order-by 子句中使用函数。在这种情况下,您可以拆分字段的非数字部分和数字部分,并将它们用作两个排序标准。

select * from t
 order by to_number(regexp_substr(a,'^[0-9]+')),
          to_number(regexp_substr(a,'[0-9]+$')),
          a;

You can also create a function-based index to support this:

您还可以创建基于函数的索引来支持这一点:

create index t_ix1
    on t (to_number(regexp_substr(a, '^[0-9]+')),
          to_number(regexp_substr(a, '[0-9]+$')), 
          a);

回答by Coder

I use the following function to 0-pad all sequences of digits shorter than 10 that could be found in the value, so that the total length of each to become 10 digits. It is compatible even with mixed sets of values that have one, many or none sequences of digits in them.

我使用以下函数对值中可以找到的所有小于 10 的数字序列进行 0 填充,使每个序列的总长度变为 10 位。它甚至与包含一个、多个或没有数字序列的混合值集兼容。

CREATE OR replace function NATURAL_ORDER(
    P_STR   varchar2
) return varchar2
IS
/** --------------------------------------------------------------------
    Replaces all sequences of numbers shorter than 10 digits by 0-padded
    numbers that exactly 10 digits in length. Usefull for ordering-by
    using NATURAL ORDER algorithm.
 */
    l_result  varchar2( 32700 );
    l_len     integer;
    l_ix      integer;
    l_end     integer;
begin
    l_result := P_STR;
    l_len := LENGTH( l_result );
    l_ix := 1;
    while l_len > 0 loop
        l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
        EXIT when l_ix = 0;
        l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
        if ( l_end - l_ix >= 10 ) then
            l_ix := l_end;
        else
            l_result := substr( l_result, 1, l_ix - 1 )
                     || LPAD( SUBSTR( l_result, l_ix, l_end-l_ix ), 10, '0' )
                     || substr( l_result, l_end )
                     ;
            l_ix := l_ix + 10;
        end if;
    end loop;
    return l_result;
end;
/

For example:

例如:

select 'ABC' || LVL || 'DEF' as STR
  from (
          select LEVEL as LVL
            from DUAL
           start with 1=1
           connect by LEVEL <= 35
       )
 order by NATURAL_ORDER( STR )

回答by Vladimir Sitnikov

For short strings, small number of numerics

对于短字符串,少量数字

If number of "numerics" and the maximum length are limited, there is a regexp-based solution.

如果“数字”的数量和最大长度受到限制,则有一个基于正则表达式的解决方案。

The idea is:

这个想法是:

  • Pad all numerics with 20 zeroes
  • Remove excessive zeroes using another regexp. This might be slow due to regexp backtracking.
  • 用 20 个零填充所有数字
  • 使用另一个正则表达式删除过多的零。由于正则表达式回溯,这可能会很慢。

Assumptions:

假设:

  • Maximum length of numerics is known beforehand (e.g. 20)
  • All the numerics can be padded (in other words, lpad('1 ', 3000, '1 ')will fail due do unable to fit padded numerics into varchar2(4000))
  • 预先知道数字的最大长度(例如 20)
  • 可以填充所有数字(换句话说,lpad('1 ', 3000, '1 ')由于无法将填充的数字放入 中,将失败varchar2(4000)

The following query is optimized for "short numerics" case (see *?) and it takes 0.4 seconds. However, when using such approach, you need to predefine padding length.

以下查询针对“短数字”情况进行了优化(请参阅 参考资料*?),它需要 0.4 秒。但是,在使用这种方法时,您需要预先定义填充长度。

select * from (
  select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'')
                      , '0*?(\d{21}(\D|$))', '');

"Clever" approach

“聪明”的做法

Even though separate natural_sortfunction can be handy, there is a little-known trick to do that in pure SQL.

尽管单独的natural_sort函数可能很方便,但在纯 SQL 中还有一个鲜为人知的技巧。

Key ideas:

关键思想:

  • Strip leading zeroes from all the numerics so 02is ordered between 1and 3: regexp_replace(val, '(^|\D)0+(\d+)', '\1\2'). Note: this might result in "unexpected" sorting of 10.02> 10.1(since 02is converted to 2), however there is no single answer how things like 10.02.03should be sorted
  • Convert "to ""so text with quotes works properly
  • Convert input string to comma delimited format: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
  • Convert csv to the list of items via xmltable
  • Augment numeric-like items so string sort works properly
  • Use length(length(num))||length(num)||numinstead of lpad(num, 10, '0')as the latter is less compact and does not support 11+ digit numbers. Note:
  • 从所有数字中去除前导零,因此021和之间排序3regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')。注:这可能会导致“意外”的排序10.02> 10.1(因为02被转化成2),但没有一个统一的答案如何之类的东西10.02.03应该进行排序
  • 转换"""带引号的文本可以正常工作
  • 将输入字符串转换为逗号分隔格式: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
  • 通过以下方式将 csv 转换为项目列表 xmltable
  • 增加类似数字的项目,以便字符串排序正常工作
  • 使用length(length(num))||length(num)||num代替,lpad(num, 10, '0')因为后者不太紧凑并且不支持 11+ 位数字。笔记:

Response time is something like 3-4 seconds for sorting list of 1000 random strings of length 30 (the generation of the random strings takes 0.2 sec itself). The main time consumer is xmltablethat splits text into rows. If using PL/SQL instead of xmltableto split string into rows the response time reduces to 0.4sec for the same 1000 rows.

对于长度为 30 的 1000 个随机字符串的排序列表,响应时间大约为 3-4 秒(随机字符串的生成本身需要 0.2 秒)。主要的时间消费者是xmltable将文本拆分成行。如果使用 PL/SQL 而不是xmltable将字符串拆分为行,对于相同的 1000 行,响应时间减少到 0.4 秒。

The following query performs natural sort of 100 random alpha-numeric strings (note: it produces wrong results in Oracle 11.2.0.4 and it works in 12.1.0.2):

以下查询对 100 个随机字母数字字符串执行自然排序(注意:它在 Oracle 11.2.0.4 中产生错误结果,在 12.1.0.2 中有效):

select *
  from (
    select (select listagg(case when regexp_like(w, '^[0-9]')
                                then length(length(w))||length(w)||w else w
                           end
                   ) within group (order by ord)
              from xmltable(t.csv columns w varchar2(4000) path '.'
                                        , ord for ordinality) q
           ) order_by
         , t.*
    from (
           select '"'||regexp_replace(replace(
                                          regexp_replace(val, '(^|\D)0+(\d+)', '')
                                        , '"', '""')
                                    , '([^0-9]+)', '","","')||'"' csv
                , t.*
           from (
                  select dbms_random.string('X', 30) val from xmltable('1 to 100')
                ) t
         ) t
  ) t
order by order_by;

The fun part is this order bycan be expressed without subqueries, so it is a handy tool to make your reviewer crazy (it works in both 11.2.0.4 and 12.1.0.2):

有趣的是这order by可以在没有子查询的情况下表达,所以它是一个让你的评论者疯狂的方便工具(它在 11.2.0.4 和 12.1.0.2 中都有效):

select *
  from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
 order by (
   select listagg(case when regexp_like(w, '^[0-9]')
                       then length(length(w))||length(w)||w else w
                  end
          ) within group (order by ord)
     from xmltable('$X'
            passing xmlquery(('"'||regexp_replace(replace(
                                                     regexp_replace(t.val, '(^|\D)0+(\d+)', '')
                                                   , '"', '""')
                                                , '([^0-9]+)', '","","')||'"')
                             returning sequence
                    ) as X
            columns w varchar2(4000) path '.', ord for ordinality) q
);