PostgreSQL ORDER BY 问题 - 自然排序

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/9173558/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-20 23:37:10  来源:igfitidea点击:

PostgreSQL ORDER BY issue - natural sort

sqlpostgresqltypessql-order-bynatural-sort

提问by Mithun Sreedharan

I've got a Postgres ORDER BYissue with the following table:

我有ORDER BY下表的 Postgres问题:

em_code  name
EM001    AAA
EM999    BBB
EM1000   CCC

To insert a new record to the table,

要向表中插入新记录,

  1. I select the last record with SELECT * FROM employees ORDER BY em_code DESC
  2. Strip alphabets from em_code usiging reg exp and store in ec_alpha
  3. Cast the remating part to integer ec_num
  4. Increment by one ec_num++
  5. Pad with sufficient zeors and prefix ec_alphaagain
  1. 我选择最后一条记录 SELECT * FROM employees ORDER BY em_code DESC
  2. 使用 reg exp 从 em_code 中去除字母并存储在 ec_alpha
  3. 将重新匹配部分转换为整数 ec_num
  4. 加一 ec_num++
  5. 填充足够的 zeors 并ec_alpha再次添加前缀

When em_codereaches EM1000, the above algorithm fails.

em_code达到 EM1000 时,上述算法失败。

First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.

第一步将返回 EM999 而不是 EM1000,它将再次生成 EM1000 作为新的em_code,打破唯一键约束。

Any idea how to select EM1000?

知道如何选择 EM1000 吗?

采纳答案by Erwin Brandstetter

The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9. You could solve it like this:

原因是字符串按字母顺序排序(而不是像你想要的那样按数字排序)和 1 在 9 之前排序。你可以这样解决它:

SELECT * FROM employees ORDER BY substring(em_code, 3)::int DESC

It would be more efficient to drop the redundant 'EM' from your em_code- if you can - and save an integer number to begin with.

em_code如果可以的话,从您的(如果可以)中删除多余的“EM”并保存一个整数会更有效。



Additional answer to question in comment

对评论中问题的补充回答

To strip any and all non-digits from a string:

从字符串中去除任何和所有非数字:

SELECT regexp_replace(em_code, E'\D','','g')
FROM employees

\Dis the regular expression class-shorthandfor "non-digits".
'g'as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.

\D是“非数字”的正则表达式类简写
'g'因为第四个参数是“全局”开关,用于将替换应用于字符串中的每个匹配项,而不仅仅是第一个。

So I replace every non-digit with the empty string distilling solely digits from the string.

所以我用空字符串替换每个非数字,只从字符串中提取数字。

回答by Jonathan

One approach you can take is to create a naturalsortfunction for this. Here's an example, written by Postgres legend RhodiumToad.

您可以采取的一种方法是为此创建一个naturalsort函数。这是一个例子,由 Postgres 传奇RhodiumToad 编写

create or replace function naturalsort(text)
    returns bytea language sql immutable strict as $f$
    select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
    from regexp_matches(, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;

Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql

来源:http: //www.rhodiumtoad.org.uk/junk/naturalsort.sql

To use it simply call the function in your order by:

要使用它,只需通过以下方式按您的顺序调用该函数:

SELECT * FROM employees ORDER BY naturalsort(em_code) DESC

回答by sagneta

This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:

这总是出现在问题和我自己的发展中,我终于厌倦了这样做的棘手方法。我终于崩溃了,将它实现为 PostgreSQL 扩展:

https://github.com/Bjond/pg_natural_sort_order

https://github.com/Bjond/pg_natural_sort_order

It's free to use, MIT license.

它是免费使用的,MIT 许可证。

Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.

基本上它只是标准化字符串中的数字(零前置数字),这样您就可以创建一个索引列以进行全速排序。自述文件解释了。

The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.

优点是您可以让触发器来完成工作,而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度计算,并且添加列的迁移变得简单快捷。

回答by ykhlef hamza

you can use just this line "ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"

你可以只使用这一行“ORDER BY length(substring(em_code FROM '[0-9]+')), em_code”

回答by Craig Ringer

I wrote about this in detail in this related question:

我在这个相关问题中详细描述了这一点:

Humanized or natural number sorting of mixed word-and-number strings

混合字数串的人性化或自然数排序

(I'm posting this answer as a useful cross-reference only, so it's community wiki).

(我将此答案仅作为有用的交叉参考发布,因此它是社区维基)。

回答by Christian

I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.

我想到了另一种方法,它比填充使用更少的数据库存储,并且比动态计算节省时间。

https://stackoverflow.com/a/47522040/935122

https://stackoverflow.com/a/47522040/935122

I've also put it on GitHub

我也把它放在GitHub上

https://github.com/ccsalway/dbNaturalSort

https://github.com/ccsalway/dbNaturalSort

回答by Justin L.

I came up with something slightly different.

我想出了一些稍微不同的东西。

The basic idea is to create an array of tuples (integer, string)and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.

基本思想是创建一个元组数组,(integer, string)然后按这些进行排序。幻数 2147483647 是 int32_max,用于将字符串排在数字之后。

  ORDER BY ARRAY(
    SELECT ROW(
      CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
      match[2]
    )
    FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
    AS match
  )