PostgreSQL ORDER BY 问题 - 自然排序
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/9173558/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
PostgreSQL ORDER BY issue - natural sort
提问by Mithun Sreedharan
I've got a Postgres ORDER BYissue with the following table:
我有ORDER BY下表的 Postgres问题:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
要向表中插入新记录,
- I select the last record with
SELECT * FROM employees ORDER BY em_code DESC - Strip alphabets from em_code usiging reg exp and store in
ec_alpha - Cast the remating part to integer
ec_num - Increment by one
ec_num++ - Pad with sufficient zeors and prefix
ec_alphaagain
- 我选择最后一条记录
SELECT * FROM employees ORDER BY em_code DESC - 使用 reg exp 从 em_code 中去除字母并存储在
ec_alpha - 将重新匹配部分转换为整数
ec_num - 加一
ec_num++ - 填充足够的 zeors 并
ec_alpha再次添加前缀
When em_codereaches EM1000, the above algorithm fails.
当em_code达到 EM1000 时,上述算法失败。
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code, breaking the unique key constraint.
第一步将返回 EM999 而不是 EM1000,它将再次生成 EM1000 作为新的em_code,打破唯一键约束。
Any idea how to select EM1000?
知道如何选择 EM1000 吗?
采纳答案by Erwin Brandstetter
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9. You could solve it like this:
原因是字符串按字母顺序排序(而不是像你想要的那样按数字排序)和 1 在 9 之前排序。你可以这样解决它:
SELECT * FROM employees ORDER BY substring(em_code, 3)::int DESC
It would be more efficient to drop the redundant 'EM' from your em_code- if you can - and save an integer number to begin with.
em_code如果可以的话,从您的(如果可以)中删除多余的“EM”并保存一个整数会更有效。
Additional answer to question in comment
对评论中问题的补充回答
To strip any and all non-digits from a string:
从字符串中去除任何和所有非数字:
SELECT regexp_replace(em_code, E'\D','','g')
FROM employees
\Dis the regular expression class-shorthandfor "non-digits".'g'as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
\D是“非数字”的正则表达式类简写。'g'因为第四个参数是“全局”开关,用于将替换应用于字符串中的每个匹配项,而不仅仅是第一个。
So I replace every non-digit with the empty string distilling solely digits from the string.
所以我用空字符串替换每个非数字,只从字符串中提取数字。
回答by Jonathan
One approach you can take is to create a naturalsortfunction for this. Here's an example, written by Postgres legend RhodiumToad.
您可以采取的一种方法是为此创建一个naturalsort函数。这是一个例子,由 Postgres 传奇RhodiumToad 编写。
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches(, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
来源:http: //www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
要使用它,只需通过以下方式按您的顺序调用该函数:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
回答by sagneta
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
这总是出现在问题和我自己的发展中,我终于厌倦了这样做的棘手方法。我终于崩溃了,将它实现为 PostgreSQL 扩展:
https://github.com/Bjond/pg_natural_sort_order
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
它是免费使用的,MIT 许可证。
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
基本上它只是标准化字符串中的数字(零前置数字),这样您就可以创建一个索引列以进行全速排序。自述文件解释了。
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
优点是您可以让触发器来完成工作,而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度计算,并且添加列的迁移变得简单快捷。
回答by ykhlef hamza
you can use just this line "ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
你可以只使用这一行“ORDER BY length(substring(em_code FROM '[0-9]+')), em_code”
回答by Craig Ringer
I wrote about this in detail in this related question:
我在这个相关问题中详细描述了这一点:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
(我将此答案仅作为有用的交叉参考发布,因此它是社区维基)。
回答by Christian
I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
我想到了另一种方法,它比填充使用更少的数据库存储,并且比动态计算节省时间。
https://stackoverflow.com/a/47522040/935122
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
我也把它放在GitHub上
回答by Justin L.
I came up with something slightly different.
我想出了一些稍微不同的东西。
The basic idea is to create an array of tuples (integer, string)and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
基本思想是创建一个元组数组,(integer, string)然后按这些进行排序。幻数 2147483647 是 int32_max,用于将字符串排在数字之后。
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)

