PostgreSQL ORDER BY 问题 - 自然排序
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/9173558/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
PostgreSQL ORDER BY issue - natural sort
提问by Mithun Sreedharan
I've got a Postgres ORDER BY
issue with the following table:
我有ORDER BY
下表的 Postgres问题:
em_code name
EM001 AAA
EM999 BBB
EM1000 CCC
To insert a new record to the table,
要向表中插入新记录,
- I select the last record with
SELECT * FROM employees ORDER BY em_code DESC
- Strip alphabets from em_code usiging reg exp and store in
ec_alpha
- Cast the remating part to integer
ec_num
- Increment by one
ec_num++
- Pad with sufficient zeors and prefix
ec_alpha
again
- 我选择最后一条记录
SELECT * FROM employees ORDER BY em_code DESC
- 使用 reg exp 从 em_code 中去除字母并存储在
ec_alpha
- 将重新匹配部分转换为整数
ec_num
- 加一
ec_num++
- 填充足够的 zeors 并
ec_alpha
再次添加前缀
When em_code
reaches EM1000, the above algorithm fails.
当em_code
达到 EM1000 时,上述算法失败。
First step will return EM999 instead EM1000 and it will again generate EM1000 as new em_code
, breaking the unique key constraint.
第一步将返回 EM999 而不是 EM1000,它将再次生成 EM1000 作为新的em_code
,打破唯一键约束。
Any idea how to select EM1000?
知道如何选择 EM1000 吗?
采纳答案by Erwin Brandstetter
The reason is that the string sorts alphabetically (instead of numerically like you would want it) and 1 sorts before 9. You could solve it like this:
原因是字符串按字母顺序排序(而不是像你想要的那样按数字排序)和 1 在 9 之前排序。你可以这样解决它:
SELECT * FROM employees ORDER BY substring(em_code, 3)::int DESC
It would be more efficient to drop the redundant 'EM' from your em_code
- if you can - and save an integer number to begin with.
em_code
如果可以的话,从您的(如果可以)中删除多余的“EM”并保存一个整数会更有效。
Additional answer to question in comment
对评论中问题的补充回答
To strip any and all non-digits from a string:
从字符串中去除任何和所有非数字:
SELECT regexp_replace(em_code, E'\D','','g')
FROM employees
\D
is the regular expression class-shorthandfor "non-digits".'g'
as 4th parameter is the "globally" switch to apply the replacement to every occurrence in the string, not just the first.
\D
是“非数字”的正则表达式类简写。'g'
因为第四个参数是“全局”开关,用于将替换应用于字符串中的每个匹配项,而不仅仅是第一个。
So I replace every non-digit with the empty string distilling solely digits from the string.
所以我用空字符串替换每个非数字,只从字符串中提取数字。
回答by Jonathan
One approach you can take is to create a naturalsort
function for this. Here's an example, written by Postgres legend RhodiumToad.
您可以采取的一种方法是为此创建一个naturalsort
函数。这是一个例子,由 Postgres 传奇RhodiumToad 编写。
create or replace function naturalsort(text)
returns bytea language sql immutable strict as $f$
select string_agg(convert_to(coalesce(r[2], length(length(r[1])::text) || length(r[1])::text || r[1]), 'SQL_ASCII'),'\x00')
from regexp_matches(, '0*([0-9]+)|([^0-9]+)', 'g') r;
$f$;
Source: http://www.rhodiumtoad.org.uk/junk/naturalsort.sql
来源:http: //www.rhodiumtoad.org.uk/junk/naturalsort.sql
To use it simply call the function in your order by:
要使用它,只需通过以下方式按您的顺序调用该函数:
SELECT * FROM employees ORDER BY naturalsort(em_code) DESC
回答by sagneta
This always comes up in questions and in my own development and I finally tired of tricky ways of doing this. I finally broke down and implemented it as a PostgreSQL extension:
这总是出现在问题和我自己的发展中,我终于厌倦了这样做的棘手方法。我终于崩溃了,将它实现为 PostgreSQL 扩展:
https://github.com/Bjond/pg_natural_sort_order
https://github.com/Bjond/pg_natural_sort_order
It's free to use, MIT license.
它是免费使用的,MIT 许可证。
Basically it just normalizes the numerics (zero pre-pending numerics) within strings such that you can create an index column for full-speed sorting au naturel. The readme explains.
基本上它只是标准化字符串中的数字(零前置数字),这样您就可以创建一个索引列以进行全速排序。自述文件解释了。
The advantage is you can have a trigger do the work and not your application code. It will be calculated at machine-speed on the PostgreSQL server and migrations adding columns become simple and fast.
优点是您可以让触发器来完成工作,而不是您的应用程序代码。它将在 PostgreSQL 服务器上以机器速度计算,并且添加列的迁移变得简单快捷。
回答by ykhlef hamza
you can use just this line "ORDER BY length(substring(em_code FROM '[0-9]+')), em_code"
你可以只使用这一行“ORDER BY length(substring(em_code FROM '[0-9]+')), em_code”
回答by Craig Ringer
I wrote about this in detail in this related question:
我在这个相关问题中详细描述了这一点:
Humanized or natural number sorting of mixed word-and-number strings
(I'm posting this answer as a useful cross-reference only, so it's community wiki).
(我将此答案仅作为有用的交叉参考发布,因此它是社区维基)。
回答by Christian
I thought about another way of doing this that uses less db storage than padding and saves time than calculating on the fly.
我想到了另一种方法,它比填充使用更少的数据库存储,并且比动态计算节省时间。
https://stackoverflow.com/a/47522040/935122
https://stackoverflow.com/a/47522040/935122
I've also put it on GitHub
我也把它放在GitHub上
回答by Justin L.
I came up with something slightly different.
我想出了一些稍微不同的东西。
The basic idea is to create an array of tuples (integer, string)
and then order by these. The magic number 2147483647 is int32_max, used so that strings are sorted after numbers.
基本思想是创建一个元组数组,(integer, string)
然后按这些进行排序。幻数 2147483647 是 int32_max,用于将字符串排在数字之后。
ORDER BY ARRAY(
SELECT ROW(
CAST(COALESCE(NULLIF(match[1], ''), '2147483647') AS INTEGER),
match[2]
)
FROM REGEXP_MATCHES(col_to_sort_by, '(\d*)|(\D*)', 'g')
AS match
)