Natural Sort in MySQL
Original question: http://stackoverflow.com/questions/153633/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by BlaM
Is there an elegant way to have performant, natural sorting in a MySQL database?
For example if I have this data set:
- Final Fantasy
- Final Fantasy 4
- Final Fantasy 10
- Final Fantasy 12
- Final Fantasy 12: Chains of Promathia
- Final Fantasy Adventure
- Final Fantasy Origins
- Final Fantasy Tactics
Any other elegant solution than to split up the games' names into their components
- Title: "Final Fantasy"
- Number: "12"
- Subtitle: "Chains of Promathia"
to make sure that they come out in the right order? (10 after 4, not before 2).
Doing so is a pain in the a** because every now and then there's another game that breaks that mechanism of parsing the game title (e.g. "Warhammer 40,000", "James Bond 007")
Accepted answer by Michael Haren
I think this is why a lot of things are sorted by release date.
A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title which conforms to a pattern you create for easy sorting or a counter.
Answer by slotishtype
Here is a quick solution:
SELECT alphanumeric,
integer
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
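To see what ordering by length first and value second actually does, the same two-level comparison can be mimicked outside the database. This is only an illustrative Python sketch with made-up sample values (the function name is hypothetical, and it ignores MySQL collation and byte-vs-character length): the trick works when every value shares the same prefix and breaks down when prefixes differ.

```python
# Mimics ORDER BY LENGTH(alphanumeric), alphanumeric: shorter strings first,
# ties broken by plain string comparison. Sample values are hypothetical.
def length_then_alpha(value):
    return (len(value), value)

# Same prefix everywhere: the result matches the natural order.
same_prefix = ["a10", "a2", "a1", "a12"]
print(sorted(same_prefix, key=length_then_alpha))

# Different prefixes: length dominates, so "kiwi 2" jumps ahead of "apple 9".
mixed_prefix = ["apple 10", "kiwi 2", "apple 9"]
print(sorted(mixed_prefix, key=length_then_alpha))
```

So this quick solution seems best suited to columns where only the numeric part varies.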
Answer by BlaM
Just found this:
SELECT names FROM your_table ORDER BY names + 0 ASC
Does a natural sort when the numbers are at the front, might work for middle as well.
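A rough way to reason about the + 0 trick: as far as I know, MySQL converts a string to a number by reading its leading numeric prefix, and a string that does not start with a number becomes 0. The Python sketch below imitates that behaviour for unsigned integers only (the function name and sample titles are made up, and signs, decimals and exponents are ignored), which suggests numbers in the middle of a title would not be picked up.

```python
import re

def leading_number(value):
    """Rough stand-in for MySQL's `value + 0`: parse a leading integer, else 0."""
    match = re.match(r"\s*([0-9]+)", value)
    return int(match.group(1)) if match else 0

# Numbers at the front: sorted numerically, as the answer describes.
front = ["10 Things", "9 Lives", "100 Days"]
print(sorted(front, key=leading_number))

# Numbers in the middle: every title maps to 0, so this key cannot order them.
middle = ["Final Fantasy 10", "Final Fantasy 4"]
print([leading_number(title) for title in middle])
```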
Answer by Richard Toth
Same function as posted by @plalx, but rewritten to MySQL:
DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000))
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE position int;
DECLARE tmp_position int;
SET position = 5000;
SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
IF (position = 5000) THEN RETURN 0; END IF;
RETURN position;
END
;;
DELIMITER ;
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE sortString varchar(4000);
DECLARE numStartIndex int;
DECLARE numEndIndex int;
DECLARE padLength int;
DECLARE totalPadLength int;
DECLARE i int;
DECLARE sameOrderCharsLen int;
SET totalPadLength = 0;
SET instring = TRIM(instring);
SET sortString = instring;
SET numStartIndex = udf_FirstNumberPos(instring);
SET numEndIndex = 0;
SET i = 1;
SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);
WHILE (i <= sameOrderCharsLen) DO
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
SET i = i + 1;
END WHILE;
WHILE (numStartIndex <> 0) DO
SET numStartIndex = numStartIndex + numEndIndex;
SET numEndIndex = numStartIndex;
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
SET numEndIndex = numEndIndex + 1;
END WHILE;
SET numEndIndex = numEndIndex - 1;
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
IF padLength < 0 THEN
SET padLength = 0;
END IF;
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
SET totalPadLength = totalPadLength + padLength;
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
END WHILE;
RETURN sortString;
END
;;
DELIMITER ;
Usage:
SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")
Answer by plalx
I wrote this function for MSSQL 2000 a while ago:
/**
* Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
*
* @author Alexandre Potvin Latreille (plalx)
* @param {nvarchar(4000)} string The string to format.
* @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
* @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
*
* @return {nvarchar(4000)} A string for natural sorting.
* Example of use:
*
* SELECT Name FROM TableA ORDER BY Name
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1-1.
* 2. A1-1. 2. A1.
* 3. R1 --> 3. R1
* 4. R11 4. R11
* 5. R2 5. R2
*
*
* As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
* We can use this function to fix this.
*
* SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
* TableA (unordered) TableA (ordered)
* ------------ ------------
* ID Name ID Name
* 1. A1. 1. A1.
* 2. A1-1. 2. A1-1.
* 3. R1 --> 3. R1
* 4. R11 4. R2
* 5. R2 5. R11
*/
CREATE FUNCTION dbo.udf_NaturalSortFormat(
@string nvarchar(4000),
@numberLength int = 10,
@sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
DECLARE @sortString varchar(4000),
@numStartIndex int,
@numEndIndex int,
@padLength int,
@totalPadLength int,
@i int,
@sameOrderCharsLen int;
SELECT
@totalPadLength = 0,
@string = RTRIM(LTRIM(@string)),
@sortString = @string,
@numStartIndex = PATINDEX('%[0-9]%', @string),
@numEndIndex = 0,
@i = 1,
@sameOrderCharsLen = LEN(@sameOrderChars);
-- Replace all char that has to have the same order by a space.
WHILE (@i <= @sameOrderCharsLen)
BEGIN
SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
SET @i = @i + 1;
END
-- Pad numbers with zeros.
WHILE (@numStartIndex <> 0)
BEGIN
SET @numStartIndex = @numStartIndex + @numEndIndex;
SET @numEndIndex = @numStartIndex;
WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
BEGIN
SET @numEndIndex = @numEndIndex + 1;
END
SET @numEndIndex = @numEndIndex - 1;
SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);
IF @padLength < 0
BEGIN
SET @padLength = 0;
END
SET @sortString = STUFF(
@sortString,
@numStartIndex + @totalPadLength,
0,
REPLICATE('0', @padLength)
);
SET @totalPadLength = @totalPadLength + @padLength;
SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
END
RETURN @sortString;
END
GO
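For readers who want to check the idea without a database, here is a minimal Python sketch of the same approach: replace the "same order" characters with spaces, then left-pad every run of digits with zeros. It is not a line-by-line port of the function above; the name natural_sort_format and the use of a regular expression are my own choices, and it assumes the same-order characters contain no digits.

```python
import re

def natural_sort_format(value, number_length=10, same_order_chars=""):
    """Return a string suitable for plain string sorting (illustrative only)."""
    result = value.strip()
    # Characters that should sort alike are all replaced by a space.
    for ch in same_order_chars:
        result = result.replace(ch, " ")
    # Pad each run of digits to number_length; longer runs are left unchanged.
    return re.sub(r"[0-9]+", lambda m: m.group().zfill(number_length), result)

names = ["A1.", "A1-1.", "R1", "R11", "R2"]
print(sorted(names))  # plain string order
print(sorted(names, key=lambda n: natural_sort_format(n, 10, ".-")))
```

With the sample names from the comment block, the second call gives the order the comment describes as the human expectation.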
Answer by ConroyP
MySQL doesn't support this sort of "natural sorting" natively, so it looks like the best way to get what you're after is to split your data up as you've described above (separate id field, etc.), or failing that, to sort on an indexed, non-title column in your db (date, inserted id in the db, etc.).
Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting them there. So if you have any control at all over the db schema here, look at adding easily-sorted fields as described above; it'll save you a lot of hassle and maintenance in the long run.
Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.
SELECT * FROM `table` ORDER BY CAST(MID(name, 6, LENGTH(name) - 5) AS UNSIGNED)
This sort of solution could just about be made to work on your Final Fantasy example above, but isn't particularly flexible and is unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007", I'm afraid.
Answer by FilmJ
So, while I know that you have found a satisfactory answer, I was struggling with this problem for a while, and we'd previously determined that it could not be done reasonably well in SQL and that we were going to have to use JavaScript on a JSON array.
Here's how I solved it just using SQL. Hopefully this is helpful for others:
I had data such as:
Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2
I actually didn't "cast" things, though I suppose that may also have worked.
I first replaced the parts that were unchanging in the data, in this case "Scene ", and then did an LPAD to line things up. This seems to allow the alpha strings to sort properly as well as the numbered ones.
My ORDER BY clause looks like:
ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')
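The effect of that ORDER BY can be tried out in a few lines of Python. This is a sketch under assumptions: scene_key is a hypothetical name, the slice imitates LPAD cutting the value to the target length, and plain Python string comparison stands in for the column's collation. It pads nicely for purely numeric scenes, but note that a letter suffix makes the value one character longer, so "Scene 3" ends up ahead of "Scene 1A" with this key.

```python
def scene_key(name, prefix="Scene ", width=10):
    """Imitates LPAD(REPLACE(name, 'Scene ', ''), 10, '0')."""
    rest = name.replace(prefix, "")
    # LPAD truncates to `width` before padding on the left with zeros.
    return rest[:width].rjust(width, "0")

scenes = ["Scene 101", "Scene 2A", "Scene 1B", "Scene 3", "Scene 1A", "Scene 1"]
print([scene_key(s) for s in scenes])
print(sorted(scenes, key=scene_key))
```

If scenes with and without letter suffixes need to interleave, padding only the numeric part looks like the safer variant.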
Obviously this doesn't help with the original problem which was not so uniform - but I imagine this would probably work for many other related problems, so putting it out there.
Answer by Luke Hoggett
Regarding the best response from Richard Toth https://stackoverflow.com/a/12257917/4052357
Watch out for UTF8 encoded strings that contain 2-byte (or longer) characters and numbers, e.g.
12 南新宿
Using MySQL's LENGTH() in the udf_NaturalSortFormat function will return the byte length of the string and be incorrect; instead use CHAR_LENGTH(), which will return the correct character length.
In my case, using LENGTH() caused queries to never complete and resulted in 100% CPU usage for MySQL.
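The difference is easy to reproduce outside MySQL. The Python sketch below uses the sample string from above; the right helper is a hypothetical stand-in for SQL's RIGHT(). With the byte length, the "remainder after the first number" is the whole string again, digits included, which is presumably why the padding loop never terminates.

```python
def right(s, n):
    """Like SQL RIGHT(s, n): the last n characters (whole string if n >= len)."""
    return s[-n:] if n > 0 else ""

text = "12 南新宿"
char_length = len(text)                  # characters, like CHAR_LENGTH()
byte_length = len(text.encode("utf-8"))  # bytes, like LENGTH() on UTF-8 data
print(char_length, byte_length)

num_end_index = 2  # position of the last digit of the leading "12"
print(repr(right(text, char_length - num_end_index)))  # no digits left
print(repr(right(text, byte_length - num_end_index)))  # still starts with "12"
```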
DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50))
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
DECLARE sortString varchar(4000);
DECLARE numStartIndex int;
DECLARE numEndIndex int;
DECLARE padLength int;
DECLARE totalPadLength int;
DECLARE i int;
DECLARE sameOrderCharsLen int;
SET totalPadLength = 0;
SET instring = TRIM(instring);
SET sortString = instring;
SET numStartIndex = udf_FirstNumberPos(instring);
SET numEndIndex = 0;
SET i = 1;
SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);
WHILE (i <= sameOrderCharsLen) DO
SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
SET i = i + 1;
END WHILE;
WHILE (numStartIndex <> 0) DO
SET numStartIndex = numStartIndex + numEndIndex;
SET numEndIndex = numStartIndex;
WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
SET numEndIndex = numEndIndex + 1;
END WHILE;
SET numEndIndex = numEndIndex - 1;
SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);
IF padLength < 0 THEN
SET padLength = 0;
END IF;
SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));
SET totalPadLength = totalPadLength + padLength;
SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
END WHILE;
RETURN sortString;
END
;;
DELIMITER ;
P.S. I would have added this as a comment on the original, but I don't have enough reputation (yet).
Answer by JeeBee
- Add a Sort Key (Rank) in your table. ORDER BY rank
- Utilise the "Release Date" column. ORDER BY release_date
- When extracting the data from SQL, make your object do the sorting. E.g., if extracting into a Set, make it a TreeSet, make your data model implement Comparable, and enact the natural sort algorithm there (insertion sort will suffice if you are using a language without collections), as you'll be reading the rows from SQL one by one as you create your model and insert it into the collection.
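The third option, sorting in application code after fetching the rows, could look roughly like this in Python instead of a Java TreeSet. It is a minimal sketch: natural_key is a hypothetical helper that splits each title into alternating text and number chunks so that the numbers compare numerically, and the titles are the ones from the question.

```python
import re

def natural_key(title):
    """Split into [text, number, text, ...] so digit runs compare as integers."""
    parts = re.split(r"([0-9]+)", title)
    # re.split with a capturing group alternates text (even index) and digits (odd).
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

titles = [
    "Final Fantasy 12: Chains of Promathia",
    "Final Fantasy 10",
    "Final Fantasy Tactics",
    "Final Fantasy",
    "Final Fantasy 4",
    "Final Fantasy 12",
]
for title in sorted(titles, key=natural_key):
    print(title)
```

Because text and number chunks always sit at the same index parity, Python never ends up comparing a string with an integer here.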
Answer by tye
Add a field for "sort key" that has all strings of digits zero-padded to a fixed length and then sort on that field instead.
If you might have long strings of digits, another method is to prepend the number of digits (fixed-width, zero-padded) to each string of digits. For example, if you won't have more than 99 digits in a row, then for "Super Blast 10 Ultra" the sort key would be "Super Blast 0210 Ultra".
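The length-prefix variant can be sketched in a few lines of Python. The function name and the extra sample titles are made up for illustration, and the sketch assumes digit runs have no leading zeros (a run such as "007" would get the prefix 03 and sort after "99").

```python
import re

def length_prefixed_key(title, width=2):
    """Prefix each run of digits with its zero-padded digit count."""
    def prefix(match):
        digits = match.group()
        return f"{len(digits):0{width}d}{digits}"
    return re.sub(r"[0-9]+", prefix, title)

print(length_prefixed_key("Super Blast 10 Ultra"))

titles = ["Super Blast 10 Ultra", "Super Blast 9 Ultra", "Super Blast 100 Ultra"]
print(sorted(titles, key=length_prefixed_key))
```

A key like this would typically be computed once when the row is written and stored in the sort key column, so the database can sort on it directly.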