MySQL: Natural Sort in MySQL

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original address: http://stackoverflow.com/questions/153633/

Date: 2020-08-31 12:08:09  Source: igfitidea

Natural Sort in MySQL

Tags: sql, mysql, sorting, natural-sort

Asked by BlaM

Is there an elegant way to have performant, natural sorting in a MySQL database?

For example, if I have this data set:

  • Final Fantasy
  • Final Fantasy 4
  • Final Fantasy 10
  • Final Fantasy 12
  • Final Fantasy 12: Chains of Promathia
  • Final Fantasy Adventure
  • Final Fantasy Origins
  • Final Fantasy Tactics

Is there any elegant solution other than splitting the games' names into their components

  • Title: "Final Fantasy"
  • Number: "12"
  • Subtitle: "Chains of Promathia"

to make sure that they come out in the right order? (10 after 4, not before 2).

Doing so is a pain in the a** because every now and then there's another game that breaks the title-parsing mechanism (e.g. "Warhammer 40,000", "James Bond 007").

Accepted answer by Michael Haren

I think this is why a lot of things are sorted by release date.

A solution could be to create another column in your table for the "SortKey". This could be a sanitized version of the title that conforms to a pattern you define for easy sorting, or simply a counter.

Answer by slotishtype

Here is a quick solution:

-- `integer` is a reserved word in MySQL, so the column name is quoted with backticks.
SELECT alphanumeric,
       `integer`
FROM sorting_test
ORDER BY LENGTH(alphanumeric), alphanumeric
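
To see what this ordering does to the titles from the question, here is a small Python sketch (Python is used only for illustration and is not part of the answer). It sorts by byte length first and then by the text itself, like the ORDER BY above. The function name is made up for this example, and the sketch assumes plain ASCII titles and ignores MySQL collation rules.

```python
# Rough emulation of: ORDER BY LENGTH(title), title
# Assumption: ASCII-only titles, compared by code point rather than by a MySQL collation.
titles = [
    "Final Fantasy 12",
    "Final Fantasy Tactics",
    "Final Fantasy 4",
    "Final Fantasy 12: Chains of Promathia",
    "Final Fantasy",
    "Final Fantasy Origins",
    "Final Fantasy 10",
    "Final Fantasy Adventure",
]


def length_then_alpha(title):
    """Sort key: byte length first (like MySQL LENGTH), then the title itself."""
    return (len(title.encode("utf-8")), title)


ordered = sorted(titles, key=length_then_alpha)
for title in ordered:
    print(title)
```

In this sketch the numbered titles come out as 4, 10, 12, but longer titles are pushed to the end regardless of their text ("Final Fantasy 12: Chains of Promathia" lands last), so the trick fits best when values share a prefix and differ only in a trailing number.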

Answer by BlaM

Just found this:

SELECT names FROM your_table ORDER BY names + 0 ASC

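This does a natural sort when the numbers are at the front. It is unlikely to help when they are in the middle, because MySQL only converts the leading numeric part of a string and treats a value that starts with a non-digit as 0.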
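
A rough Python emulation of what `+ 0` does to each value may help here. It assumes MySQL's usual string-to-number conversion (read a leading numeric prefix, otherwise yield 0) and simplifies it to plain integers and decimals; the function name and the sample titles are invented for this sketch.

```python
import re

# Assumption: simplified model of MySQL's implicit string-to-number cast.
# Exponents and other edge cases are deliberately left out.
_LEADING_NUMBER = re.compile(r"\s*([+-]?[0-9]+(?:\.[0-9]+)?)")


def leading_number(value):
    """Return the numeric prefix of value, or 0.0 when there is none."""
    match = _LEADING_NUMBER.match(value)
    return float(match.group(1)) if match else 0.0


titles = ["10 Things", "2 Fast", "Final Fantasy 10", "Final Fantasy 4"]

# Python's sort is stable, so titles with the same key keep their input order.
ordered = sorted(titles, key=leading_number)
print(ordered)
```

Here "2 Fast" sorts before "10 Things", while both "Final Fantasy" titles get the key 0 and are not ordered by their number at all.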

Answer by Richard Toth

Same function as posted by @plalx, but rewritten for MySQL:

DROP FUNCTION IF EXISTS `udf_FirstNumberPos`;
DELIMITER ;;
CREATE FUNCTION `udf_FirstNumberPos` (`instring` varchar(4000)) 
RETURNS int
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE position int;
    DECLARE tmp_position int;
    SET position = 5000;
    SET tmp_position = LOCATE('0', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF; 
    SET tmp_position = LOCATE('1', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('2', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('3', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('4', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('5', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('6', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('7', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('8', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;
    SET tmp_position = LOCATE('9', instring); IF (tmp_position > 0 AND tmp_position < position) THEN SET position = tmp_position; END IF;

    IF (position = 5000) THEN RETURN 0; END IF;
    RETURN position;
END
;;
DELIMITER ;

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
DELIMITER ;

Usage:

SELECT name FROM products ORDER BY udf_NaturalSortFormat(name, 10, ".")

Answer by plalx

I wrote this function for MSSQL 2000 a while ago:

/**
 * Returns a string formatted for natural sorting. This function is very useful when having to sort alpha-numeric strings.
 *
 * @author Alexandre Potvin Latreille (plalx)
 * @param {nvarchar(4000)} string The string to format.
 * @param {int} numberLength The length each number should have (including padding). This should be the length of the longest number. Defaults to 10.
 * @param {char(50)} sameOrderChars A list of characters that should have the same order. Ex: '.-/'. Defaults to empty string.
 *
 * @return {nvarchar(4000)} A string for natural sorting.
 * Example of use:
 *
 *      SELECT Name FROM TableA ORDER BY Name
 *
 *  TableA (unordered)          TableA (ordered)
 *  ------------------          ----------------
 *  ID  Name                    ID  Name
 *  1.  A1.                     1.  A1-1.
 *  2.  A1-1.                   2.  A1.
 *  3.  R1              -->     3.  R1
 *  4.  R11                     4.  R11
 *  5.  R2                      5.  R2
 *
 *  As we can see, humans would expect A1., A1-1., R1, R2, R11 but that's not how SQL is sorting it.
 *  We can use this function to fix this.
 *
 *      SELECT Name FROM TableA ORDER BY dbo.udf_NaturalSortFormat(Name, default, '.-')
 *
 *  TableA (unordered)          TableA (ordered)
 *  ------------------          ----------------
 *  ID  Name                    ID  Name
 *  1.  A1.                     1.  A1.
 *  2.  A1-1.                   2.  A1-1.
 *  3.  R1              -->     3.  R1
 *  4.  R11                     4.  R2
 *  5.  R2                      5.  R11
 */
CREATE FUNCTION dbo.udf_NaturalSortFormat(
    @string nvarchar(4000),
    @numberLength int = 10,
    @sameOrderChars char(50) = ''
)
RETURNS varchar(4000)
AS
BEGIN
    DECLARE @sortString varchar(4000),
        @numStartIndex int,
        @numEndIndex int,
        @padLength int,
        @totalPadLength int,
        @i int,
        @sameOrderCharsLen int;

    SELECT 
        @totalPadLength = 0,
        @string = RTRIM(LTRIM(@string)),
        @sortString = @string,
        @numStartIndex = PATINDEX('%[0-9]%', @string),
        @numEndIndex = 0,
        @i = 1,
        @sameOrderCharsLen = LEN(@sameOrderChars);

    -- Replace all char that has to have the same order by a space.
    WHILE (@i <= @sameOrderCharsLen)
    BEGIN
        SET @sortString = REPLACE(@sortString, SUBSTRING(@sameOrderChars, @i, 1), ' ');
        SET @i = @i + 1;
    END

    -- Pad numbers with zeros.
    WHILE (@numStartIndex <> 0)
    BEGIN
        SET @numStartIndex = @numStartIndex + @numEndIndex;
        SET @numEndIndex = @numStartIndex;

        WHILE(PATINDEX('[0-9]', SUBSTRING(@string, @numEndIndex, 1)) = 1)
        BEGIN
            SET @numEndIndex = @numEndIndex + 1;
        END

        SET @numEndIndex = @numEndIndex - 1;

        SET @padLength = @numberLength - (@numEndIndex + 1 - @numStartIndex);

        IF @padLength < 0
        BEGIN
            SET @padLength = 0;
        END

        SET @sortString = STUFF(
            @sortString,
            @numStartIndex + @totalPadLength,
            0,
            REPLICATE('0', @padLength)
        );

        SET @totalPadLength = @totalPadLength + @padLength;
        SET @numStartIndex = PATINDEX('%[0-9]%', RIGHT(@string, LEN(@string) - @numEndIndex));
    END

    RETURN @sortString;
END

GO
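
The padding idea behind this function is easier to follow in a few lines of Python. The sketch below is a loose port rather than a line-by-line translation: the names `natural_sort_format`, `number_length` and `same_order_chars` only mirror the parameters above, and a regular expression is swapped in for the PATINDEX loop.

```python
import re


def natural_sort_format(value, number_length=10, same_order_chars=""):
    """Build a sort key: equalize some characters, then zero-pad every digit run."""
    sort_string = value.strip()
    # Characters that should sort the same are all replaced by a space.
    for char in same_order_chars:
        sort_string = sort_string.replace(char, " ")
    # Left-pad each run of digits with zeros; longer runs are left unchanged.
    return re.sub(
        r"[0-9]+",
        lambda match: match.group().zfill(number_length),
        sort_string,
    )


names = ["A1.", "A1-1.", "R1", "R11", "R2"]
ordered = sorted(names, key=lambda name: natural_sort_format(name, 10, ".-"))
print(ordered)
```

With the sample names from the comment block, the sketch returns A1., A1-1., R1, R2, R11, which is the order the comment describes as the human expectation.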

Answer by ConroyP

MySQL doesn't allow this sort of "natural sorting", so it looks like the best way to get what you're after is to split your data up as you've described above (separate id field, etc.) or, failing that, to sort on a non-title, indexed element in your db (date, inserted id, etc.).

Having the db do the sorting for you is almost always going to be quicker than reading large data sets into your programming language of choice and sorting them there. So if you have any control at all over the db schema here, look at adding easily sorted fields as described above; it'll save you a lot of hassle and maintenance in the long run.

Requests to add a "natural sort" come up from time to time on the MySQL bugs and discussion forums, and many solutions revolve around stripping out specific parts of your data and casting them for the ORDER BY part of the query, e.g.

SELECT * FROM `table` ORDER BY CAST(MID(name, 6, LENGTH(name) - 5) AS UNSIGNED)

This sort of solution could just about be made to work on your Final Fantasy example above, but it isn't particularly flexible and is unlikely to extend cleanly to a dataset including, say, "Warhammer 40,000" and "James Bond 007", I'm afraid.

Answer by FilmJ

So, while I know that you have found a satisfactory answer, I was struggling with this problem for a while. We'd previously determined that it could not be done reasonably well in SQL and that we were going to have to use JavaScript on a JSON array.

Here's how I solved it just using SQL. Hopefully this is helpful for others:

I had data such as:

Scene 1
Scene 1A
Scene 1B
Scene 2A
Scene 3
...
Scene 101
Scene XXA1
Scene XXA2

I actually didn't "cast" things though I suppose that may also have worked.

I first replaced the parts that were unchanging in the data, in this case "Scene ", and then did an LPAD to line things up. This seems to work pretty well for sorting the alpha strings properly as well as the numbered ones.

My ORDER BY clause looks like:

ORDER BY LPAD(REPLACE(`table`.`column`,'Scene ',''),10,'0')
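
As a way to check what keys this produces, here is a small Python sketch of the same two steps (strip the constant prefix, then left-pad to 10 characters). The function name is hypothetical, the sample values are taken from the list above, and the comparison is by code point rather than by a MySQL collation.

```python
PAD_WIDTH = 10  # same width as the LPAD call above


def scene_sort_key(value, prefix="Scene ", width=PAD_WIDTH):
    """Emulate LPAD(REPLACE(value, prefix, ''), width, '0')."""
    stripped = value.replace(prefix, "")
    # MySQL's LPAD shortens values longer than `width`, hence the final slice.
    return stripped.rjust(width, "0")[:width]


scenes = [
    "Scene 101", "Scene 2A", "Scene XXA2", "Scene 1",
    "Scene 3", "Scene 1B", "Scene XXA1", "Scene 1A",
]

ordered = sorted(scenes, key=scene_sort_key)
for scene in ordered:
    print(scene_sort_key(scene), scene)
```

In this sketch the plain numbers line up correctly (1, 3, 101), but a value with a letter suffix is compared as a wider string, so "Scene 3" comes out ahead of "Scene 1A". Whether that is acceptable depends on the data.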

Obviously this doesn't help with the original problem which was not so uniform - but I imagine this would probably work for many other related problems, so putting it out there.

Answer by Luke Hoggett

Regarding the best response from Richard Toth https://stackoverflow.com/a/12257917/4052357

Watch out for UTF-8 encoded strings that contain 2-byte (or longer) characters alongside numbers, e.g.

12 南新宿

Using MySQL's LENGTH() in the udf_NaturalSortFormat function returns the byte length of the string, which is incorrect here; use CHAR_LENGTH() instead, which returns the correct character length.
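
The difference is easy to reproduce outside MySQL. The Python sketch below measures the example string both ways and then mimics the RIGHT(instring, length - numEndIndex) step of the function. Reading the endless loop as a consequence of that step is an interpretation of the code, not something stated in the answer.

```python
title = "12 南新宿"

char_length = len(title)                  # what CHAR_LENGTH() would report
byte_length = len(title.encode("utf-8"))  # what LENGTH() would report for UTF-8
print(char_length, byte_length)

# After the first number ("12"), numEndIndex is 2. The function then keeps
# the rightmost (length - numEndIndex) characters to search for more digits.
num_end_index = 2
rest_with_chars = title[-(char_length - num_end_index):]
rest_with_bytes = title[-(byte_length - num_end_index):]

print(repr(rest_with_chars))  # only the text after "12"
print(repr(rest_with_bytes))  # the whole string again, digits included
```

With the byte length, the "remaining" text still starts with "12", so the search for the next number finds the same digits again, which would plausibly keep the WHILE loop from ever finishing.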

In my case, using LENGTH() caused queries to never complete and resulted in 100% CPU usage for MySQL.

DROP FUNCTION IF EXISTS `udf_NaturalSortFormat`;
DELIMITER ;;
CREATE FUNCTION `udf_NaturalSortFormat` (`instring` varchar(4000), `numberLength` int, `sameOrderChars` char(50)) 
RETURNS varchar(4000)
LANGUAGE SQL
DETERMINISTIC
NO SQL
SQL SECURITY INVOKER
BEGIN
    DECLARE sortString varchar(4000);
    DECLARE numStartIndex int;
    DECLARE numEndIndex int;
    DECLARE padLength int;
    DECLARE totalPadLength int;
    DECLARE i int;
    DECLARE sameOrderCharsLen int;

    SET totalPadLength = 0;
    SET instring = TRIM(instring);
    SET sortString = instring;
    SET numStartIndex = udf_FirstNumberPos(instring);
    SET numEndIndex = 0;
    SET i = 1;
    SET sameOrderCharsLen = CHAR_LENGTH(sameOrderChars);

    WHILE (i <= sameOrderCharsLen) DO
        SET sortString = REPLACE(sortString, SUBSTRING(sameOrderChars, i, 1), ' ');
        SET i = i + 1;
    END WHILE;

    WHILE (numStartIndex <> 0) DO
        SET numStartIndex = numStartIndex + numEndIndex;
        SET numEndIndex = numStartIndex;

        WHILE (udf_FirstNumberPos(SUBSTRING(instring, numEndIndex, 1)) = 1) DO
            SET numEndIndex = numEndIndex + 1;
        END WHILE;

        SET numEndIndex = numEndIndex - 1;

        SET padLength = numberLength - (numEndIndex + 1 - numStartIndex);

        IF padLength < 0 THEN
            SET padLength = 0;
        END IF;

        SET sortString = INSERT(sortString, numStartIndex + totalPadLength, 0, REPEAT('0', padLength));

        SET totalPadLength = totalPadLength + padLength;
        SET numStartIndex = udf_FirstNumberPos(RIGHT(instring, CHAR_LENGTH(instring) - numEndIndex));
    END WHILE;

    RETURN sortString;
END
;;
DELIMITER ;

P.S. I would have added this as a comment on the original answer, but I don't have enough reputation (yet).

Answer by JeeBee

  1. Add a Sort Key (Rank) in your table. ORDER BY rank

  2. Utilise the "Release Date" column. ORDER BY release_date

  3. When extracting the data from SQL, make your object do the sorting. E.g., if extracting into a Set, make it a TreeSet, make your data model implement Comparable, and implement the natural sort algorithm there. (Insertion sort will suffice if you are using a language without collections, as you'll be reading the rows from SQL one by one while you create your model and insert it into the collection.)
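
For the third option, a minimal sketch of sorting in application code might look like this. It uses Python instead of the Java classes mentioned above, and the `natural_key` helper is invented for illustration: it splits each title into text and number chunks so that the numbers compare as integers.

```python
import re


def natural_key(title):
    """Split a title into alternating text and integer chunks for comparison."""
    # re.split with a capturing group returns text, digits, text, digits, ...
    # so every odd index holds a run of digits.
    parts = re.split(r"([0-9]+)", title.lower())
    return [int(part) if index % 2 else part for index, part in enumerate(parts)]


# Rows as they might arrive from the database, in no particular order.
rows = [
    "Final Fantasy Tactics",
    "Final Fantasy 12: Chains of Promathia",
    "Final Fantasy 10",
    "Final Fantasy",
    "Final Fantasy Origins",
    "Final Fantasy 4",
    "Final Fantasy Adventure",
    "Final Fantasy 12",
]

ordered = sorted(rows, key=natural_key)
for title in ordered:
    print(title)
```

On the sample titles from the question this yields the order shown there, with 4 before 10 and 12. Titles such as "Warhammer 40,000" are still split at the comma, so they may need extra handling.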

Answer by tye

Add a field for "sort key" that has all strings of digits zero-padded to a fixed length and then sort on that field instead.

If you might have long strings of digits, another method is to prepend the number of digits (fixed-width, zero-padded) to each string of digits. For example, if you won't have more than 99 digits in a row, then for "Super Blast 10 Ultra" the sort key would be "Super Blast 0210 Ultra".
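
The length-prefix variant can be sketched in Python as follows. The helper name and the `width` parameter are assumptions made for this example; the resulting key would be stored in the sort-key field and used in ORDER BY.

```python
import re


def length_prefixed_key(title, width=2):
    """Prefix every run of digits with its own length, zero-padded to `width`."""

    def add_prefix(match):
        digits = match.group()
        return f"{len(digits):0{width}d}{digits}"

    return re.sub(r"[0-9]+", add_prefix, title)


titles = [
    "Super Blast 10 Ultra",
    "Super Blast 12345678901234567890 Ultra",
    "Super Blast 2 Ultra",
]

ordered = sorted(titles, key=length_prefixed_key)
print(length_prefixed_key("Super Blast 10 Ultra"))
print(ordered)
```

Because the digit count comes first, a longer number always sorts after a shorter one without padding every number to the width of the longest, which is the point of the method when digit strings can be very long.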
