How to remove all non-alphanumeric characters from a string in MySQL?

Disclaimer: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/6942973/

Date: 2020-08-31 20:45:01. Source: igfitidea

Tags: mysql, regex, string, alphanumeric

Asked by Dylan

I'm working on a routine that compares strings, but for better efficiency I need to remove all characters that are not letters or numbers.

I'm using multiple REPLACE functions now, but maybe there is a faster and nicer solution?

Answer by Ryan Shillington

None of these answers worked for me. I had to create my own function called alphanum which stripped the chars for me:

DROP FUNCTION IF EXISTS alphanum; 
DELIMITER | 
CREATE FUNCTION alphanum( str CHAR(255) ) RETURNS CHAR(255) DETERMINISTIC
BEGIN 
  DECLARE i, len SMALLINT DEFAULT 1; 
  DECLARE ret CHAR(255) DEFAULT ''; 
  DECLARE c CHAR(1);
  IF str IS NOT NULL THEN 
    SET len = CHAR_LENGTH( str ); 
    REPEAT 
      BEGIN 
        SET c = MID( str, i, 1 ); 
        IF c REGEXP '[[:alnum:]]' THEN 
          SET ret=CONCAT(ret,c); 
        END IF; 
        SET i = i + 1; 
      END; 
    UNTIL i > len END REPEAT; 
  ELSE
    SET ret='';
  END IF;
  RETURN ret; 
END | 
DELIMITER ; 

Now I can do:

select 'This works finally!', alphanum('This works finally!');

and I get:

+---------------------+---------------------------------+
| This works finally! | alphanum('This works finally!') |
+---------------------+---------------------------------+
| This works finally! | Thisworksfinally                |
+---------------------+---------------------------------+
1 row in set (0.00 sec)

Hurray!

Answer by Kevin Burton

From a performance point of view (and on the assumption that you read more than you write),

I think the best way would be to precalculate and store a stripped version of the column. This way you do the transform less often.

You can then put an index on the new column and get the database to do the work for you.
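
A minimal sketch of this idea follows. The table name `items`, the columns `name` and `name_stripped`, and the index name are hypothetical, and the use of REGEXP_REPLACE assumes MySQL 8.0 or later; on older versions a stored function such as the alphanum function from the earlier answer could fill the same role.

```sql
-- Hypothetical table: the stripped copy lives next to the original value.
CREATE TABLE items (
  id            INT AUTO_INCREMENT PRIMARY KEY,
  name          VARCHAR(255) NOT NULL,
  name_stripped VARCHAR(255) NOT NULL,
  INDEX idx_items_name_stripped (name_stripped)
);

-- Compute the stripped value once, at write time (MySQL 8.0+).
INSERT INTO items (name, name_stripped)
VALUES ('This works finally!',
        REGEXP_REPLACE('This works finally!', '[^0-9a-zA-Z]', ''));

-- At read time, strip only the search term and compare against the indexed column.
SELECT id, name
FROM items
WHERE name_stripped = REGEXP_REPLACE('This works, finally', '[^0-9a-zA-Z]', '');
```

The stripped column has to be kept in sync on every insert and update, for example in application code or with a trigger; that is the price paid for cheaper reads.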

Answer by Johan

SELECT teststring REGEXP '[[:alnum:]]+';

SELECT * FROM testtable WHERE test REGEXP '[[:alnum:]]+'; 

See: http://dev.mysql.com/doc/refman/5.1/en/regexp.html
Scroll down to the section that says: [:character_class:]
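
Note that the REGEXP operator only tests for a match (returning 1 or 0); it does not modify the string. The sketch below shows how anchoring changes the result, and reuses the testtable and test names from the query above as placeholders.

```sql
-- Unanchored: 1 if the string contains at least one alphanumeric run.
SELECT 'abc-123' REGEXP '[[:alnum:]]+';     -- 1

-- Anchored: 1 only if every character is alphanumeric.
SELECT 'abc123'  REGEXP '^[[:alnum:]]+$';   -- 1
SELECT 'abc-123' REGEXP '^[[:alnum:]]+$';   -- 0

-- Find rows that contain at least one non-alphanumeric character.
SELECT * FROM testtable WHERE test REGEXP '[^[:alnum:]]';
```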

If you want to manipulate strings the fastest way will be to use a str_udf, see:
https://github.com/hholzgra/mysql-udf-regexp

Answer by userlond

Straightforward, battle-tested solution for Latin and Cyrillic characters:

DELIMITER //

CREATE FUNCTION `remove_non_numeric_and_letters`(input TEXT)
  RETURNS TEXT
  BEGIN
    DECLARE output TEXT DEFAULT '';
    DECLARE iterator INT DEFAULT 1;
    -- CHAR_LENGTH counts characters; LENGTH counts bytes and overshoots for multi-byte text
    WHILE iterator <= CHAR_LENGTH(input) DO
      IF SUBSTRING(input, iterator, 1) IN
         ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'А', 'Б', 'В', 'Г', 'Д', 'Е', 'Ж', 'З', 'И', 'Й', 'К', 'Л', 'М', 'Н', 'О', 'П', 'Р', 'С', 'Т', 'У', 'Ф', 'Х', 'Ц', 'Ч', 'Ш', 'Щ', 'Ъ', 'Ы', 'Ь', 'Э', 'Ю', 'Я', 'а', 'б', 'в', 'г', 'д', 'е', 'ж', 'з', 'и', 'й', 'к', 'л', 'м', 'н', 'о', 'п', 'р', 'с', 'т', 'у', 'ф', 'х', 'ц', 'ч', 'ш', 'щ', 'ъ', 'ы', 'ь', 'э', 'ю', 'я')
      THEN
        SET output = CONCAT(output, SUBSTRING(input, iterator, 1));
      END IF;
      SET iterator = iterator + 1;
    END WHILE;
    RETURN output;
  END //

DELIMITER ;

Usage:

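-- outputs "hello12356привет" when the database character set can store Cyrillic (e.g. utf8mb4);
-- with a character set such as latin1 the Cyrillic letters are lost and the result is "hello12356"
SELECT remove_non_numeric_and_letters('hello - 12356-привет ""]');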

Answer by Alon Asulin

Based on the answer by Ryan Shillington, modified to work with strings longer than 255 characters and preserving spaces from the original string.

FYI, the function also calls lower() on the result at the end.

I used this to compare strings:

DROP FUNCTION IF EXISTS spacealphanum;
DELIMITER $$
CREATE FUNCTION `spacealphanum`( str TEXT ) RETURNS TEXT CHARSET utf8
BEGIN 
  DECLARE i, len INT DEFAULT 1; -- INT rather than SMALLINT: a TEXT value can exceed 32767 characters
  DECLARE ret TEXT DEFAULT ''; 
  DECLARE c CHAR(1); 
  SET len = CHAR_LENGTH( str ); 
  REPEAT 
    BEGIN 
      SET c = MID( str, i, 1 ); 
      IF c REGEXP '[[:alnum:]]' THEN 
        SET ret=CONCAT(ret,c); 
      ELSEIF c = ' ' THEN
        SET ret=CONCAT(ret,' ');
      END IF; 
      SET i = i + 1; 
    END; 
  UNTIL i > len END REPEAT; 
  SET ret = lower(ret);
  RETURN ret; 
END $$
DELIMITER ;

Answer by vdd

The fastest way I was able to find (and am using) is with CONVERT().

From the docs: CONVERT() with USING is used to convert data between different character sets.

Example:

convert(string USING ascii)

In your case the right character set will be self-defined.

Note from the docs: the USING form of CONVERT() is available as of 4.1.0.
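
As a rough illustration of what the conversion does, consider the statement below. The sample string is made up, and the behavior described in the comments is an assumption based on how MySQL usually handles characters that the target character set cannot represent.

```sql
-- Characters with no ASCII equivalent (here the accented letters) are
-- typically replaced with '?' during the conversion.
-- ASCII punctuation and spaces are left in place, so on its own this
-- does not strip every non-alphanumeric character.
SELECT CONVERT('naïve café, 100%' USING ascii) AS converted;
```

This makes CONVERT() a quick way to flatten a string to a smaller character set, but a further step is still needed if punctuation must go as well.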

Answer by Артур Курицын

I have written this UDF. However, it only trims special characters at the beginning of the string. It also converts the string to lower case. You can update this function if desired.

DELIMITER //

DROP FUNCTION IF EXISTS DELETE_DOUBLE_SPACES//

CREATE FUNCTION DELETE_DOUBLE_SPACES ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
    DECLARE result VARCHAR(250);
    SET result = REPLACE( title, '  ', ' ' );
    WHILE (result <> title) DO 
        SET title = result;
        SET result = REPLACE( title, '  ', ' ' );
    END WHILE;
    RETURN result;
END//

DROP FUNCTION IF EXISTS LFILTER//

CREATE FUNCTION LFILTER ( title VARCHAR(250) )
RETURNS VARCHAR(250) DETERMINISTIC
BEGIN
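    WHILE (1=1) DO
        -- Guard: without this, a NULL or all-special-character input loops forever
        IF title IS NULL OR CHAR_LENGTH(title) = 0 THEN
            RETURN title;
        END IF;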
        IF(  ASCII(title) BETWEEN ASCII('a') AND ASCII('z')
            OR ASCII(title) BETWEEN ASCII('A') AND ASCII('Z')
            OR ASCII(title) BETWEEN ASCII('0') AND ASCII('9')
        ) THEN
            SET title = LOWER( title );
            SET title = REPLACE(
                REPLACE(
                    REPLACE(
                        title,
                        CHAR(10), ' '
                    ),
                    CHAR(13), ' '
                ) ,
                CHAR(9), ' '
            );
            SET title = DELETE_DOUBLE_SPACES( title );
            RETURN title;
        ELSE
            SET title = SUBSTRING( title, 2 );          
        END IF;
    END WHILE;
END//
DELIMITER ;

SELECT LFILTER(' !@#$%^&*()_+1a    b');

Also, you could use regular expressions, but this requires installing a MySQL extension.

Answer by Abdel

Be careful: characters like ' or ? are considered alpha by MySQL. It is better to use something like:

IF c BETWEEN 'a' AND 'z' OR c BETWEEN 'A' AND 'Z' OR c BETWEEN '0' AND '9' OR c = '-' THEN
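
One caveat: character comparisons such as BETWEEN 'a' AND 'z' follow the collation in effect, so under an accent-insensitive collation they may still accept accented letters. A possible alternative, sketched below with made-up sample values, is to compare byte codes with ASCII(), which is independent of collation.

```sql
-- ASCII() returns the numeric value of the first byte of its argument,
-- so multi-byte or accented characters fall outside all three ranges.
SELECT c,
       (   ASCII(c) BETWEEN 48 AND 57     -- 0-9
        OR ASCII(c) BETWEEN 65 AND 90     -- A-Z
        OR ASCII(c) BETWEEN 97 AND 122    -- a-z
       ) AS is_ascii_alnum
FROM (SELECT 'a' AS c
      UNION ALL SELECT 'Z'
      UNION ALL SELECT '7'
      UNION ALL SELECT '-'
      UNION ALL SELECT 'é') AS samples;
```

The first three rows give 1 and the last two give 0. Inside a stored function the same condition can replace the REGEXP test on the single character c.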

Answer by Steve Chambers

This can be done with a regular expression replacer function I posted in another answer and have blogged about. It may not be the most efficient solution possible and might look like overkill for the job in hand, but like a Swiss army knife, it may come in useful for other reasons.

It can be seen in action removing all non-alphanumeric characters in this Rextester online demo.

SQL (excluding the function code for brevity):

SELECT txt,
       reg_replace(txt,
                   '[^a-zA-Z0-9]+',
                   '',
                   TRUE,
                   0,
                   0
                   ) AS `reg_replaced`
FROM test;

Answer by michal.jakubeczy

Since MySQL 8.0 you can use a regular expression to remove non-alphanumeric characters from a string, via the REGEXP_REPLACE function.

Here is the code to remove non-alphanumeric characters (note that this pattern keeps spaces):

UPDATE {table} SET {column} = REGEXP_REPLACE({column}, '[^0-9a-zA-Z ]', '')
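
To try the pattern without touching any table, a plain SELECT on a literal works as a quick check. The sample string is borrowed from the alphanum example earlier on this page, and the column aliases are made up.

```sql
-- Requires MySQL 8.0 or later.
SELECT REGEXP_REPLACE('This works finally!', '[^0-9a-zA-Z]', '')  AS no_spaces,    -- Thisworksfinally
       REGEXP_REPLACE('This works finally!', '[^0-9a-zA-Z ]', '') AS keep_spaces;  -- This works finally
```

The only difference between the two patterns is the space inside the bracket expression, which decides whether spaces survive.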