MySQL 从数据库字段中删除特殊字符

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/6066953/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 19:56:52  来源:igfitidea点击:

Remove special characters from a database field

mysqlsqldatabase

提问by MarathonStudios

I have a database with several thousand records, and I need to strip down one of the fields to ensure that it only contains certain characters (Alphanumeric, spaces, and single quotes). What SQL can I use to strip any other characters (such as slashes, etc) from that field in the whole database?

我有一个包含数千条记录的数据库,我需要剥离其中一个字段以确保它只包含某些字符(字母数字、空格和单引号)。我可以使用什么 SQL 从整个数据库中的该字段中删除任何其他字符(例如斜杠等)?

回答by Vinnie

update mytable
set FieldName = REPLACE(FieldName,'/','')

That's a good place to start.

这是一个很好的起点。

回答by DonaldSowell

The Replace() function is first choice. However, Special Characters can sometimes be tricky to write in a console. For those you can combine Replace with the Char() function.

Replace() 函数是首选。但是,在控制台中编写特殊字符有时会很棘手。对于那些,您可以将 Replace 与 Char() 函数结合使用。

e.g. removing

例如移除

Update products set description = replace(description, char(128), '');

You can find all the Ascii values here

你可以在这里找到所有的Ascii 值

Ideally you could do a regex to find all the special chars, but apparently that's not possible with MySQL.

理想情况下,您可以使用正则表达式来查找所有特殊字符,但显然MySQL 不可能做到这一点。

Beyond that, you'd need to run it through your favorite scripting language.

除此之外,您还需要通过您最喜欢的脚本语言来运行它。

回答by Juned Ansari

I have created simple function for this

我为此创建了简单的函数

DROP FUNCTION IF EXISTS `regex_replace`$$

CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000)) RETURNS VARCHAR(1000) CHARSET utf8mb4
    DETERMINISTIC
BEGIN    
    DECLARE temp VARCHAR(1000); 
    DECLARE ch VARCHAR(1); 
    DECLARE i INT;
    SET i = 1;
    SET temp = '';
    IF original REGEXP pattern THEN 
        loop_label: LOOP 
            IF i>CHAR_LENGTH(original) THEN
                LEAVE loop_label;  
            END IF;

            SET ch = SUBSTRING(original,i,1);

            IF NOT ch REGEXP pattern THEN
                SET temp = CONCAT(temp,ch);
            ELSE
                SET temp = CONCAT(temp,replacement);
            END IF;

            SET i=i+1;
        END LOOP;
    ELSE
        SET temp = original;
    END IF;

    RETURN temp;
END

Usage example:

用法示例:

SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>)AS RegexText FROM 
<table-name>

回答by Adams Biyi

This may also be useful.

这也可能有用。

First you have to know the character set of the database and / or of the table. For example, let us suppose you have a UTF-8 environment and you want to strip / remove symbols like circled registered symbols, circled copyright symbol, and registered trademark symbol from a field then search the internet via bing or yahoo or google for for the hex code values of these symbols in the UTF-8 system:

首先,您必须知道数据库和/或表的字符集。例如,让我们假设您有一个 UTF-8 环境,并且您想从字段中剥离/删除符号,如带圆圈的注册符号、带圆圈的版权符号和注册商标符号,然后通过 bing 或 yahoo 或 google 在互联网上搜索UTF-8 系统中这些符号的十六进制代码值:

Symbol                       Utf-8 Hex
=======                      =========
circled copyright              C2A9
circled registered             C2AE
Trademark (i.e., TM)           E284A2

Then your scrubbing select sql for field f1 from table t1, using the hex / unhex facility in conjunction with the replace function, will most likely look like this:

然后,使用十六进制/非十六进制工具和替换函数,从表 t1 中为字段 f1 擦洗选择 sql,很可能如下所示:

SELECT cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1 FROM t1 ;

SELECT cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1 FROM t1 ;

Above, note the original field to be scrubbed / cleansed is f1, the table is t1 and the output header is cleanf1. The "as char" casting is necessary because, w/o it, the mysql 5.5.8 on which I tested is returning blob. Hope this helps

上面,注意要清理/清理的原始字段是 f1,表是 t1,输出标题是 cleanf1。“as char”转换是必要的,因为没有它,我测试的 mysql 5.5.8 正在返回 blob。希望这可以帮助

回答by Adams Biyi

Elaborating on Vinnies answer... you can use the following (note the escaping in the last two statements...

详细说明 Vinnies 的答案......你可以使用以下内容(注意最后两个语句中的转义......

update table set column = REPLACE(column,"`","");
update table set column = REPLACE(column,"~","");
update table set column = REPLACE(column,"!","");
update table set column = REPLACE(column,"@","");
update table set column = REPLACE(column,"#","");
update table set column = REPLACE(column,"$","");
update table set column = REPLACE(column,"%","");
update table set column = REPLACE(column,"^","");
update table set column = REPLACE(column,"&","");
update table set column = REPLACE(column,"*","");
update table set column = REPLACE(column,"(","");
update table set column = REPLACE(column,")","");
update table set column = REPLACE(column,"-","");
update table set column = REPLACE(column,"_","");
update table set column = REPLACE(column,"=","");
update table set column = REPLACE(column,"+","");
update table set column = REPLACE(column,"{","");
update table set column = REPLACE(column,"}","");
update table set column = REPLACE(column,"[","");
update table set column = REPLACE(column,"]","");
update table set column = REPLACE(column,"|","");
update table set column = REPLACE(column,";","");
update table set column = REPLACE(column,":","");
update table set column = REPLACE(column,"'","");
update table set column = REPLACE(column,"<","");
update table set column = REPLACE(column,",","");
update table set column = REPLACE(column,">","");
update table set column = REPLACE(column,".","");
update table set column = REPLACE(column,"/","");
update table set column = REPLACE(column,"?","");
update table set column = REPLACE(column,"\","");
update table set column = REPLACE(column,"\"","");

回答by Adeel Raza Azeemi

This might be useful.

这可能有用。

This solution doesn't involves creating procedures or functions or lengthy use of replace within replace. Instead we know that all the ASCII characters that doesn't involves special character lies within ASCII codes \x20-\x7E (Hex representation). Source ASCII From Wikipedia, the free encyclopediaBelow are all those characters in that interval.

此解决方案不涉及创建过程或函数或在替换中长时间使用替换。相反,我们知道所有不涉及特殊字符的 ASCII 字符都位于 ASCII 代码 \x20-\x7E(十六进制表示)内。源 ASCII 来自维基百科,免费的百科全书以下是该区间内的所有字符。

Hex: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E
Glyph: ?space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ↑ ← @ a b c d e f g h i j k l m n o p q r s t u v w x y z { ACK } ESC

so as simple regular expression replace will do the job

所以简单的正则表达式替换就可以完成这项工作

SELECT REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') from tableName;

PHPCustom query string

PHP自定义查询字符串

$query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\1', '\\2')) `Alias` FROM table_Name";

The above statement replaces the content in between brackets as well as brackets. e.g. if the column contains 'Staff Oreintation (CMST TOT)' then above statement will removes the brackets and its contant i.e. 'Staff Oreintation'.

上面的语句替换了括号之间的内容以及括号中的内容。例如,如果该列包含“员工配置(CMST TOT)”,则上述语句将删除括号及其内容,即“员工配置”。

PS: If you are doing any DML (select, update ...) operation using prepare statement in stored procedure OR through PHP (creating a custom query string); then remember to escape the slash i.e.

PS:如果您正在使用存储过程中的prepare语句或通过PHP(创建自定义查询字符串)执行任何DML(选择,更新...)操作;然后记得逃避斜线即

SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName");
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

The above SQL statement does a simple regular expression replaces (actually removes) of all the special character; i.e. In the SQL a REGEX pattern is mention of all the special characters to be replaced with nothing.

上面的SQL语句做了一个简单的正则表达式替换(实际上是去除)了所有的特殊字符;即在 SQL 中,REGEX 模式提到了所有要替换为空的特殊字符。

Explanation of the pattern

图案说明

A character group is started with the square bracket. The first character is caret which means; negation of all the characters mention in the group (i.e. with in the squares brackets). This simply means select compliment (other character than those selected) of all the characters in the group.

一个字符组以方括号开始。第一个字符是脱字符,意思是;否定组中提到的所有字符(即在方括号中)。这只是意味着选择组中所有字符的恭维(除所选字符之外的其他字符)。

Just to summarize the above statement will

只是总结一下上面的陈述将

Unchanged: all the alphanumeric characters, punctuation characters, arithmetic operators.

不变:所有字母数字字符、标点字符、算术运算符。

Removeall the Unicode characters (other than Latin alphabets) or special characters.

删除所有 Unicode 字符(拉丁字母除外)或特殊字符。

回答by Jijesh Cherrai

There is no regular expression replacement. Use the following code to replace all special characters with '-'.

没有正则表达式替换。使用以下代码将所有特殊字符替换为“-”。

UPDATE <table> SET <column> = REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (<column>, '/', '-'), ',', '-'), '.', '-'), '<', '-'), '>', '-'), '?', '-'), ';', '-'), ':', '-'), '"', '-'), "'", '-'), '|', '-'), '\', '-'), '=', '-'), '+', '-'), '*', '-'), '&', '-'), '^', '-'), '%', '-'), '$', '-'), '#', '-'), '@', '-'), '!', '-'), '~', '-'), '`', '-'), '', '-'), '{', '-' ), '}', '-' ), '[', '-' ), ']', '-' ), '(', '-' ), ')', '-' )

Code formatted

代码格式化

UPDATE
    <table>
SET
    <column> =
REPLACE
    (
    REPLACE
        (
        REPLACE
            (
            REPLACE
                (
                REPLACE
                    (
                    REPLACE
                        (
                        REPLACE
                            (
                            REPLACE
                                (
                                REPLACE
                                    (
                                    REPLACE
                                        (
                                        REPLACE
                                            (
                                            REPLACE
                                                (
                                                REPLACE
                                                    (
                                                    REPLACE
                                                        (
                                                        REPLACE
                                                            (
                                                            REPLACE
                                                                (
                                                                REPLACE
                                                                    (
                                                                    REPLACE
                                                                        (
                                                                        REPLACE
                                                                            (
                                                                            REPLACE
                                                                                (
                                                                                REPLACE
                                                                                    (
                                                                                    REPLACE
                                                                                        (
                                                                                        REPLACE
                                                                                            (
                                                                                            REPLACE
                                                                                                (
                                                                                                REPLACE
                                                                                                    (
                                                                                                    REPLACE
                                                                                                        (
                                                                                                        REPLACE
                                                                                                            (
                                                                                                            REPLACE
                                                                                                                (
                                                                                                                REPLACE
                                                                                                                    (
                                                                                                                    REPLACE
                                                                                                                        (
                                                                                                                    REPLACE
                                                                                                                        (<column>, '/', '-'),
                                                                                                                        ',',
                                                                                                                        '-'
                                                                                                                    ),
                                                                                                                    '.',
                                                                                                                    '-'
                                                                                                                ),
                                                                                                                '<',
                                                                                                                '-'
                                                                                                            ),
                                                                                                            '>',
                                                                                                            '-'
                                                                                                        ),
                                                                                                        '?',
                                                                                                        '-'
                                                                                                    ),
                                                                                                    ';',
                                                                                                    '-'
                                                                                                ),
                                                                                                ':',
                                                                                                '-'
                                                                                            ),
                                                                                            '"',
                                                                                            '-'
                                                                                        ),
                                                                                        "'",
                                                                                        '-'
                                                                                    ),
                                                                                    '|',
                                                                                    '-'
                                                                                ),
                                                                                '\',
                                                                                '-'
                                                                            ),
                                                                            '=',
                                                                            '-'
                                                                        ),
                                                                        '+',
                                                                        '-'
                                                                    ),
                                                                    '*',
                                                                    '-'
                                                                ),
                                                                '&',
                                                                '-'
                                                            ),
                                                            '^',
                                                            '-'
                                                        ),
                                                        '%',
                                                        '-'
                                                    ),
                                                    '$',
                                                    '-'
                                                ),
                                                '#',
                                                '-'
                                            ),
                                            '@',
                                            '-'
                                        ),
                                        '!',
                                        '-'
                                    ),
                                    '~',
                                    '-'
                                ),
                                '`',
                                '-'
                            ),
                            '',
                            '-'
                        ),
                        '{',
                        '-'
                    ),
                    '}',
                    '-'
                ),
                '[',
                '-'
            ),
            ']',
            '-'
        ),
        '(',
        '-'
    ),
    ')',
    '-'
)

回答by Rasika

Have a look at LIB_MYSQLUDF_PREGwhich needs to be compiled into the MySQL server, but has advanced regular expression facilities such as preg_replacewhich will help with your task.

看看LIB_MYSQLUDF_PREG,它需要编译到 MySQL 服务器中,但具有高级正则表达式工具,例如preg_replace可以帮助您完成任务。

回答by Jennifer Meyer

My version of MySQL doesn't have REGEXP_REPLACE(). I used the following two workarounds: 1. Remove specified characters (if you know what characters you want to remove)

我的 MySQL 版本没有 REGEXP_REPLACE()。我使用了以下两种解决方法: 1. 删除指定的字符(如果您知道要删除的字符)

    create function fn_remove_selected_characters
        (v_input_string varchar(255),
         v_unacceptable_characters varchar(255))
    RETURNS varchar(255)
    BEGIN

    -- declare variables
    declare i int;
    declare unacceptable_values varchar(255);
    declare this_character char(1);
    declare output_string varchar(255);
    declare input_length int;
    declare boolean_value int;
    declare space varchar(3);

    -- Set variable values
    set input_length = char_length(v_input_string);
    set i = 0;
    set unacceptable_values = v_unacceptable_characters;
    set output_string = '';
    set boolean_value = 0;
    set space = 'no';

    begin
    -- Leave spaces if they aren't in the exclude list
    if instr( unacceptable_values, ' ') = 0 then
        begin
        while i < input_length do
            SET this_character = SUBSTRING( v_input_string, i, 1 );
                -- If the current character is a space, 
                -- then concatenate a space to the output
                -- Although it seems redundant to explicitly add a space,
                -- SUBSTRING() equates a space to the empty string
                if this_character = ' ' then
                    set output_string = concat(output_string, ' ');
                -- if the current character is not a space, remove it if it's unwanted
                elseif instr(unacceptable_values, this_character) then
                    set output_string = concat(output_string, '');
                -- otherwise include the character
                else set output_string = concat(output_string, this_character);
                end if;
            set i = i + 1;
        end while;
        end;
    else
        begin
        while i < input_length do
            begin
            SET this_character = SUBSTRING( v_input_string, i, 1 );
            if instr(unacceptable_values, this_character) > 0 then
                set output_string = concat(output_string, '');
            else set output_string = concat(output_string, this_character);
            end if;
            end;
            set i = i + 1;
        end while;
        end;
    end if;
    end;
        RETURN output_string;
  1. Keep only the characters you want:
  1. 只保留你想要的字符:
    create function fn_preserve_selected_characters
        (v_input_string varchar(255),
         v_acceptable_characters varchar(255))
    returns varchar(255)

    begin
    declare i int;
    declare acceptable_values varchar(255);
    declare this_character char(1);
    declare output_string varchar(255);
    declare input_length int;
    declare boolean_value int;
    declare space varchar(3);

    set input_length = char_length(v_input_string);
    set i = 0;
    set acceptable_values = v_acceptable_characters;
    set output_string = '';
    set boolean_value = 0;
    set space = 'no';

    begin

    -- check for existence of spaces
    if instr( acceptable_values, ' ') then
        begin
        while i < input_length do
            -- SUBSTRING() treats spaces as empty strings
            -- so handle them specially
            SET this_character = SUBSTRING( v_input_string, i, 1 );
                if this_character = ' ' then
                    set output_string = concat(output_string, ' ');
                elseif instr(acceptable_values, this_character) then
                    set output_string = concat(output_string, this_character);
                else set output_string = concat(output_string, '');
                end if;
            set i = i + 1;
        end while;
        end;
    -- if there are no spaces in input string
    -- then this section is complete
    else 
        begin
        while i <= input_length do
            SET this_character = SUBSTRING( v_input_string, i, 1 );
            -- if the current character exists in the punctuation string
            if LOCATE( this_character, acceptable_values ) > 0 THEN
                set output_string = concat(output_string, this_character);
            end if;
            set i = i+1;
        end while;
        end;
    end if;
    end;
        RETURN output_string;

回答by Saj

Adeel's answer is by far the best and simplest.

Adeel 的答案是迄今为止最好和最简单的。

The OP needed to update the db, which is what I need too. So I figured I'd put that here for the next poor sole, like me, not to have to redo what I did.

OP 需要更新数据库,这也是我所需要的。所以我想我会把它放在这里作为下一个可怜的鞋底,就像我一样,不必重做我所做的。

Double check first, select it and scan them to make sure you're getting the right rows, before you update.

在更新之前,首先仔细检查,选择它并扫描它们以确保您获得正确的行。

SELECT REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') from tableName;

Countto do a safety check ...

计数做一个安全检查......

SELECT count(*) from tableName WHERE columnName REGEXP '[^\x20-\x7E]';

For some names I had to do another mapping so as not to lose their meaning like Ramon to Ramn because the o has a umlaut or grave or circumflex. So I used this to map ... https://theasciicode.com.ar

对于某些名称,我不得不进行另一次映射,以免因为 o 带有变音符号、坟墓或抑扬符号而失去它们的含义,例如从 Ramon 到 Ramn。所以我用它来映射... https://theasciicode.com.ar

Then updateThis update is a catch all after the mapping update. Change the limit number to the count value above ...

然后更新此更新是映射更新之后的全部内容。将限制数更改为上面的计数值...

UPDATE tablename SET columnName = REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') WHERE columnName REGEXP '[^\x20-\x7E]' LIMIT 1;