MySQL 从数据库字段中删除特殊字符
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/6066953/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Remove special characters from a database field
提问by MarathonStudios
I have a database with several thousand records, and I need to strip down one of the fields to ensure that it only contains certain characters (Alphanumeric, spaces, and single quotes). What SQL can I use to strip any other characters (such as slashes, etc) from that field in the whole database?
我有一个包含数千条记录的数据库,我需要剥离其中一个字段以确保它只包含某些字符(字母数字、空格和单引号)。我可以使用什么 SQL 从整个数据库中的该字段中删除任何其他字符(例如斜杠等)?
回答by Vinnie
update mytable
set FieldName = REPLACE(FieldName,'/','')
That's a good place to start.
这是一个很好的起点。
回答by DonaldSowell
The Replace() function is first choice. However, Special Characters can sometimes be tricky to write in a console. For those you can combine Replace with the Char() function.
Replace() 函数是首选。但是,在控制台中编写特殊字符有时会很棘手。对于那些,您可以将 Replace 与 Char() 函数结合使用。
e.g. removing
例如移除
Update products set description = replace(description, char(128), '');
You can find all the Ascii values here
Ideally you could do a regex to find all the special chars, but apparently that's not possible with MySQL.
理想情况下,您可以使用正则表达式来查找所有特殊字符,但显然MySQL 不可能做到这一点。
Beyond that, you'd need to run it through your favorite scripting language.
除此之外,您还需要通过您最喜欢的脚本语言来运行它。
回答by Juned Ansari
I have created simple function for this
我为此创建了简单的函数
DROP FUNCTION IF EXISTS `regex_replace`$$
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000)) RETURNS VARCHAR(1000) CHARSET utf8mb4
DETERMINISTIC
BEGIN
DECLARE temp VARCHAR(1000);
DECLARE ch VARCHAR(1);
DECLARE i INT;
SET i = 1;
SET temp = '';
IF original REGEXP pattern THEN
loop_label: LOOP
IF i>CHAR_LENGTH(original) THEN
LEAVE loop_label;
END IF;
SET ch = SUBSTRING(original,i,1);
IF NOT ch REGEXP pattern THEN
SET temp = CONCAT(temp,ch);
ELSE
SET temp = CONCAT(temp,replacement);
END IF;
SET i=i+1;
END LOOP;
ELSE
SET temp = original;
END IF;
RETURN temp;
END
Usage example:
用法示例:
SELECT <field-name> AS NormalText, regex_replace('[^A-Za-z0-9 ]', '', <field-name>)AS RegexText FROM
<table-name>
回答by Adams Biyi
This may also be useful.
这也可能有用。
First you have to know the character set of the database and / or of the table. For example, let us suppose you have a UTF-8 environment and you want to strip / remove symbols like circled registered symbols, circled copyright symbol, and registered trademark symbol from a field then search the internet via bing or yahoo or google for for the hex code values of these symbols in the UTF-8 system:
首先,您必须知道数据库和/或表的字符集。例如,让我们假设您有一个 UTF-8 环境,并且您想从字段中剥离/删除符号,如带圆圈的注册符号、带圆圈的版权符号和注册商标符号,然后通过 bing 或 yahoo 或 google 在互联网上搜索UTF-8 系统中这些符号的十六进制代码值:
Symbol Utf-8 Hex ======= ========= circled copyright C2A9 circled registered C2AE Trademark (i.e., TM) E284A2
Then your scrubbing select sql for field f1 from table t1, using the hex / unhex facility in conjunction with the replace function, will most likely look like this:
然后,使用十六进制/非十六进制工具和替换函数,从表 t1 中为字段 f1 擦洗选择 sql,很可能如下所示:
SELECT
cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1
FROM t1
;
SELECT
cast(unhex(replace(replace(replace(hex(f1),'C2A9',''),'C2AE',''),'E284A2','')) AS char) AS cleanf1
FROM t1
;
Above, note the original field to be scrubbed / cleansed is f1, the table is t1 and the output header is cleanf1. The "as char" casting is necessary because, w/o it, the mysql 5.5.8 on which I tested is returning blob. Hope this helps
上面,注意要清理/清理的原始字段是 f1,表是 t1,输出标题是 cleanf1。“as char”转换是必要的,因为没有它,我测试的 mysql 5.5.8 正在返回 blob。希望这可以帮助
回答by Adams Biyi
Elaborating on Vinnies answer... you can use the following (note the escaping in the last two statements...
详细说明 Vinnies 的答案......你可以使用以下内容(注意最后两个语句中的转义......
update table set column = REPLACE(column,"`","");
update table set column = REPLACE(column,"~","");
update table set column = REPLACE(column,"!","");
update table set column = REPLACE(column,"@","");
update table set column = REPLACE(column,"#","");
update table set column = REPLACE(column,"$","");
update table set column = REPLACE(column,"%","");
update table set column = REPLACE(column,"^","");
update table set column = REPLACE(column,"&","");
update table set column = REPLACE(column,"*","");
update table set column = REPLACE(column,"(","");
update table set column = REPLACE(column,")","");
update table set column = REPLACE(column,"-","");
update table set column = REPLACE(column,"_","");
update table set column = REPLACE(column,"=","");
update table set column = REPLACE(column,"+","");
update table set column = REPLACE(column,"{","");
update table set column = REPLACE(column,"}","");
update table set column = REPLACE(column,"[","");
update table set column = REPLACE(column,"]","");
update table set column = REPLACE(column,"|","");
update table set column = REPLACE(column,";","");
update table set column = REPLACE(column,":","");
update table set column = REPLACE(column,"'","");
update table set column = REPLACE(column,"<","");
update table set column = REPLACE(column,",","");
update table set column = REPLACE(column,">","");
update table set column = REPLACE(column,".","");
update table set column = REPLACE(column,"/","");
update table set column = REPLACE(column,"?","");
update table set column = REPLACE(column,"\","");
update table set column = REPLACE(column,"\"","");
回答by Adeel Raza Azeemi
This might be useful.
这可能有用。
This solution doesn't involves creating procedures or functions or lengthy use of replace within replace. Instead we know that all the ASCII characters that doesn't involves special character lies within ASCII codes \x20-\x7E (Hex representation). Source ASCII From Wikipedia, the free encyclopediaBelow are all those characters in that interval.
此解决方案不涉及创建过程或函数或在替换中长时间使用替换。相反,我们知道所有不涉及特殊字符的 ASCII 字符都位于 ASCII 代码 \x20-\x7E(十六进制表示)内。源 ASCII 来自维基百科,免费的百科全书以下是该区间内的所有字符。
Hex: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F 30 31 32 33 34 35 36 37 38 39 3A 3B 3C 3D 3E 3F 40 41 42 43 44 45 46 47 48 49 4A 4B 4C 4D 4E 4F 50 51 52 53 54 55 56 57 58 59 5A 5B 5C 5D 5E 5F 60 61 62 63 64 65 66 67 68 69 6A 6B 6C 6D 6E 6F 70 71 72 73 74 75 76 77 78 79 7A 7B 7C 7D 7E
Glyph: ?space ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ↑ ← @ a b c d e f g h i j k l m n o p q r s t u v w x y z { ACK } ESC
so as simple regular expression replace will do the job
所以简单的正则表达式替换就可以完成这项工作
SELECT REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') from tableName;
PHPCustom query string
PHP自定义查询字符串
$query = "select REGEXP_REPLACE(columnName, '(.*)[(].*[)](.*)', CONCAT('\\1', '\\2')) `Alias` FROM table_Name";
The above statement replaces the content in between brackets as well as brackets. e.g. if the column contains 'Staff Oreintation (CMST TOT)' then above statement will removes the brackets and its contant i.e. 'Staff Oreintation'.
上面的语句替换了括号之间的内容以及括号中的内容。例如,如果该列包含“员工配置(CMST TOT)”,则上述语句将删除括号及其内容,即“员工配置”。
PS: If you are doing any DML (select, update ...) operation using prepare statement in stored procedure OR through PHP (creating a custom query string); then remember to escape the slash i.e.
PS:如果您正在使用存储过程中的prepare语句或通过PHP(创建自定义查询字符串)执行任何DML(选择,更新...)操作;然后记得逃避斜线即
SET @sql = CONCAT("SELECT REGEXP_REPLACE(columnName, '[^\\x20-\\x7E]', '') from tableName");
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
The above SQL statement does a simple regular expression replaces (actually removes) of all the special character; i.e. In the SQL a REGEX pattern is mention of all the special characters to be replaced with nothing.
上面的SQL语句做了一个简单的正则表达式替换(实际上是去除)了所有的特殊字符;即在 SQL 中,REGEX 模式提到了所有要替换为空的特殊字符。
Explanation of the pattern
图案说明
A character group is started with the square bracket. The first character is caret which means; negation of all the characters mention in the group (i.e. with in the squares brackets). This simply means select compliment (other character than those selected) of all the characters in the group.
一个字符组以方括号开始。第一个字符是脱字符,意思是;否定组中提到的所有字符(即在方括号中)。这只是意味着选择组中所有字符的恭维(除所选字符之外的其他字符)。
Just to summarize the above statement will
只是总结一下上面的陈述将
Unchanged: all the alphanumeric characters, punctuation characters, arithmetic operators.
不变:所有字母数字字符、标点字符、算术运算符。
Removeall the Unicode characters (other than Latin alphabets) or special characters.
删除所有 Unicode 字符(拉丁字母除外)或特殊字符。
回答by Jijesh Cherrai
There is no regular expression replacement. Use the following code to replace all special characters with '-'.
没有正则表达式替换。使用以下代码将所有特殊字符替换为“-”。
UPDATE <table> SET <column> = REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE ( REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (REPLACE (<column>, '/', '-'), ',', '-'), '.', '-'), '<', '-'), '>', '-'), '?', '-'), ';', '-'), ':', '-'), '"', '-'), "'", '-'), '|', '-'), '\', '-'), '=', '-'), '+', '-'), '*', '-'), '&', '-'), '^', '-'), '%', '-'), '$', '-'), '#', '-'), '@', '-'), '!', '-'), '~', '-'), '`', '-'), '', '-'), '{', '-' ), '}', '-' ), '[', '-' ), ']', '-' ), '(', '-' ), ')', '-' )
Code formatted
代码格式化
UPDATE
<table>
SET
<column> =
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(
REPLACE
(<column>, '/', '-'),
',',
'-'
),
'.',
'-'
),
'<',
'-'
),
'>',
'-'
),
'?',
'-'
),
';',
'-'
),
':',
'-'
),
'"',
'-'
),
"'",
'-'
),
'|',
'-'
),
'\',
'-'
),
'=',
'-'
),
'+',
'-'
),
'*',
'-'
),
'&',
'-'
),
'^',
'-'
),
'%',
'-'
),
'$',
'-'
),
'#',
'-'
),
'@',
'-'
),
'!',
'-'
),
'~',
'-'
),
'`',
'-'
),
'',
'-'
),
'{',
'-'
),
'}',
'-'
),
'[',
'-'
),
']',
'-'
),
'(',
'-'
),
')',
'-'
)
回答by Rasika
Have a look at LIB_MYSQLUDF_PREGwhich needs to be compiled into the MySQL server, but has advanced regular expression facilities such as preg_replacewhich will help with your task.
看看LIB_MYSQLUDF_PREG,它需要编译到 MySQL 服务器中,但具有高级正则表达式工具,例如preg_replace可以帮助您完成任务。
回答by Jennifer Meyer
My version of MySQL doesn't have REGEXP_REPLACE(). I used the following two workarounds: 1. Remove specified characters (if you know what characters you want to remove)
我的 MySQL 版本没有 REGEXP_REPLACE()。我使用了以下两种解决方法: 1. 删除指定的字符(如果您知道要删除的字符)
create function fn_remove_selected_characters
(v_input_string varchar(255),
v_unacceptable_characters varchar(255))
RETURNS varchar(255)
BEGIN
-- declare variables
declare i int;
declare unacceptable_values varchar(255);
declare this_character char(1);
declare output_string varchar(255);
declare input_length int;
declare boolean_value int;
declare space varchar(3);
-- Set variable values
set input_length = char_length(v_input_string);
set i = 0;
set unacceptable_values = v_unacceptable_characters;
set output_string = '';
set boolean_value = 0;
set space = 'no';
begin
-- Leave spaces if they aren't in the exclude list
if instr( unacceptable_values, ' ') = 0 then
begin
while i < input_length do
SET this_character = SUBSTRING( v_input_string, i, 1 );
-- If the current character is a space,
-- then concatenate a space to the output
-- Although it seems redundant to explicitly add a space,
-- SUBSTRING() equates a space to the empty string
if this_character = ' ' then
set output_string = concat(output_string, ' ');
-- if the current character is not a space, remove it if it's unwanted
elseif instr(unacceptable_values, this_character) then
set output_string = concat(output_string, '');
-- otherwise include the character
else set output_string = concat(output_string, this_character);
end if;
set i = i + 1;
end while;
end;
else
begin
while i < input_length do
begin
SET this_character = SUBSTRING( v_input_string, i, 1 );
if instr(unacceptable_values, this_character) > 0 then
set output_string = concat(output_string, '');
else set output_string = concat(output_string, this_character);
end if;
end;
set i = i + 1;
end while;
end;
end if;
end;
RETURN output_string;
- Keep only the characters you want:
- 只保留你想要的字符:
create function fn_preserve_selected_characters
(v_input_string varchar(255),
v_acceptable_characters varchar(255))
returns varchar(255)
begin
declare i int;
declare acceptable_values varchar(255);
declare this_character char(1);
declare output_string varchar(255);
declare input_length int;
declare boolean_value int;
declare space varchar(3);
set input_length = char_length(v_input_string);
set i = 0;
set acceptable_values = v_acceptable_characters;
set output_string = '';
set boolean_value = 0;
set space = 'no';
begin
-- check for existence of spaces
if instr( acceptable_values, ' ') then
begin
while i < input_length do
-- SUBSTRING() treats spaces as empty strings
-- so handle them specially
SET this_character = SUBSTRING( v_input_string, i, 1 );
if this_character = ' ' then
set output_string = concat(output_string, ' ');
elseif instr(acceptable_values, this_character) then
set output_string = concat(output_string, this_character);
else set output_string = concat(output_string, '');
end if;
set i = i + 1;
end while;
end;
-- if there are no spaces in input string
-- then this section is complete
else
begin
while i <= input_length do
SET this_character = SUBSTRING( v_input_string, i, 1 );
-- if the current character exists in the punctuation string
if LOCATE( this_character, acceptable_values ) > 0 THEN
set output_string = concat(output_string, this_character);
end if;
set i = i+1;
end while;
end;
end if;
end;
RETURN output_string;
回答by Saj
Adeel's answer is by far the best and simplest.
Adeel 的答案是迄今为止最好和最简单的。
The OP needed to update the db, which is what I need too. So I figured I'd put that here for the next poor sole, like me, not to have to redo what I did.
OP 需要更新数据库,这也是我所需要的。所以我想我会把它放在这里作为下一个可怜的鞋底,就像我一样,不必重做我所做的。
Double check first, select it and scan them to make sure you're getting the right rows, before you update.
在更新之前,首先仔细检查,选择它并扫描它们以确保您获得正确的行。
SELECT REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') from tableName;
Countto do a safety check ...
计数做一个安全检查......
SELECT count(*) from tableName WHERE columnName REGEXP '[^\x20-\x7E]';
For some names I had to do another mapping so as not to lose their meaning like Ramon to Ramn because the o has a umlaut or grave or circumflex. So I used this to map ... https://theasciicode.com.ar
对于某些名称,我不得不进行另一次映射,以免因为 o 带有变音符号、坟墓或抑扬符号而失去它们的含义,例如从 Ramon 到 Ramn。所以我用它来映射... https://theasciicode.com.ar
Then updateThis update is a catch all after the mapping update. Change the limit number to the count value above ...
然后更新此更新是映射更新之后的全部内容。将限制数更改为上面的计数值...
UPDATE tablename SET columnName = REGEXP_REPLACE(columnName, '[^\x20-\x7E]', '') WHERE columnName REGEXP '[^\x20-\x7E]' LIMIT 1;

