How to do a regular expression replace in MySQL?
Original question: http://stackoverflow.com/questions/986826/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow.
Asked by Piskvor left the building
I have a table with ~500k rows; the varchar(255) UTF8 column filename contains a file name.
I'm trying to strip various strange characters out of the filename and thought I'd use a character class: [^a-zA-Z0-9()_ .\-]
Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for functionality similar to the REPLACE() function; a simplified example follows:
SELECT REPLACE('stackowerflow', 'ower', 'over');
Output: "stackoverflow"
/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-');
Output: "-tackover-low"
I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.
(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack)
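For reference, the client-side fallback described above can be sketched as follows. This is only an illustration in Python rather than PHP, and it uses an in-memory SQLite database as a stand-in for the MySQL table (both are assumptions made so the sketch runs on its own). The table and column names foo, pkey_id and filename come from the question; the helper name clean_filenames and the sample rows are hypothetical.

```python
import re
import sqlite3

# Same character class as in the question: anything NOT in the allowed set.
DISALLOWED = re.compile(r"[^a-zA-Z0-9()_ .\-]")


def clean_filenames(conn):
    """Strip disallowed characters from foo.filename; return the number of rows changed."""
    rows = conn.execute("SELECT pkey_id, filename FROM foo").fetchall()
    changed = 0
    for pkey_id, filename in rows:
        cleaned = DISALLOWED.sub("", filename)
        if cleaned != filename:
            conn.execute(
                "UPDATE foo SET filename = ? WHERE pkey_id = ?",
                (cleaned, pkey_id),
            )
            changed += 1
    conn.commit()
    return changed


# In-memory SQLite stands in for the real MySQL table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pkey_id INTEGER PRIMARY KEY, filename TEXT)")
conn.executemany(
    "INSERT INTO foo (pkey_id, filename) VALUES (?, ?)",
    [(1, "report (final).pdf"), (2, "we!rd#name?.txt")],
)
print(clean_filenames(conn))
print(conn.execute("SELECT filename FROM foo ORDER BY pkey_id").fetchall())
```

Against MySQL the SELECT could keep the RLIKE filter from the question so that only offending rows are fetched; SQLite has no built-in REGEXP, so the sketch reads every row instead.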
Accepted answer by Lukasz Szozda
With MySQL 8.0+ you can use the native REGEXP_REPLACE function.
REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.
and Regular expression support:
Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (REGEXP, RLIKE). Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The REGEXP_LIKE() function performs regular expression matching in the manner of the REGEXP and RLIKE operators, which now are synonyms for that function. In addition, the REGEXP_INSTR(), REGEXP_REPLACE(), and REGEXP_SUBSTR() functions are available to find match positions and perform substring substitution and extraction, respectively.
SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c');
-- Output:
-tackover-low
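To make the optional arguments in the call above a bit more concrete, here is a rough Python emulation of what pos, occurrence and match_type do. It is a sketch of the semantics only, not MySQL's implementation: it treats repl as a literal string, only understands the 'c' and 'i' match types, and simply slices the string at pos (so anchors behave differently from the real function). The Python function regexp_replace is hypothetical and merely borrows the SQL name.

```python
import re


def regexp_replace(expr, pat, repl, pos=1, occurrence=0, match_type="c"):
    """Emulate the basic behaviour of REGEXP_REPLACE (illustration only)."""
    flags = re.IGNORECASE if "i" in match_type else 0
    regex = re.compile(pat, flags)
    start = pos - 1  # SQL string positions are 1-based
    seen = 0

    def substitute(match):
        nonlocal seen
        seen += 1
        # occurrence = 0 means "replace every match"; N means "only the Nth".
        if occurrence == 0 or seen == occurrence:
            return repl
        return match.group(0)

    return expr[:start] + regex.sub(substitute, expr[start:])


print(regexp_replace("Stackoverflow", "[A-Zf]", "-", 1, 0, "c"))  # -tackover-low
print(regexp_replace("Stackoverflow", "[A-Zf]", "-", 1, 2, "c"))  # Stackover-low
print(regexp_replace("Stackoverflow", "[A-Zf]", "-", 2, 0, "c"))  # Stackover-low
```

In other words: pos says where the search starts, occurrence picks a single match (0 replaces all of them), and 'c' versus 'i' switches case sensitivity.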
Answer by Jeremy Stein
MySQL 8.0+:
You can use the native REGEXP_REPLACE function.
Older versions:
You can use a user-defined function (UDF) like mysql-udf-regexp.
Answer by Benvorth
Use MariaDB instead. It has a function
REGEXP_REPLACE(col, regexp, replace)
See MariaDB docs and PCRE Regular expression enhancements
Note that you can use regexp grouping as well (I found that very useful):
SELECT REGEXP_REPLACE("stackoverflow", "(stack)(over)(flow)", '\2 - \1 - \3')
returns
over - stack - flow
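The same group reordering can be tried outside the database. The following is just a Python analogue of the query above (Python's re module also writes backreferences as \1, \2, \3), not MariaDB itself:

```python
import re

# Three capturing groups, emitted in a different order in the replacement.
result = re.sub(r"(stack)(over)(flow)", r"\2 - \1 - \3", "stackoverflow")
print(result)  # over - stack - flow
```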
Answer by Ryan Ward
My brute force method to get this to work was just:
- Dump the table:
mysqldump -u user -p database table > dump.sql
- Find and replace a couple of patterns:
find /path/to/dump.sql -type f -exec sed -i 's/old_string/new_string/g' {} \;
There are obviously other Perl regular expressions you could perform on the file as well.
- Import the table (the dump is a SQL script, so it is fed to the mysql client rather than mysqlimport, which loads delimited text files):
mysql -u user -p database < dump.sql
If you want to make sure the string isn't elsewhere in your dataset, run a few regular expressions first to check that all occurrences appear in a similar context. It's also not that tough to create a backup before you run a replace, in case you accidentally destroy information.
Answer by rasika godawatte
I recently wrote a MySQL function to replace strings using regular expressions. You can find my post at the following location:
http://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/
Here is the function code:
DELIMITER $$
CREATE FUNCTION `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000))
RETURNS VARCHAR(1000)
DETERMINISTIC
BEGIN
  DECLARE temp VARCHAR(1000);
  DECLARE ch VARCHAR(1);
  DECLARE i INT;
  SET i = 1;
  SET temp = '';
  IF original REGEXP pattern THEN
    loop_label: LOOP
      IF i>CHAR_LENGTH(original) THEN
        LEAVE loop_label;
      END IF;
      SET ch = SUBSTRING(original,i,1);
      IF NOT ch REGEXP pattern THEN
        SET temp = CONCAT(temp,ch);
      ELSE
        SET temp = CONCAT(temp,replacement);
      END IF;
      SET i=i+1;
    END LOOP;
  ELSE
    SET temp = original;
  END IF;
  RETURN temp;
END$$
DELIMITER ;
Example execution:
mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \ my- sql (regular) ,expressions ._,');
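To see what this function does and does not handle, here is a rough Python port of its logic (a sketch only; Python's re and MySQL's REGEXP differ in details such as default case sensitivity). It tests the pattern against one character at a time, so it works for single-character classes like the one in the example but leaves multi-character patterns untouched.

```python
import re


def regex_replace(pattern, replacement, original):
    """Character-by-character replace, mirroring the stored function above."""
    if not re.search(pattern, original):
        return original
    out = []
    for ch in original:
        # Each character is matched on its own, never as part of a longer run.
        out.append(replacement if re.search(pattern, ch) else ch)
    return "".join(out)


cleaned = regex_replace(
    r"[^a-zA-Z0-9\-]",
    "",
    "2my test3_text-to. check \\ my- sql (regular) ,expressions ._,",
)
print(cleaned)  # 2mytest3text-tocheckmy-sqlregularexpressions

# A multi-character pattern never matches a single character, so nothing changes.
print(regex_replace("ower", "over", "stackowerflow"))  # stackowerflow
```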
Answer by Jay Patel
We solved this problem without using regex. This query replaces only exact whole-word matches of the string.
update employee set
employee_firstname =
trim(REPLACE(concat(" ",employee_firstname," "),' jay ',' abc '))
Example:
emp_id employee_firstname
1 jay
2 jay ajay
3 jay
After executing the query, the result is:
emp_id employee_firstname
1 abc
2 abc ajay
3 abc
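The padding trick in that query can be illustrated in Python as a sketch of the idea rather than of MySQL itself: wrap the value in spaces, replace the space-delimited word, then trim. The helper name replace_whole_word is hypothetical.

```python
def replace_whole_word(value, word, new_word):
    """Replace `word` only where it is delimited by spaces (or the string ends)."""
    padded = f" {value} "
    return padded.replace(f" {word} ", f" {new_word} ").strip(" ")


for name in ["jay", "jay ajay", "ajay"]:
    print(repr(replace_whole_word(name, "jay", "abc")))
# 'abc', 'abc ajay', 'ajay'

# Caveat: two adjacent occurrences share one delimiting space, so only the
# first is replaced in a single pass.
print(repr(replace_whole_word("jay jay", "jay", "abc")))  # 'abc jay'
```

The last line shows a limitation of the approach in this Python version: neighbouring matches would need a second pass.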
Answer by dotancohen
I'm happy to report that since this question was asked, there is now a satisfactory answer! Take a look at this terrific package:
https://github.com/mysqludf/lib_mysqludf_preg
Sample SQL:
SELECT PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) AS demo;
I found the package from this blog post as linked on this question.
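For anyone unsure what the sample pattern does, here is the same expression run through Python's re module. This is only an analogue for illustration; the UDF uses PCRE, which should behave the same way for this simple pattern. Because (.*?) captures everything before "fox", the whole string is part of the match and gets replaced unless the first group is put back.

```python
import re

subject = "the quick brown fox"

# The match spans the entire string, so the replacement swallows the prefix.
whole = re.sub(r"(.*?)(fox)", "dog", subject)
print(whole)  # dog

# Re-inserting group 1 keeps the text in front of the replaced word.
kept = re.sub(r"(.*?)(fox)", r"\1dog", subject)
print(kept)  # the quick brown dog
```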
Answer by Steve Chambers
UPDATE 2: A useful set of regex functions including REGEXP_REPLACE has now been provided in MySQL 8.0. This renders reading on unnecessary unless you're constrained to using an earlier version.
UPDATE 1: Have now made this into a blog post: http://stevettt.blogspot.co.uk/2018/02/a-mysql-regular-expression-replace.html
The following expands upon the function provided by Rasika Godawatte but trawls through all necessary substrings rather than just testing single characters:
-- ------------------------------------------------------------------------------------
-- USAGE
-- ------------------------------------------------------------------------------------
-- SELECT reg_replace(<subject>,
-- <pattern>,
-- <replacement>,
-- <greedy>,
-- <minMatchLen>,
-- <maxMatchLen>);
-- where:
-- <subject> is the string to look in for doing the replacements
-- <pattern> is the regular expression to match against
-- <replacement> is the replacement string
-- <greedy> is TRUE for greedy matching or FALSE for non-greedy matching
-- <minMatchLen> specifies the minimum match length
-- <maxMatchLen> specifies the maximum match length
-- (minMatchLen and maxMatchLen are used to improve efficiency but are
-- optional and can be set to 0 or NULL if not known/required)
-- Example:
-- SELECT reg_replace(txt, '^[Tt][^ ]* ', 'a', TRUE, 2, 0) FROM tbl;
DROP FUNCTION IF EXISTS reg_replace;
DELIMITER //
CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845),
  replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT, maxMatchLen INT)
RETURNS VARCHAR(21845) DETERMINISTIC BEGIN
  DECLARE result, subStr, usePattern VARCHAR(21845);
  DECLARE startPos, prevStartPos, startInc, len, lenInc INT;
  IF subject REGEXP pattern THEN
    SET result = '';
    -- Sanitize input parameter values
    SET minMatchLen = IF(minMatchLen < 1, 1, minMatchLen);
    SET maxMatchLen = IF(maxMatchLen < 1 OR maxMatchLen > CHAR_LENGTH(subject),
                         CHAR_LENGTH(subject), maxMatchLen);
    -- Set the pattern to use to match an entire string rather than part of a string
    SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern));
    SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$'));
    -- Set start position to 1 if pattern starts with ^ or doesn't end with $.
    IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN
      SET startPos = 1, startInc = 1;
    -- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start pos
    -- to the min or max match length from the end (depending on "greedy" flag).
    ELSEIF greedy THEN
      SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1;
    ELSE
      SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1;
    END IF;
    WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject)
      AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject)
      AND !(LEFT(pattern, 1) = '^' AND startPos <> 1)
      AND !(RIGHT(pattern, 1) = '$'
            AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO
      -- Set start length to maximum if matching greedily or pattern ends with $.
      -- Otherwise set starting length to the minimum match length.
      IF greedy OR RIGHT(pattern, 1) = '$' THEN
        SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1;
      ELSE
        SET len = minMatchLen, lenInc = 1;
      END IF;
      SET prevStartPos = startPos;
      lenLoop: WHILE len >= 1 AND len <= maxMatchLen
                 AND startPos + len - 1 <= CHAR_LENGTH(subject)
                 AND !(RIGHT(pattern, 1) = '$'
                       AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO
        SET subStr = SUBSTRING(subject, startPos, len);
        IF subStr REGEXP usePattern THEN
          SET result = IF(startInc = 1,
                          CONCAT(result, replacement), CONCAT(replacement, result));
          SET startPos = startPos + startInc * len;
          LEAVE lenLoop;
        END IF;
        SET len = len + lenInc;
      END WHILE;
      IF (startPos = prevStartPos) THEN
        SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)),
                        CONCAT(SUBSTRING(subject, startPos, 1), result));
        SET startPos = startPos + startInc;
      END IF;
    END WHILE;
    IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN
      SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos));
    ELSEIF startInc = -1 AND startPos >= 1 THEN
      SET result = CONCAT(LEFT(subject, startPos), result);
    END IF;
  ELSE
    SET result = subject;
  END IF;
  RETURN result;
END//
DELIMITER ;
Limitations
- This method is of course going to take a while when the subject string is large. Update: Have now added minimum and maximum match length parameters for improved efficiency when these are known (zero = unknown/unlimited).
- It won't allow substitution of backreferences (e.g. \1, \2 etc.) to replace capturing groups. If this functionality is needed, please see this answer which attempts to provide a workaround by updating the function to allow a secondary find and replace within each found match (at the expense of increased complexity).
- If ^ and/or $ is used in the pattern, they must be at the very start and very end respectively - e.g. patterns such as (^start|end$) are not supported.
- There is a "greedy" flag to specify whether the overall matching should be greedy or non-greedy. Combining greedy and lazy matching within a single regular expression (e.g. a.*?b.*) is not supported.
Usage Examples
The function has been used to answer the following StackOverflow questions:
- How to count words in MySQL / regular expression replacer?
- How to extract the nth word and count word occurrences in a MySQL string?
- How to extract two consecutive digits from a text field in MySQL?
- How to remove all non-alpha numeric characters from a string in MySQL?
- How to replace every other instance of a particular character in a MySQL string?
- How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?
Answer by Eddie B
You 'can' do it ... but it's not very wise ... this is about as daring as I'll try ... as far as full RegEx support goes, you're much better off using perl or the like.
UPDATE db.tbl
SET column =
CASE
WHEN column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
THEN REPLACE(column,'WORD_TO_REPLACE','REPLACEMENT')
END
WHERE column REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
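One thing worth keeping in mind with this pattern: the regular expression only decides which rows are touched, while REPLACE() then substitutes every occurrence of the substring in those rows, including ones inside longer words. The Python sketch below illustrates that difference with \b word boundaries standing in for [[:<:]] and [[:>:]]; both helper names are hypothetical.

```python
import re


def replace_if_whole_word(value, word, replacement):
    """Mimic the UPDATE: regex as a row filter, then a plain substring replace."""
    if re.search(rf"\b{re.escape(word)}\b", value):
        return value.replace(word, replacement)
    return value


def replace_whole_words_only(value, word, replacement):
    """What a true regex replace would do: touch only the whole-word matches."""
    return re.sub(rf"\b{re.escape(word)}\b", replacement, value)


print(replace_if_whole_word("cat concat", "cat", "dog"))     # dog condog
print(replace_whole_words_only("cat concat", "cat", "dog"))  # dog concat
print(replace_if_whole_word("concat", "cat", "dog"))         # concat
```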
Answer by user3796869
We can use an IF condition in a SELECT query as below:
Suppose that for any value such as "ABC", "ABC1", "ABC2", "ABC3", ..., we want to replace it with "ABC"; using REGEXP and the IF() function in the SELECT query, we can achieve this.
Syntax:
SELECT IF(column_name REGEXP 'ABC[0-9]$','ABC',column_name)
FROM table1
WHERE column_name LIKE 'ABC%';
Example:
SELECT IF('ABC1' REGEXP 'ABC[0-9]$','ABC','ABC1');
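The IF() expression above is really a whole-value swap rather than a substring replace: when the pattern matches, the entire value is replaced by the constant. A small Python analogue (the function name collapse_abc is hypothetical) makes the edge cases easier to see:

```python
import re


def collapse_abc(value):
    """Return 'ABC' when the value ends in ABC plus one digit, else the value itself."""
    return "ABC" if re.search(r"ABC[0-9]$", value) else value


for value in ["ABC1", "ABC", "ABC12", "XABC7"]:
    print(value, "->", collapse_abc(value))
# ABC1 -> ABC, ABC -> ABC, ABC12 -> ABC12, XABC7 -> ABC
```

"ABC12" is left alone because the pattern allows only one trailing digit, and "XABC7" collapses to "ABC" because the pattern is not anchored at the start; in the SQL version the LIKE 'ABC%' filter is what keeps such rows out.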