How to do a regular expression replace in MySQL?

Question

Asked by Piskvor left the building

I have a table with ~500k rows; a varchar(255) UTF8 column filename contains a file name.

I'm trying to strip various strange characters out of the filename, and thought I'd use a character class: [^a-zA-Z0-9()_ .\-]

Now, is there a function in MySQL that lets you replace through a regular expression? I'm looking for functionality similar to the REPLACE() function. A simplified example follows:

SELECT REPLACE('stackowerflow', 'ower', 'over');

Output: "stackoverflow"

/* does something like this exist? */
SELECT X_REG_REPLACE('Stackoverflow','/[A-Zf]/','-'); 

Output: "-tackover-low"

I know about REGEXP/RLIKE, but those only check if there is a match, not what the match is.

(I could do a "SELECT pkey_id,filename FROM foo WHERE filename RLIKE '[^a-zA-Z0-9()_ .\-]'" from a PHP script, do a preg_replace and then "UPDATE foo ... WHERE pkey_id=...", but that looks like a last-resort slow & ugly hack.)
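
For readers who want to see what that client-side fallback looks like, here is a minimal sketch in Python rather than PHP. It is only an illustration under assumptions: an in-memory SQLite table stands in for the real MySQL table, the helper name clean_filenames is made up, and the filtering is done in Python instead of with RLIKE. The table and column names (foo, pkey_id, filename) and the character class come from the question.

```python
import re
import sqlite3

# Character class from the question: anything NOT in the allowed set.
STRANGE = re.compile(r"[^a-zA-Z0-9()_ .\-]")

def clean_filenames(conn):
    """Rewrite every filename containing a disallowed character; return the row count."""
    rows = conn.execute("SELECT pkey_id, filename FROM foo").fetchall()
    changed = [(STRANGE.sub("", name), pkey)
               for pkey, name in rows if STRANGE.search(name)]
    conn.executemany("UPDATE foo SET filename = ? WHERE pkey_id = ?", changed)
    conn.commit()
    return len(changed)

# Stand-in data (assumption): an in-memory SQLite table instead of the MySQL one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (pkey_id INTEGER PRIMARY KEY, filename TEXT)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [(1, "report (final).pdf"), (2, "caf\u00e9#1?.txt"), (3, "a_b-c.txt")],
)
updated = clean_filenames(conn)
cleaned = dict(conn.execute("SELECT pkey_id, filename FROM foo"))
```

Only rows that actually contain a disallowed character are written back, which keeps the number of UPDATE statements down on a large table.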

Answer 1

Accepted answer by Lukasz Szozda

With MySQL 8.0+ you can use the native REGEXP_REPLACE function.

12.5.2 Regular Expressions:

REGEXP_REPLACE(expr, pat, repl[, pos[, occurrence[, match_type]]])
Replaces occurrences in the string expr that match the regular expression specified by the pattern pat with the replacement string repl, and returns the resulting string. If expr, pat, or repl is NULL, the return value is NULL.

and Regular expression support:

Previously, MySQL used the Henry Spencer regular expression library to support regular expression operators (REGEXP, RLIKE).
Regular expression support has been reimplemented using International Components for Unicode (ICU), which provides full Unicode support and is multibyte safe. The REGEXP_LIKE() function performs regular expression matching in the manner of the REGEXP and RLIKE operators, which now are synonyms for that function. In addition, the REGEXP_INSTR(), REGEXP_REPLACE(), and REGEXP_SUBSTR() functions are available to find match positions and perform substring substitution and extraction, respectively.

SELECT REGEXP_REPLACE('Stackoverflow','[A-Zf]','-',1,0,'c'); 
-- Output:
-tackover-low

DBFiddle Demo

Answer 2

Answered by Jeremy Stein

MySQL 8.0+:

You can use the native REGEXP_REPLACE function.

Older versions:

You can use a user-defined function (UDF) like mysql-udf-regexp.

Answer 3

Answered by Benvorth

Use MariaDB instead. It has the function:

REGEXP_REPLACE(col, regexp, replace)

See the MariaDB docs and PCRE Regular expression enhancements.

Note that you can use regexp grouping as well (I found that very useful):

-- Backslashes are doubled because \ is the escape character inside string literals
SELECT REGEXP_REPLACE("stackoverflow", "(stack)(over)(flow)", '\\2 - \\1 - \\3');

returns

over - stack - flow
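
As a quick cross-check of the group reordering (not part of the original answer), the same pattern and replacement can be run through Python's re module, which also uses \1, \2, \3 for backreferences. This only confirms what the regular expression itself does; it is not MariaDB code.

```python
import re

# Same pattern and group order as the MariaDB call above.
result = re.sub(r"(stack)(over)(flow)", r"\2 - \1 - \3", "stackoverflow")
print(result)
```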

Answer 4

Answered by Ryan Ward

My brute force method to get this to work was just:

Dump the table - mysqldump -u user -p database table > dump.sql
Find and replace a couple of patterns - find /path/to/dump.sql -type f -exec sed -i 's/old_string/new_string/g' {} \; There are obviously other regular expressions you could run on the file as well.
Import the table - mysql -u user -p database < dump.sql

If you want to make sure the string isn't elsewhere in your dataset, run a few regular expressions to make sure the matches all occur in a similar context. It's also not that tough to create a backup before you run a replace, in case you accidentally destroy something and lose information.

Answer 5

Answered by rasika godawatte

I recently wrote a MySQL function to replace strings using regular expressions. You can find my post at the following location:

http://techras.wordpress.com/2011/06/02/regex-replace-for-mysql/

Here is the function code:

DELIMITER $$

CREATE FUNCTION  `regex_replace`(pattern VARCHAR(1000),replacement VARCHAR(1000),original VARCHAR(1000))
RETURNS VARCHAR(1000)
DETERMINISTIC
BEGIN 
 DECLARE temp VARCHAR(1000); 
 DECLARE ch VARCHAR(1); 
 DECLARE i INT;
 SET i = 1;
 SET temp = '';
 IF original REGEXP pattern THEN 
  loop_label: LOOP 
   IF i>CHAR_LENGTH(original) THEN
    LEAVE loop_label;  
   END IF;
   SET ch = SUBSTRING(original,i,1);
   IF NOT ch REGEXP pattern THEN
    SET temp = CONCAT(temp,ch);
   ELSE
    SET temp = CONCAT(temp,replacement);
   END IF;
   SET i=i+1;
  END LOOP;
 ELSE
  SET temp = original;
 END IF;
 RETURN temp;
END$$

DELIMITER ;

Example execution:

mysql> select regex_replace('[^a-zA-Z0-9\-]','','2my test3_text-to. check \ my- sql (regular) ,expressions ._,');
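
To make the behaviour of this function easier to see, here is a rough Python model of its loop (an illustration, not the author's code; the name regex_replace_model is made up). Like the stored function, it tests every character on its own against the pattern, so it works for character classes but cannot replace a multi-character match. Python's re is case-sensitive by default, which may differ from REGEXP depending on collation.

```python
import re

def regex_replace_model(pattern, replacement, original):
    """Model of the stored function: each character is tested individually."""
    if not re.search(pattern, original):
        return original
    out = []
    for ch in original:
        # A single character either matches the pattern or is copied as-is.
        out.append(replacement if re.search(pattern, ch) else ch)
    return "".join(out)

# Character-class pattern: every disallowed character is dropped.
single = regex_replace_model(r"[^a-zA-Z0-9\-]", "", "2my test3_text-to. check")

# Multi-character pattern: no single character matches "ower", so nothing changes.
multi = regex_replace_model(r"ower", "over", "stackowerflow")
```

The second call shows why this approach suits stripping unwanted characters (as in the question) but is not a general regex replace.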

Answer 6

Answered by Jay Patel

We solved this problem without using regex. This query replaces only exact matches of the whole word:

update employee set
employee_firstname = 
trim(REPLACE(concat(" ",employee_firstname," "),' jay ',' abc '))

Example:

emp_id employee_firstname
1 jay
2 jay ajay
3 jay

Result after executing the query:

emp_id employee_firstname
1 abc
2 abc ajay
3 abc
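
The padding trick can be modelled in a few lines of Python (a sketch, not the original query; replace_whole_word is a made-up name). It assumes MySQL's REPLACE() scans non-overlapping matches from left to right the way Python's str.replace does, which exposes one edge case: two adjacent occurrences share a single space, so only the first one is replaced.

```python
def replace_whole_word(value, word, replacement):
    """Mirror trim(REPLACE(concat(' ', value, ' '), ' word ', ' replacement '))."""
    padded = " " + value + " "
    return padded.replace(" " + word + " ", " " + replacement + " ").strip(" ")

# The three rows from the example table above.
names = ["jay", "jay ajay", "jay"]
renamed = [replace_whole_word(n, "jay", "abc") for n in names]

# Edge case: adjacent occurrences share the separating space.
adjacent = replace_whole_word("jay jay", "jay", "abc")
```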

Answer 7

Answered by dotancohen

I'm happy to report that since this question was asked, now there is a satisfactory answer! Take a look at this terrific package:

https://github.com/mysqludf/lib_mysqludf_preg

Sample SQL:

SELECT PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) AS demo;

I found the package via this blog post, as linked on this question.

Answer 8

Answered by Steve Chambers

UPDATE 2: A useful set of regex functions including REGEXP_REPLACE has now been provided in MySQL 8.0. This renders reading on unnecessary unless you're constrained to using an earlier version.

UPDATE 1: This has now been made into a blog post: http://stevettt.blogspot.co.uk/2018/02/a-mysql-regular-expression-replace.html

The following expands upon the function provided by Rasika Godawatte but trawls through all necessary substrings rather than just testing single characters:

-- ------------------------------------------------------------------------------------
-- USAGE
-- ------------------------------------------------------------------------------------
-- SELECT reg_replace(<subject>,
--                    <pattern>,
--                    <replacement>,
--                    <greedy>,
--                    <minMatchLen>,
--                    <maxMatchLen>);
-- where:
-- <subject> is the string to look in for doing the replacements
-- <pattern> is the regular expression to match against
-- <replacement> is the replacement string
-- <greedy> is TRUE for greedy matching or FALSE for non-greedy matching
-- <minMatchLen> specifies the minimum match length
-- <maxMatchLen> specifies the maximum match length
-- (minMatchLen and maxMatchLen are used to improve efficiency but are
--  optional and can be set to 0 or NULL if not known/required)
-- Example:
-- SELECT reg_replace(txt, '^[Tt][^ ]* ', 'a', TRUE, 2, 0) FROM tbl;
DROP FUNCTION IF EXISTS reg_replace;
DELIMITER //
CREATE FUNCTION reg_replace(subject VARCHAR(21845), pattern VARCHAR(21845),
  replacement VARCHAR(21845), greedy BOOLEAN, minMatchLen INT, maxMatchLen INT)
RETURNS VARCHAR(21845) DETERMINISTIC BEGIN 
  DECLARE result, subStr, usePattern VARCHAR(21845); 
  DECLARE startPos, prevStartPos, startInc, len, lenInc INT;
  IF subject REGEXP pattern THEN
    SET result = '';
    -- Sanitize input parameter values
    SET minMatchLen = IF(minMatchLen < 1, 1, minMatchLen);
    SET maxMatchLen = IF(maxMatchLen < 1 OR maxMatchLen > CHAR_LENGTH(subject),
                         CHAR_LENGTH(subject), maxMatchLen);
    -- Set the pattern to use to match an entire string rather than part of a string
    SET usePattern = IF (LEFT(pattern, 1) = '^', pattern, CONCAT('^', pattern));
    SET usePattern = IF (RIGHT(pattern, 1) = '$', usePattern, CONCAT(usePattern, '$'));
    -- Set start position to 1 if pattern starts with ^ or doesn't end with $.
    IF LEFT(pattern, 1) = '^' OR RIGHT(pattern, 1) <> '$' THEN
      SET startPos = 1, startInc = 1;
    -- Otherwise (i.e. pattern ends with $ but doesn't start with ^): Set start pos
    -- to the min or max match length from the end (depending on "greedy" flag).
    ELSEIF greedy THEN
      SET startPos = CHAR_LENGTH(subject) - maxMatchLen + 1, startInc = 1;
    ELSE
      SET startPos = CHAR_LENGTH(subject) - minMatchLen + 1, startInc = -1;
    END IF;
    WHILE startPos >= 1 AND startPos <= CHAR_LENGTH(subject)
      AND startPos + minMatchLen - 1 <= CHAR_LENGTH(subject)
      AND !(LEFT(pattern, 1) = '^' AND startPos <> 1)
      AND !(RIGHT(pattern, 1) = '$'
            AND startPos + maxMatchLen - 1 < CHAR_LENGTH(subject)) DO
      -- Set start length to maximum if matching greedily or pattern ends with $.
      -- Otherwise set starting length to the minimum match length.
      IF greedy OR RIGHT(pattern, 1) = '$' THEN
        SET len = LEAST(CHAR_LENGTH(subject) - startPos + 1, maxMatchLen), lenInc = -1;
      ELSE
        SET len = minMatchLen, lenInc = 1;
      END IF;
      SET prevStartPos = startPos;
      lenLoop: WHILE len >= 1 AND len <= maxMatchLen
                 AND startPos + len - 1 <= CHAR_LENGTH(subject)
                 AND !(RIGHT(pattern, 1) = '$' 
                       AND startPos + len - 1 <> CHAR_LENGTH(subject)) DO
        SET subStr = SUBSTRING(subject, startPos, len);
        IF subStr REGEXP usePattern THEN
          SET result = IF(startInc = 1,
                          CONCAT(result, replacement), CONCAT(replacement, result));
          SET startPos = startPos + startInc * len;
          LEAVE lenLoop;
        END IF;
        SET len = len + lenInc;
      END WHILE;
      IF (startPos = prevStartPos) THEN
        SET result = IF(startInc = 1, CONCAT(result, SUBSTRING(subject, startPos, 1)),
                        CONCAT(SUBSTRING(subject, startPos, 1), result));
        SET startPos = startPos + startInc;
      END IF;
    END WHILE;
    IF startInc = 1 AND startPos <= CHAR_LENGTH(subject) THEN
      SET result = CONCAT(result, RIGHT(subject, CHAR_LENGTH(subject) + 1 - startPos));
    ELSEIF startInc = -1 AND startPos >= 1 THEN
      SET result = CONCAT(LEFT(subject, startPos), result);
    END IF;
  ELSE
    SET result = subject;
  END IF;
  RETURN result;
END//
DELIMITER ;

Demo

Rextester Demo

Limitations

This method is of course going to take a while when the subject string is large. Update: minimum and maximum match length parameters have now been added for improved efficiency when these are known (zero = unknown/unlimited).
It won't allow substitution of backreferences (e.g. \1, \2 etc.) to replace capturing groups. If this functionality is needed, please see this answer, which attempts to provide a workaround by updating the function to allow a secondary find and replace within each found match (at the expense of increased complexity).
If ^ and/or $ is used in the pattern, they must be at the very start and very end respectively, e.g. patterns such as (^start|end$) are not supported.
There is a "greedy" flag to specify whether the overall matching should be greedy or non-greedy. Combining greedy and lazy matching within a single regular expression (e.g. a.*?b.*) is not supported.
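
The core idea of the function, and what the "greedy" flag changes, can be sketched in Python. This is a deliberately simplified model and not a port: it only scans left to right, ignores the minMatchLen/maxMatchLen parameters, and does not reproduce the special handling of ^ and $. The name reg_replace_model is made up for illustration.

```python
import re

def reg_replace_model(subject, pattern, replacement, greedy=True):
    """Simplified model: at each position, try candidate substrings in full."""
    result = []
    pos = 0
    n = len(subject)
    while pos < n:
        max_len = n - pos
        # Greedy tries the longest candidate first, non-greedy the shortest.
        lengths = range(max_len, 0, -1) if greedy else range(1, max_len + 1)
        for length in lengths:
            # The whole candidate must match, like wrapping the pattern in ^...$.
            if re.fullmatch(pattern, subject[pos:pos + length]):
                result.append(replacement)
                pos += length
                break
        else:
            # Nothing starting here matched: copy one character and move on.
            result.append(subject[pos])
            pos += 1
    return "".join(result)

basic = reg_replace_model("Stackoverflow", "[A-Zf]", "-")
greedy_result = reg_replace_model("aXbXb", "a.*b", "R", greedy=True)
lazy_result = reg_replace_model("aXbXb", "a.*b", "R", greedy=False)
```

Because every start position may test many candidate lengths, the cost grows quickly with the subject length, which is the reason for the first limitation listed above.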

Usage Examples

The function has been used to answer the following StackOverflow questions:

Answer 9

Answered by Eddie B

You 'can' do it ... but it's not very wise ... this is about as daring as I'll try ... as far as full RegEx support goes, you're much better off using perl or the like.

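-- `column` is a placeholder; COLUMN is a reserved word, hence the backticks.
-- [[:<:]] and [[:>:]] are word-boundary markers of the older Henry Spencer regex library;
-- the ICU-based engine in MySQL 8.0.4 and later uses \\b instead.
UPDATE db.tbl
SET `column` =
CASE
WHEN `column` REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]'
THEN REPLACE(`column`,'WORD_TO_REPLACE','REPLACEMENT')
END
WHERE `column` REGEXP '[[:<:]]WORD_TO_REPLACE[[:>:]]';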
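
One thing worth knowing about this approach: the regular expression only decides which rows are touched, while REPLACE() still does a plain substring replacement. The Python model below (an illustration with a made-up helper name, using \b as the word boundary) shows the consequence: once a row qualifies, the word is also replaced where it appears inside other words.

```python
import re

def case_replace_model(value, word, replacement):
    """Whole-word regex test, followed by a plain substring replace."""
    if re.search(r"\b" + re.escape(word) + r"\b", value):
        return value.replace(word, replacement)
    return value

# The row qualifies because of the standalone "cat", but "concat" changes too.
mixed = case_replace_model("cat concat", "cat", "dog")

# No standalone word: the row is left alone.
untouched = case_replace_model("concat", "cat", "dog")
```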

Answer 10

Answered by user3796869

We can use an IF condition in a SELECT query, as below:

Suppose that for any of "ABC", "ABC1", "ABC2", "ABC3", ..., we want to return "ABC". Using REGEXP and an IF() condition in the SELECT query, we can achieve this.

Syntax:

SELECT IF(column_name REGEXP 'ABC[0-9]$','ABC',column_name)
FROM table1 
WHERE column_name LIKE 'ABC%';

Example:

SELECT IF('ABC1' REGEXP 'ABC[0-9]$','ABC','ABC1');
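
For illustration only, the same decision can be written in Python (collapse_suffix is a made-up name). It makes one detail of the pattern visible: ABC[0-9]$ allows exactly one trailing digit, so a value like "ABC12" is returned unchanged.

```python
import re

def collapse_suffix(value):
    """Return 'ABC' when value ends in ABC plus one digit; otherwise keep value."""
    return "ABC" if re.search(r"ABC[0-9]$", value) else value

values = ["ABC", "ABC1", "ABC2", "ABC12"]
collapsed = [collapse_suffix(v) for v in values]
```

If multi-digit suffixes should be collapsed as well, a pattern such as ABC[0-9]+$ would be the usual adjustment.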
