MySQL SQL 匹配特殊字符正则表达式

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/14604736/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 16:21:02  来源:igfitidea点击:

SQL Match Special Characters Regexp

mysqlsqldatabase

提问by saricden

I'm looking for an SQL statement that will return only rows of my table whose Namefield contains special characters (excluding underscores).

我正在寻找一个 SQL 语句,它将只返回我的表中Name字段包含特殊字符(不包括下划线)的行。

I've tried:

我试过了:

SELECT * FROM 'table' WHERE Name REGEXP '^[!#$%&()*+,\-./:;<=>?@[\\]^`{|}~]+$'

But no dice, this returns an empty result set (despite there being rows I specifically added with Name fields containing %, $, and #characters).

但是没有骰子,这将返回一个空的结果集(尽管我专门添加了包含%, $, and #字符的Name 字段的行)。

采纳答案by ppeterka

The first problem seems to be is the ^and $signs (Mike C summarized it quicker than I did why...)

第一个问题似乎是^$符号(Mike C 总结得比我快,为什么......)

But I see escaping problems too: all special characters that mean something in regexp should be escapedspecially placed in the [], so [, ], ^, -

但我也看到了转义问题:所有在正则表达式中具有意义的特殊字符都应该是 逃脱专门放在[], 所以[, ], ^,-

Here is a question about how to escape special characters inside character groups in MySQL regexes.

这是一个关于如何在 MySQL 正则表达式中转义字符组内的特殊字符的问题。

Conclusion detailed in the regex documentation:

正则表达式文档中详述的结论:

A bracket expression is a list of characters enclosed in '[]'. It normally matches any single character from the list (but see below).

  • If the list begins with '^', it matches any single character (but see below) not from the rest of the list.

  • If two characters in the list are separated by '-', this is shorthand for the full range of characters between those two (inclusive) in the collating sequence, e.g. '[0-9]' in ASCII matches any decimal digit.

  • It is illegal(!) for two ranges to share an endpoint, e.g. 'a-c-e'. Ranges are very collating sequence-dependent, and portable programs should avoid relying on them.

  • To include a literal ']' in the list, make it the first character (following a possible '^').

  • To include a literal '-', make it the first orlast character, or the second endpoint of a range.

  • To use a literal '-' as the first endpoint of a range, enclose it in '[.' and '.]' to make it a collating element (see below).

With the exception of these and some combinations using '[' (see next paragraphs), all other special characters, including '\', lose their special significance within a bracket expression.

括号表达式是包含在 '[]' 中的字符列表。它通常匹配列表中的任何单个字符(但见下文)。

  • 如果列表以 '^' 开头,则它匹配不来自列表其余部分的任何单个字符(但见下文)。

  • 如果列表中的两个字符由“-”分隔,则这是整理序列中这两个字符(包括)之间的全部字符范围的简写,例如 ASCII 中的“[0-9]”匹配任何十进制数字。

  • 两个范围共享一个端点是非法的(!),例如“ac-e”。范围非常依赖于整理顺序,可移植程序应避免依赖它们。

  • 要在列表中包含文字“]”,请将其设为第一个字符(在可能的“^”之后)。

  • 要包含文字“-”,请将其设为第一个或最后一个字符,或范围的第二个端点。

  • 要将文字“-”用作范围的第一个端点,请将其括在“[.”中。和 '.]' 使其成为整理元素(见下文)。

除了这些和一些使用 '[' 的组合(见下一段)之外,所有其他特殊字符,包括 '\',在括号表达式中都失去了它们的特殊意义

EDITHere is an SQL fiddleabout some interesting regexes regarding the ]character

编辑这是一个关于]角色的有趣正则表达式的 SQL 小提琴

DDL:create table txt ( txt varchar(200) );

DDL:创建表 txt ( txt varchar(200) );

insert into txt values ('ab[]cde');
insert into txt values ('ab[cde');
insert into txt values ('ab]cde');
insert into txt values ('ab[]]]]cde');
insert into txt values ('ab[[[[]cde');
insert into txt values ('ab\]]]]cde');
insert into txt values ('ab[wut?wut?]cde');

Queries:

查询:

Naive approach to match a group of [and ]chars. Syntactically OK, but the group is the single [char, and it matches multiple ]chars afterwards.

匹配一组[]字符的幼稚方法。语法上没问题,但组是单个[字符,然后匹配多个]字符。

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[]]+cde';

Escaped -> same ???

转义 -> 相同???

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\]]+cde';

Double escape -> doesn't work, group is now a [and a \

双转义 -> 不起作用,组现在是一个[和一个\

SELECT * FROM txt WHERE txt 
REGEXP 'ab[[\]]+cde';

Swapping the closing bracket with the opening one inside the group. This is the weirdest regex I ever wrote - to this point...

将右括号与组内的左括号交换。这是我写过的最奇怪的正则表达式 - 到目前为止......

SELECT * FROM txt WHERE txt 
REGEXP 'ab[][]+cde';

I will get killed by such a (totally valid!) regex in a weird nightmare, I think:

我会在一个奇怪的噩梦中被这样一个(完全有效!)正则表达式杀死,我想:

SELECT * FROM txt WHERE txt 
REGEXP 'ab[]wut?[]+cde';

回答by Mike C

This regex should match names that ONLY contain special characters. You specify the carat (^) which signifies the start of the string, your character class with your list of special characters, the plus sign (+) to indicate one or more, and then the dollar to signify the end of the string. You need to account for non-special character in the string. You could try something like this:

此正则表达式应匹配仅包含特殊字符的名称。您指定克拉 (^) 表示字符串的开头,您的字符类与您的特殊字符列表,加号 (+) 表示一个或多个,然后美元表示字符串的结尾。您需要考虑字符串中的非特殊字符。你可以尝试这样的事情:

WHERE Name REGEXP '^.*?[!#$%&()*+,\-./:;<=>?@[\\]^`{|}~]+.*?$'

I added the .*? at the beginning and end to allow for non-special characters before and after the special character. BTW, you probably don't need the (+) any more, since one special would be enough for a match.

我添加了 .*? 在开头和结尾允许在特殊字符之前和之后的非特殊字符。顺便说一句,您可能不再需要 (+) 了,因为一个特殊的就足以进行一场比赛了。