Original question: http://stackoverflow.com/questions/14503267/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Regexp to find only 6 digit number in text PostgreSQL
Asked by CristisS
I need to build a query in PostgreSQL that finds all text entries containing a 6-digit number (e.g. 000999, 019290, 998981, 234567, etc.). The problem is that the number is not necessarily at the beginning of the string or at its end.
I tried the following, none of which worked:
[0-9]{6} - returns part of a number with more than 6 digits
(?:(?<!\d)\d{6}(?!\d)) - PostgreSQL does not know about lookbehind
[^0-9][0-9]{6}[^0-9] and variations on it, but to no avail.
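The first and third attempts can be reproduced on literal strings. This is a minimal sketch, assuming a PostgreSQL session with default settings; the first string is one of the question's samples and the second is made up for illustration.

```sql
-- [0-9]{6} is not anchored to anything, so it happily matches the
-- first six digits of the seven-digit run 0011527.
SELECT substring('aa 0011527 /CASA' from '[0-9]{6}') AS partial_match;

-- [^0-9][0-9]{6}[^0-9] requires an actual non-digit character on both
-- sides, so a number at the very start (or end) of the string is missed.
SELECT '001152 /CASA' ~ '[^0-9][0-9]{6}[^0-9]' AS matches_at_start;
```

The first query yields 001152 even though no standalone 6-digit number is present, and the second yields false even though one is.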
Building my own Perl/C function is not really an option, as I do not have the required skills. Any idea what regexp could be used, or what other tricks are eluding me at the moment?
EDIT
Input samples:
aa 0011527 /CASA -> should return NOTHING
aa 001152/CASA -> should return 001152
aa001152/CASA -> should return 001152
aa0011527/CASA -> should return NOTHING
aa001152 /CASA -> should return 001152
Answered by h2ooooooo
If PostgreSQL supports word boundaries, use \b:
\b(\d{6})\b
Edit:
\b in PostgreSQL means backspace, so it's not a word boundary.
However, http://www.postgresql.org/docs/8.3/interactive/functions-matching.html#FUNCTIONS-POSIX-REGEXP explains that you can use \y as a word boundary, as it "matches only at the beginning or end of a word", so
\y(\d{6})\y
should work.
\m(\d{6})\M
should also work.
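To see how these word-boundary escapes behave on the samples from the question, a sketch like the following can be run. It assumes standard_conforming_strings is on (the default in current PostgreSQL versions), so the backslashes in the literals reach the regex engine unchanged; the column aliases are arbitrary.

```sql
-- Test both word-boundary patterns against the question's five samples.
SELECT s AS sample,
       s ~ '\y\d{6}\y' AS matches_y,
       s ~ '\m\d{6}\M' AS matches_m
FROM (VALUES
        ('aa 0011527 /CASA'),
        ('aa 001152/CASA'),
        ('aa001152/CASA'),
        ('aa0011527/CASA'),
        ('aa001152 /CASA')
     ) AS t(s);
```

PostgreSQL treats letters, digits and underscores all as word characters, so there is no word boundary between aa and 001152. Samples such as aa001152/CASA are therefore not matched by either pattern, although the question expects them to be.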
Full list of word matches in PostgreSQL regex:
Escape  Description
\A      matches only at the beginning of the string (see Section 9.7.3.5 for how this differs from ^)
\m      matches only at the beginning of a word
\M      matches only at the end of a word
\y      matches only at the beginning or end of a word
\Y      matches only at a point that is not the beginning or end of a word
\Z      matches only at the end of the string (see Section 9.7.3.5 for how this differs from $)
New edit:
Based on your edit, you should be able to do this:
(^|[^\d])(\d+)([^\d]|$)
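As a rough illustration of what this pattern captures, it can be passed to regexp_matches on one of the question's samples. The second query is a variant that is not part of the answer: it assumes the run length can be fixed at six and that non-capturing groups are acceptable, so only the number itself comes back.

```sql
-- Returns one text[] row holding the three captured groups: the character
-- before the digits, the digit run, and the character after it.
SELECT regexp_matches('aa001152/CASA', '(^|[^\d])(\d+)([^\d]|$)') AS groups;

-- Variant (an assumption, not from the answer): require exactly six digits
-- and make the surrounding groups non-capturing.
SELECT regexp_matches('aa001152/CASA', '(?:^|[^\d])(\d{6})(?:[^\d]|$)') AS code;
```

Because \d+ matches a digit run of any length, the first form still needs a separate length check on the middle group, which is what the query in the next answer does.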
Answered by CristisS
Using what @h2ooooooo proposed, I managed to create the following query:
SELECT cleantwo."ID", cleantwo."Name", cleantwo."Code"
FROM
(
    SELECT cleanone."ID", cleanone."Name", unnest(cleanone."Code") AS "Code" -- 3. unnest all the entries received using regexp_matches (get all combinations)
    FROM
    (
        SELECT sd."ID", sd."Name",
               regexp_matches(sd."Name", '(^|[^\d])(\d+)([^\d]|$)') AS "Code"
        FROM "T_SOME_DATA" sd
        WHERE substring(sd."Name" from 1 for 15) ~ ('(^|[^\d])(\d+)([^\d]|$)') -- 1. get all data possible
    ) AS cleanone
    WHERE cleanone."Code" IS NOT NULL -- 2. get data where code IS NOT NULL (remove useless entries)
) AS cleantwo
WHERE length(cleantwo."Code") = 6 -- 4. get only the combinations relevant to my initial requirement (codes with length 6)
It took me a lot of time to find this so I hope it helps someone else in the same situation. Good luck!
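For comparison, the same idea can probably be written more compactly with substring. This is only a sketch under a few assumptions: it reuses the table and column names from the query above, it searches the whole "Name" rather than only its first 15 characters, and it returns just the first standalone 6-digit number in each row.

```sql
-- substring(... from pattern) returns the text matched by the first
-- capturing group; the outer (?: ... ) groups are non-capturing.
SELECT sd."ID",
       sd."Name",
       substring(sd."Name" from '(?:^|[^0-9])([0-9]{6})(?:[^0-9]|$)') AS "Code"
FROM "T_SOME_DATA" sd
-- Keep only rows that contain a standalone 6-digit number.
WHERE sd."Name" ~ '(?:^|[^0-9])[0-9]{6}(?:[^0-9]|$)';
```

Fixing the length inside the pattern removes the need for the unnest step and the final length filter, at the cost of ignoring any further 6-digit numbers later in the same string.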