如何在 Postgresql 中提取子字符串模式

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/31381174/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 01:58:08  来源:igfitidea点击:

How to extract a substring pattern in Postgresql

regexpostgresqlsubstringregexp-substr

提问by thenaturalist

I have a column with a lot of inconsistent strings. Some of them contain a substring with a consistent pattern of '2015mmdd_AB_CD_EFG_(text)_(text)_HIJ'which I would like to extract. I feel this is a cross over case of regexp and a substring command.

我有一列有很多不一致的字符串。其中一些包含'2015mmdd_AB_CD_EFG_(text)_(text)_HIJ'我想提取的具有一致模式的子字符串。我觉得这是正则表达式和子字符串命令的交叉情况。

My best approach so far has been a rather ugly

到目前为止我最好的方法是一个相当丑陋的

substring(col_name, '........_.._.._..._.+_.+_...')

substring(col_name, '........_.._.._..._.+_.+_...')

which does not end the output as desired, rahter the output is like '(...)_HIJ_blablabla'.

它没有按需要结束输出,而输出就像'(...)_HIJ_blablabla'。

How can I effectively combine pattern recognition and substring selection in this case?

在这种情况下,如何有效地结合模式识别和子串选择?

回答by a_horse_with_no_name

Assuming that 2015mmddactually means some kind of "date", so that the real data contains e.g. 20150713the following will do:

假设这2015mmdd实际上意味着某种“日期”,以便真实数据包含例如20150713以下内容:

substring(col_name, '[0-9]{8}_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')

This returns substrings that start with 8 numbers followed by an underscore, followed by two uppercase characters followed by an underscore followed by two uppercase characters, followed by an underscore followed by three uppercase characters, followed by an underscore followed by an opening parentheses followed by at lease one lowercase letter, followed by a closing parentheses, followed by an underscore, followed by an opening parentheses, followed by at least one lowercase character followed by a closing parentheses.

这将返回以 8 个数字开头的子字符串,后跟一个下划线,后跟两个大写字符,后跟一个下划线,后跟两个大写字符,后跟一个下划线,后跟三个大写字符,后跟一个下划线,后跟一个左括号,后跟 at租用一个小写字母,后跟一个右括号,后跟一个下划线,后跟一个左括号,后跟至少一个小写字符,后跟一个右括号。

If 2015mmddindeed means 2015followed by the string mmddthen you need this:

如果2015mmdd确实意味着2015后跟字符串,mmdd那么你需要这个:

substring(col_name, '[0-9]{4}mmdd_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')