如何在 Postgresql 中提取子字符串模式
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/31381174/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to extract a substring pattern in Postgresql
提问by thenaturalist
I have a column with a lot of inconsistent strings. Some of them contain a substring with a consistent pattern of '2015mmdd_AB_CD_EFG_(text)_(text)_HIJ'
which I would like to extract. I feel this is a cross over case of regexp and a substring command.
我有一列有很多不一致的字符串。其中一些包含'2015mmdd_AB_CD_EFG_(text)_(text)_HIJ'
我想提取的具有一致模式的子字符串。我觉得这是正则表达式和子字符串命令的交叉情况。
My best approach so far has been a rather ugly
到目前为止我最好的方法是一个相当丑陋的
substring(col_name, '........_.._.._..._.+_.+_...')
substring(col_name, '........_.._.._..._.+_.+_...')
which does not end the output as desired, rahter the output is like '(...)_HIJ_blablabla'.
它没有按需要结束输出,而输出就像'(...)_HIJ_blablabla'。
How can I effectively combine pattern recognition and substring selection in this case?
在这种情况下,如何有效地结合模式识别和子串选择?
回答by a_horse_with_no_name
Assuming that 2015mmdd
actually means some kind of "date", so that the real data contains e.g. 20150713
the following will do:
假设这2015mmdd
实际上意味着某种“日期”,以便真实数据包含例如20150713
以下内容:
substring(col_name, '[0-9]{8}_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')
This returns substrings that start with 8 numbers followed by an underscore, followed by two uppercase characters followed by an underscore followed by two uppercase characters, followed by an underscore followed by three uppercase characters, followed by an underscore followed by an opening parentheses followed by at lease one lowercase letter, followed by a closing parentheses, followed by an underscore, followed by an opening parentheses, followed by at least one lowercase character followed by a closing parentheses.
这将返回以 8 个数字开头的子字符串,后跟一个下划线,后跟两个大写字符,后跟一个下划线,后跟两个大写字符,后跟一个下划线,后跟三个大写字符,后跟一个下划线,后跟一个左括号,后跟 at租用一个小写字母,后跟一个右括号,后跟一个下划线,后跟一个左括号,后跟至少一个小写字符,后跟一个右括号。
If 2015mmdd
indeed means 2015
followed by the string mmdd
then you need this:
如果2015mmdd
确实意味着2015
后跟字符串,mmdd
那么你需要这个:
substring(col_name, '[0-9]{4}mmdd_[A-Z]{2}_[A-Z]{2}_[A-Z]{3}_\([a-z]+\)_\([a-z]+\)')