Python 正则表达式错误字符范围。
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/28539253/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python regex bad character range.
提问by user3116355
I am using the following regex to match different patterns of dates. it works fine in regex101.com. But when I import to python i am getting "bad character range" exception.
我正在使用以下正则表达式来匹配不同的日期模式。它在 regex101.com 中运行良好。但是当我导入到 python 时,我得到了“坏字符范围”异常。
pattern = ur"((?:\b((?:(january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)['\s\.]{0,4}(?:\d{4}|\d{2})|(?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)|((?:0[1-9]|[1-3][0-9]|[0-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[\s-/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))))(?:(?![\r\n])\s){0,4})[-/–to]{0,2}(?:(?![\r\n])\s){0,4}(((?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)[-'\s\.]{0,4}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|[1-3][0-9]|[1-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[\s-/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))))))"
https://regex101.com/r/rU3cE9/1
采纳答案by Avinash Raj
Problem is mainly because of the hyphen present inside [\s-/']
character class, therefore Python interprets it as a character interval (like in [a-z]
). I suggest you to put the hyphen at the first or at the last position inside the character class [-\s/']
or escape it, to prevent ambiguity.
问题主要是因为[\s-/']
字符类中存在连字符,因此 Python 将其解释为字符间隔(如 in [a-z]
)。我建议您将连字符放在字符类中的第一个或最后一个位置[-\s/']
或将其转义,以防止歧义。
>>> reg = re.compile(ur"((?:\b((?:(january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)['\s\.]{0,4}(?:\d{4}|\d{2})|(?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)|((?:0[1-9]|[1-3][0-9]|[0-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[-\s/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))))(?:(?![\r\n])\s){0,4})[-/to–]{0,2}(?:(?![\r\n])\s){0,4}(((?:january|jan|february|feb|march|mar|april|apr|may|jun|june|july|jul|august|aug|september|set|sep|october|oct|november|nov|december|dec)[-'\s\.]{0,4}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|[1-3][0-9]|[1-9])/(?:0[1-9]|[1-3][0-9])/(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9])))|((?:0[1-9]|1[0-2]|[1-9])\s{0,3}[-/']{1,3}[-\s/']{0,3}(?:(19[7-9][0-9])|(20[0-1][0-9])|([7-9][0-9]|[0-1][0-9]))))))")
>>>