Python: find the indexes of all regex matches?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/3519565/
Find the indexes of all regex matches?
Asked by xitrium
I'm parsing strings that could have any number of quoted strings inside them (I'm parsing code, and trying to avoid PLY). I want to find out if a substring is quoted, and I have the substring's index. My initial thought was to use re to find all the matches and then figure out the range of indexes they represent.
It seems like I should use re with a regex like \"[^\"]+\"|'[^']+' (I'm avoiding dealing with triple-quoted and similar strings at the moment). When I use findall() I get a list of the matching strings, which is somewhat nice, but I need indexes.
My substring might be as simple as c, and I need to figure out if this particular c is actually quoted or not.
Accepted answer by Dave Kirby
This is what you want: (source)
re.finditer(pattern, string[, flags])
Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result unless they touch the beginning of another match.
You can then get the start and end positions from the MatchObjects.
For example:
[(m.start(0), m.end(0)) for m in re.finditer(pattern, string)]
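To connect this back to the question, here is a minimal sketch of how the spans from re.finditer could be used to test whether a given index is quoted. The helper names quoted_spans and is_quoted and the sample string are assumptions made for illustration; they are not part of the original answer. Note that each span includes the quote characters themselves, and that the end index is exclusive.

```python
import re

# Pattern from the question: a double-quoted or single-quoted run
# (triple-quoted strings and escapes are deliberately not handled).
QUOTED = re.compile(r'"[^"]+"|\'[^\']+\'')

def quoted_spans(text):
    """Return (start, end) pairs for every quoted string; end is exclusive."""
    return [m.span() for m in QUOTED.finditer(text)]

def is_quoted(text, index):
    """True if the character at `index` falls inside any quoted span."""
    return any(start <= index < end for start, end in quoted_spans(text))

# Hypothetical sample: the first `c` (index 7) is quoted, the second (index 12) is not.
sample = 'a = "b c" + c + \'d\''
print(quoted_spans(sample))   # [(4, 9), (16, 19)]
print(is_quoted(sample, 7))   # True
print(is_quoted(sample, 12))  # False
```

For many lookups against the same string, it would probably be better to compute the spans once and reuse them rather than rescanning on every call.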
Answered by Omkar Rahane
This should solve your issue:
pattern = r"(?=(\"[^\"]+\"|'[^']+'))"
Then use the following to get all overlapping indices:
# end(1) - 1 makes the end index inclusive; `input` is the string being searched
indicesTuple = [(mObj.start(1), mObj.end(1) - 1) for mObj in re.finditer(pattern, input)]
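As a rough comparison, the sketch below runs the plain pattern and the lookahead pattern over the same input. The sample text and variable names are assumptions chosen for illustration. Because the lookahead consumes no characters, a closing quote can also be picked up as the opening quote of another "match", so the overlapping version may report the text between two quoted strings as quoted. That is probably not what the original question wants, though it can be useful when overlapping matches are genuinely needed.

```python
import re

# Hypothetical input: two quoted strings with unquoted text between them.
text = '"a" b "c"'

plain = r'"[^"]+"|\'[^\']+\''            # pattern from the question
lookahead = r'(?=("[^"]+"|\'[^\']+\'))'  # same pattern wrapped in a lookahead

# Non-overlapping matches; end indexes are exclusive.
plain_spans = [m.span() for m in re.finditer(plain, text)]

# Overlapping matches: the lookahead itself is empty, so read group 1.
overlap_spans = [(m.start(1), m.end(1)) for m in re.finditer(lookahead, text)]

print(plain_spans)    # [(0, 3), (6, 9)]
print(overlap_spans)  # [(0, 3), (2, 7), (6, 9)]
print(text[2:7])      # '" b "' (the gap between the two quoted strings)
```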

