Original question: http://stackoverflow.com/questions/15474741/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python regex optional capture group
Asked by user2181741
I am having trouble extracting the data I need from filenames like these:
miniseries.season 1.part 5.720p.avi
miniseries.part 5.720p.avi
miniseries.part VII.720p.avi # episode or season expressed in Roman numerals
The "season XX" chunk may or may not be present, and it may be written in a short form such as "s 1" or "seas 1".
In any case, I would like four capture groups that give this output:
group1 : miniseries
group2 : 1 (or None)
group3 : 5
group4 : 720p.avi
So I've written a regex like this:
(^.*)\Ws[eason ]*(\d{1,2}|[ivxlcdm]{1,5})\Wp[art ]*(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)
This only works when I have a fully specified filename, including the optional "season XX" string. Is it possible to write a regex that returns None as group2 if "season" is not found?
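To make the failure concrete, here is a minimal sketch that runs the pattern above against two of the sample filenames. The re.I flag is an assumption (the question does not say which flags were used), and the variable names are only illustrative.

```python
import re

# The pattern from the question. re.I is an assumption, added so that
# upper-case Roman numerals such as "VII" can match the numeral class.
original = re.compile(
    r'(^.*)\Ws[eason ]*(\d{1,2}|[ivxlcdm]{1,5})'
    r'\Wp[art ]*(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)',
    re.I,
)

full = original.search('miniseries.season 1.part 5.720p.avi')
short = original.search('miniseries.part 5.720p.avi')

print(full.groups())   # ('miniseries', '1', '5', '720p.avi')
print(short)           # None: the season chunk is mandatory in this pattern
```

Because the whole search returns None for the second filename, there is no match object to read groups from, which is why the season part has to be made optional inside the pattern itself.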
Accepted answer by Martijn Pieters
It is easy enough to make the season group optional:
(^.*?)(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)
using a non-capturing group ((?:...)) plus the 0 or 1 quantifier (?). I did have to make the first group non-greedy to prevent it from matching the season section of the name.
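The effect of the non-greedy first group can be checked with a small sketch like the one below. It compares the same pattern with a greedy and a lazy first group; the names TAIL, greedy and lazy are illustrative only and do not come from the answer.

```python
import re

# Everything after the first group, shared by both variants.
TAIL = (r'(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?'
        r'\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)')

greedy = re.compile(r'(^.*)' + TAIL, re.I)   # first group grabs as much as it can
lazy = re.compile(r'(^.*?)' + TAIL, re.I)    # first group grabs as little as it can

name = 'miniseries.season 1.part 5.720p.avi'

print(greedy.search(name).groups())
# ('miniseries.season 1', None, '5', '720p.avi')
print(lazy.search(name).groups())
# ('miniseries', '1', '5', '720p.avi')
```

With the greedy version the optional season group is simply skipped, because the first group has already consumed "season 1" and the rest of the pattern still matches without it.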
I also made the eason and art optional strings into non-capturing optional groups instead of character classes.
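The difference between the two forms can be sketched as follows, using cut-down patterns that cover only the season chunk. The sample strings, including the nonsense word "snoose", are made up for illustration. One trade-off appears to follow from the change: an abbreviation such as "seas 1" is accepted by the character class but not by the optional group.

```python
import re

# Character class: 's' followed by ANY run of the characters e, a, s, o, n or space.
char_class = re.compile(r's[eason ]*(\d{1,2})', re.I)
# Optional group: 's' followed by either nothing or the literal text 'eason '.
optional_group = re.compile(r's(?:eason )?(\d{1,2})', re.I)

samples = ['season 1', 'seas 1', 's1', 'snoose 1']
results = {
    text: (bool(char_class.fullmatch(text)), bool(optional_group.fullmatch(text)))
    for text in samples
}

for text, (by_class, by_group) in results.items():
    print(f'{text!r}: class={by_class} group={by_group}')
# 'season 1': class=True group=True
# 'seas 1': class=True group=False
# 's1': class=True group=True
# 'snoose 1': class=True group=False
```

The optional group is stricter: it rejects accidental letter soup like "snoose", at the cost of no longer accepting arbitrary abbreviations of "season".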
Result:
>>> import re
>>> p=re.compile(r'(^.*?)(?:\Ws(?:eason )?(\d{1,2}|[ivxlcdm]{1,5}))?\Wp(?:art )?(\d{1,2}|[ivxlcdm]{1,5})\W(.*$)', re.I)
>>> p.search('miniseries.season 1.part 5.720p.avi').groups()
('miniseries', '1', '5', '720p.avi')
>>> p.search('miniseries.part 5.720p.avi').groups()
('miniseries', None, '5', '720p.avi')
>>> p.search('miniseries.part VII.720p.avi').groups()
('miniseries', None, 'VII', '720p.avi')
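If positional groups become hard to keep track of, the same pattern could be written with named groups and wrapped in a helper. This is only a sketch: the group names (title, season, part, rest) and the function parse_filename are hypothetical and not part of the answer, and the rf-string syntax assumes Python 3.6 or later.

```python
import re

# Arabic or Roman numeral, as in the answer's pattern.
NUMBER = r'\d{1,2}|[ivxlcdm]{1,5}'

PATTERN = re.compile(
    r'^(?P<title>.*?)'                                  # lazy, stops before the season chunk
    rf'(?:\Ws(?:eason )?(?P<season>{NUMBER}))?'         # optional season chunk
    rf'\Wp(?:art )?(?P<part>{NUMBER})'                  # mandatory part chunk
    r'\W(?P<rest>.*)$',                                 # everything after the part number
    re.I,
)


def parse_filename(name):
    """Return a dict of the four fields, or None if the name does not match."""
    match = PATTERN.search(name)
    return match.groupdict() if match else None


print(parse_filename('miniseries.season 1.part 5.720p.avi'))
# {'title': 'miniseries', 'season': '1', 'part': '5', 'rest': '720p.avi'}
print(parse_filename('miniseries.part VII.720p.avi'))
# {'title': 'miniseries', 'season': None, 'part': 'VII', 'rest': '720p.avi'}
```

groupdict() reports a group that did not take part in the match as None, so a missing season shows up under the 'season' key exactly as requested in the question.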

