Python: failure when filtering a list of strings with re.match
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/34117950/
Asked by miluz
I'd like to filter a list of strings in Python using a regex. In the following case, I want to keep only the files with a '.npy' extension.
The code that doesn't work:
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = filter(regex.match, files)
print(selected_files)
The same regex works for me in Ruby:
selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }
What's wrong with the Python code?
Accepted answer by Kevin Guan
selected_files = filter(regex.match, files)
re.match('regex') is roughly equivalent to re.search('^regex'); it is the regex counterpart of text.startswith('regex'). It only checks whether the string starts with a match for the pattern. Every path here starts with '/', not with '_x', so no file is selected.
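To make the difference concrete, here is a minimal sketch that applies the question's pattern to a single path from the list. The variable names `path`, `pattern`, and `anchored` are illustrative and do not appear in the original post.

```python
import re

path = '/a/b/c/la_seg_x005_y003.npy'
pattern = re.compile(r'_x\d+_y\d+\.npy')

# match() only tries the pattern at position 0, where the path has '/'.
print(pattern.match(path))            # None

# search() tries every position and finds the pattern near the end.
print(pattern.search(path).group())   # _x005_y003.npy

# Anchoring the pattern with '^' makes search() behave like match() here.
anchored = re.compile(r'^_x\d+_y\d+\.npy')
print(anchored.search(path))          # None
```

Both calls return None on failure, which is why filter() drops every element when it is given regex.match.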
So use re.search() instead:
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.search, files))
# The list() call is only needed in Python 3, where filter() returns a lazy iterator instead of a list
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
And if you just want all of the .npy files, use str.endswith():
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
selected_files = list(filter(lambda x: x.endswith('.npy'), files))
print(selected_files)
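The same extension filter can also be written as a list comprehension, which avoids the lambda and the list() call. The sketch below additionally shows a variant based on pathlib; that variant is an alternative not mentioned in the original answer, and the names `by_endswith` and `by_suffix` are illustrative.

```python
from pathlib import PurePosixPath

files = [
    '/a/b/c/la_seg_x005_y003.png',
    '/a/b/c/la_seg_x005_y003.npy',
    '/a/b/c/la_seg_x004_y003.png',
    '/a/b/c/la_seg_x004_y003.npy',
    '/a/b/c/la_seg_x003_y003.png',
    '/a/b/c/la_seg_x003_y003.npy',
]

# Plain string test, equivalent to the filter/lambda version.
by_endswith = [f for f in files if f.endswith('.npy')]

# Alternative: compare the parsed file extension instead of the raw string.
by_suffix = [f for f in files if PurePosixPath(f).suffix == '.npy']

print(by_endswith)
print(by_endswith == by_suffix)  # True
```

Note that neither variant checks the _x..._y... part of the name, so they only replace the regex when the extension is all that matters.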
Answer by SIslam
Just use search, since match only tries to match at the beginning of the string, whereas search looks for a match anywhere in the string.
import re
files = [ '/a/b/c/la_seg_x005_y003.png',
'/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.png',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.png',
'/a/b/c/la_seg_x003_y003.npy', ]
regex = re.compile(r'_x\d+_y\d+\.npy')
selected_files = list(filter(regex.search, files))  # list() is needed in Python 3, where filter() is lazy
print(selected_files)
Output:
['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
Answer by miku
With match, the pattern must match starting at the very beginning of the input (it does not have to cover the entire input). Either extend your regular expression so that it can consume the leading part of the path:
regex = re.compile(r'.*_x\d+_y\d+\.npy')
Which would match:
['/a/b/c/la_seg_x005_y003.npy',
'/a/b/c/la_seg_x004_y003.npy',
'/a/b/c/la_seg_x003_y003.npy']
Or use re.search, which
scans through string looking for the first location where the regular expression pattern produces a match [...]
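As a rough illustration of the extended pattern, the sketch below runs it with match() and also with fullmatch(), which is what actually requires the pattern to cover the whole string. The '.npy.bak' entry is a hypothetical file name added for this example only; it is not in the original list.

```python
import re

files = [
    '/a/b/c/la_seg_x005_y003.png',
    '/a/b/c/la_seg_x005_y003.npy',
    '/a/b/c/la_seg_x005_y003.npy.bak',  # hypothetical entry, for illustration
]

# The leading '.*' lets match() consume the directory part of the path.
prefixed = re.compile(r'.*_x\d+_y\d+\.npy')

# match() anchors only at the start, so trailing text is still accepted.
match_hits = [f for f in files if prefixed.match(f)]
print(match_hits)      # the .npy file and the .npy.bak file

# fullmatch() requires the pattern to consume the entire string.
fullmatch_hits = [f for f in files if prefixed.fullmatch(f)]
print(fullmatch_hits)  # only the .npy file
```

If stray suffixes like this can occur in the real data, ending the pattern with '$' together with search() would be another way to get the stricter behaviour.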
Answer by Vlad
re.match() looks for a match only at the beginning of the string. You can use re.search() instead.