Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/34117950/

Time: 2020-08-19 14:28:55  Source: igfitidea

Failure when filtering string list with re.match

python regex

Asked by miluz

I'd like to filter a list of strings in Python using a regex. In the following case, I want to keep only the files with a '.npy' extension.

The code that doesn't work:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = filter(regex.match, files)
print(selected_files)

The same regex works for me in Ruby:

selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }

What's wrong with the Python code?

Accepted answer by Kevin Guan

selected_files = filter(regex.match, files)

re.match('regex') is roughly equivalent to re.search('^regex'); think of it as a regex version of text.startswith('regex'). It only checks whether the string starts with a match for the pattern.
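
To make the difference concrete, here is a small sketch that runs the question's pattern against one of its paths with both methods. The variable names pattern, path, match_result, and search_result are chosen for illustration only.

```python
import re

pattern = re.compile(r'_x\d+_y\d+\.npy')
path = '/a/b/c/la_seg_x005_y003.npy'

# match only tries the pattern at position 0, where the path starts with '/a/b/c/'
match_result = pattern.match(path)

# search tries successive positions until the pattern fits somewhere
search_result = pattern.search(path)

print(match_result)            # None
print(search_result.group())   # _x005_y003.npy
```

Because filter keeps only items for which the function returns a truthy value, the None returned by match is what causes every path to be dropped.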

So, use re.search() instead:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))
# The list() call is only needed in Python 3, where filter returns a lazy iterator instead of a list
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']


And if you just want to get all of the .npy files, simply use str.endswith():

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)
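
The same extension filter can also be written as a list comprehension, which yields a list directly in both Python 2 and Python 3. This is only a sketch of an equivalent form, not part of the original answer; the shortened files list and the name npy_files are illustrative.

```python
files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy']

# Keep only the paths that end with the .npy extension
npy_files = [f for f in files if f.endswith('.npy')]

print(npy_files)
```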

Answer by SIslam

Just use search, since match only matches at the beginning of the string, while search matches anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

# list() is needed in Python 3, where filter returns an iterator rather than a list
selected_files = list(filter(regex.search, files))
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']

Answer by miku

If you use match, the pattern must match starting at the very beginning of the input (covering the entire input is only required by re.fullmatch). Either extend your regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

Which would match:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']

Or use re.search, which

scans through string looking for the first location where the regular expression pattern produces a match [...]
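
As a quick check of the two options above, the sketch below applies the extended pattern with match and the original pattern with search to the question's list. The names prefixed, original, via_match, and via_search are illustrative and not from the answer.

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy',
         '/a/b/c/la_seg_x003_y003.png',
         '/a/b/c/la_seg_x003_y003.npy']

prefixed = re.compile(r'.*_x\d+_y\d+\.npy')  # leading .* absorbs the directory part
original = re.compile(r'_x\d+_y\d+\.npy')

# Option 1: match, anchored at the start, with the extended pattern
via_match = [f for f in files if prefixed.match(f)]

# Option 2: search, unanchored, with the original pattern
via_search = [f for f in files if original.search(f)]

print(via_match == via_search)  # True
```

Both options select the same three .npy paths here; search avoids having to change the pattern.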

Answer by Vlad

re.match() looks for a match at the beginning of the string. You can use re.search() instead.
