Python: filtering a list of strings with re.match fails

Question

Asked by miluz

I'd like to filter a list of strings in Python using a regex. In the following case, I want to keep only the files with a '.npy' extension.

The code that doesn't work:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = filter(regex.match, files)
print(selected_files)

The same regex works for me in Ruby:

selected = files.select { |f| f =~ /_x\d+_y\d+\.npy/ }

What's wrong with the Python code?

Answer 1

Accepted answer by Kevin Guan

selected_files = filter(regex.match, files)

re.match('regex') is equivalent to re.search('^regex') (when re.MULTILINE is not set); think of it as a regex version of text.startswith('regex'). It only checks whether the string starts with a match for the pattern.
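
A minimal check on one file name from the question shows the difference (Python 3; the variable name `name` is only for illustration):

```python
import re

# One of the file names from the question, and the question's pattern
name = '/a/b/c/la_seg_x005_y003.npy'
regex = re.compile(r'_x\d+_y\d+\.npy')

# match() only tries position 0; the string starts with '/', so it fails
print(regex.match(name))  # None

# search() scans the whole string and finds the match near the end
print(regex.search(name).group())  # _x005_y003.npy

# Anchoring the pattern with '^' makes search() behave like match()
print(re.search(r'^_x\d+_y\d+\.npy', name))  # None
```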

So, use re.search() instead:

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))
# The list() call is only required in Python 3, where filter() returns a lazy iterator instead of a list
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']
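
The list() call matters because a Python 3 filter object is an iterator: printing it shows the object rather than the items, and it can be consumed only once. A small sketch using two of the question's file names:

```python
import re

files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy']
regex = re.compile(r'_x\d+_y\d+\.npy')

# In Python 3, filter() returns a lazy iterator, not a list
lazy = filter(regex.search, files)
print(lazy)  # <filter object at 0x...>

# Iterating consumes it, so a second pass yields nothing
first_pass = list(lazy)
second_pass = list(lazy)
print(first_pass)   # ['/a/b/c/la_seg_x005_y003.npy']
print(second_pass)  # []
```

If you need to iterate over the result more than once, materialize it with list() as in the answer above.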

And if you only want all of the .npy files, use str.endswith():

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]


selected_files = list(filter(lambda x: x.endswith('.npy'), files))

print(selected_files)
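
A list comprehension is an equivalent alternative to filter() with a lambda, and str.endswith() also accepts a tuple of suffixes. A sketch on part of the question's list; the '.npz' suffix is just an illustrative extra:

```python
files = ['/a/b/c/la_seg_x005_y003.png',
         '/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x004_y003.png',
         '/a/b/c/la_seg_x004_y003.npy']

# Same result as list(filter(lambda x: x.endswith('.npy'), files))
npy_files = [f for f in files if f.endswith('.npy')]
print(npy_files)
# ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy']

# endswith() with a tuple keeps a name if it ends with any of the suffixes
numpy_files = [f for f in files if f.endswith(('.npy', '.npz'))]
print(numpy_files)
# ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy']
```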

Answer 2

Answer by SIslam

Just use search: match only tries to match at the beginning of the string, while search finds a match anywhere in the string.

import re

files = [ '/a/b/c/la_seg_x005_y003.png',
          '/a/b/c/la_seg_x005_y003.npy',
          '/a/b/c/la_seg_x004_y003.png',
          '/a/b/c/la_seg_x004_y003.npy',
          '/a/b/c/la_seg_x003_y003.png',
          '/a/b/c/la_seg_x003_y003.npy', ]

regex = re.compile(r'_x\d+_y\d+\.npy')

selected_files = list(filter(regex.search, files))  # list() is needed in Python 3
print(selected_files)

Output:

['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x004_y003.npy', '/a/b/c/la_seg_x003_y003.npy']
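
If you do want the pattern to cover the entire string, Python 3.4+ has re.fullmatch(). A short comparison of the three methods on one of the question's paths:

```python
import re

name = '/a/b/c/la_seg_x005_y003.npy'
pattern = re.compile(r'_x\d+_y\d+\.npy')

print(pattern.search(name) is not None)     # True: a match anywhere is enough
print(pattern.match(name) is not None)      # False: must start at index 0
print(pattern.fullmatch(name) is not None)  # False: must span the whole string

# fullmatch() succeeds once the pattern accounts for every character
print(re.compile(r'.*_x\d+_y\d+\.npy').fullmatch(name) is not None)  # True
```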

Answer 3

Answer by miku

With match, the pattern must match starting at the very beginning of the input. Either extend your regular expression:

regex = re.compile(r'.*_x\d+_y\d+\.npy')

This would match:

['/a/b/c/la_seg_x005_y003.npy',
 '/a/b/c/la_seg_x004_y003.npy',
 '/a/b/c/la_seg_x003_y003.npy']
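
Note that match() anchors only the start, so the `.*` version still accepts names with extra text after '.npy'. A sketch with a hypothetical '.npy.bak' name (not from the question) shows how adding `$` tightens it:

```python
import re

# Hypothetical second name, added only to probe the end of the pattern
names = ['/a/b/c/la_seg_x005_y003.npy',
         '/a/b/c/la_seg_x005_y003.npy.bak']

loose = re.compile(r'.*_x\d+_y\d+\.npy')    # the pattern from this answer
strict = re.compile(r'.*_x\d+_y\d+\.npy$')  # '$' also anchors the end

# The '.bak' file slips through because nothing anchors the end
print([n for n in names if loose.match(n)])
# ['/a/b/c/la_seg_x005_y003.npy', '/a/b/c/la_seg_x005_y003.npy.bak']

print([n for n in names if strict.match(n)])
# ['/a/b/c/la_seg_x005_y003.npy']
```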

Or use re.search, which

scans through string looking for the first location where the regular expression pattern produces a match [...]

Answer 4

Answer by Vlad

re.match() looks for a match at the beginning of the string. You can use re.search() instead.
