Note: this page is a Chinese-English parallel translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/4953272/

Date: 2020-08-18 18:19:24  Source: igfitidea

How to match exact "multiple" strings in Python

Tags: python, regex

Asked by Neo

I've got a list of exact patterns that I want to search for in a given string. Currently I've got a really bad solution for this problem.

import re

# Current approach: one compiled pattern per alternative
pat1 = re.compile('foo.trailingString')
mat1 = pat1.match(mystring)

pat2 = re.compile('bar.trailingString')
mat2 = pat2.match(mystring)

if mat1 or mat2:
    pass  # Do whatever

# Attempted single pattern
pat = re.compile('[foo|bar].trailingString')
match = pat.match(mystring)  # Doesn't work

The only condition is that I've got a list of strings which are to be matched exactly. What's the best possible solution in Python?

EDIT: The search patterns have a trailing pattern in common.

Accepted answer by ircmaxell

You could do a trivial regex that combines those two:

import re

pat = re.compile('foo|bar')
if pat.match(mystring):
    pass  # Do whatever

You could then expand the regex to do whatever you need to, using the | separator (which means "or" in regex syntax).

Edit: Based upon your recent edit, this should do it for you:

# Raw string so that \. reaches the regex engine as an escaped literal dot
pat = re.compile(r'(foo|bar)\.trailingString')
if pat.match(mystring):
    pass  # Do whatever

The [] is a character class. So your [foo|bar] would match exactly one of the included characters (f, o, |, b, a or r), since there's no *, + or ? after the class. () is the enclosure for a sub-pattern.
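
A minimal sketch of that difference. The sample strings and variable names below are illustrative assumptions, not part of the original answer:

```python
import re

# Character class: matches exactly ONE character out of f, o, |, b, a, r
char_class = re.compile(r'[foo|bar]\.trailingString')
# Group with alternation: matches the whole word "foo" or the whole word "bar"
group = re.compile(r'(foo|bar)\.trailingString')

# Hypothetical inputs chosen to show where the two patterns disagree
samples = ['foo.trailingString', 'bar.trailingString', 'f.trailingString']

class_hits = [s for s in samples if char_class.match(s)]
group_hits = [s for s in samples if group.match(s)]

print(class_hits)  # ['f.trailingString']
print(group_hits)  # ['foo.trailingString', 'bar.trailingString']
```

The character class only accepts a single character before the dot, which is why the original attempt fails on both "foo" and "bar".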

Answer by BoltClock

You're right in using | but you're using a character class [] instead of a subpattern (). Try this regex:

# Raw string so that \. is passed through as an escaped literal dot
r = re.compile(r'(?:foo|bar)\.trailingString')

if r.match(mystring):
    pass  # Do stuff


Old answer

If you want to do exact substring matches you shouldn't use regex.

Try using in instead:

words = ['foo', 'bar']

# mystring contains at least one of the words
if any(i in mystring for i in words):
    pass  # Do stuff
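
One thing worth keeping in mind: in finds a word anywhere in the string, whereas re.match only looks at the start. A small sketch with an invented sample string (the value of mystring and the result names are assumptions for illustration):

```python
import re

words = ['foo', 'bar']
mystring = 'xx bar yy'  # hypothetical input, chosen so the word is not at the start

# Substring test: true if any word occurs anywhere in mystring
found_anywhere = any(w in mystring for w in words)

# re.match is anchored at the beginning, so it does not see "bar" here
found_at_start = any(re.match(re.escape(w), mystring) for w in words)

print(found_anywhere, found_at_start)  # True False
```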

Answer by dagoof

perhaps

any([re.match(r, mystring) for r in ['bar', 'foo']])

I'm assuming your match patterns will be more complex than foo or bar; if they aren't, just use

if mystring in ['bar', 'foo']:
    pass  # Do whatever

Answer by Senthil Kumaran

Use '|' in your regex. It stands for 'OR'. There is a better way too, when you want to re.escape your strings:

import re

pat = re.compile('|'.join(map(re.escape, ['foo.trailingString', 'bar.trailingString', 'something.else'])))
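
To see why escaping matters here, a minimal sketch comparing the joined pattern with and without re.escape. The names literals, unescaped, escaped and candidate are made up for this illustration:

```python
import re

literals = ['foo.trailingString', 'bar.trailingString', 'something.else']

# Without escaping, "." is a wildcard that matches any character
unescaped = re.compile('|'.join(literals))
# With re.escape, "." only matches a literal dot
escaped = re.compile('|'.join(map(re.escape, literals)))

candidate = 'fooXtrailingString'  # hypothetical near-miss input

print(bool(unescaped.match(candidate)))             # True
print(bool(escaped.match(candidate)))               # False
print(bool(escaped.match('foo.trailingString')))    # True
```

Note that match only anchors at the start of the string, so trailing text after a listed string is still accepted; if the whole string must equal one of the entries, fullmatch (Python 3.4+) is probably the closer fit.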

Answer by Hugh Bothwell

Do you want to search for patterns or strings? The best solution for each is very different:

# strings
patterns = ['foo', 'bar', 'baz']
matches = set(patterns)

if mystring in matches:     # O(1) - very fast
    pass  # do whatever


# patterns
import re
patterns = ['foo', 'bar']
matches = [re.compile(pat) for pat in patterns]

if any(m.match(mystring) for m in matches):    # O(n)
    pass  # do whatever

Edit: OK, you want to search for variable-length exact strings at the beginning of a search string; try:

from collections import defaultdict
matches = defaultdict(set)

# Group the patterns by length so each prefix length is checked only once
patterns = ['foo', 'barr', 'bazzz']
for p in patterns:
    matches[len(p)].add(p)

for strlen, pats in matches.items():  # use iteritems() on Python 2
    if mystring[:strlen] in pats:
        # do whatever
        break
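
If the goal is only to test a fixed set of literal prefixes, the built-in str.startswith may be a simpler option. This is a different technique from the length-bucketing code above and is not part of the original answer; starts_with_any is a hypothetical helper name and the sample inputs are invented:

```python
patterns = ['foo', 'barr', 'bazzz']

def starts_with_any(s, prefixes):
    """Return True if s begins with at least one of the given prefixes."""
    # str.startswith accepts a tuple of candidate prefixes
    return s.startswith(tuple(prefixes))

print(starts_with_any('barr.trailingString', patterns))  # True
print(starts_with_any('bar.trailingString', patterns))   # False
```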