Python: How to use RegEx in an if statement?

Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/14225608/

Time: 2020-08-18 10:48:45  Source: igfitidea

python regex

Question

I have the following code, which looks through the files in one directory and copies those that contain a certain string into another directory. I am trying to use regular expressions because the string could be uppercase, lowercase, or a mix of both.

Here is the code that worked before I tried to use regular expressions:

import os
import re
import shutil

def test():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    for x in (files):
        inputFile = open((x), "r")
        content = inputFile.read()
        inputFile.close()
        if "Hello World" in content:
            shutil.copy(x, "C:/Users/David/Desktop/Test/MyFiles2")

Here is my code after I tried to use regular expressions:

import os
import re
import shutil

def test2():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    regex_txt = "facebook.com"
    for x in (files):
        inputFile = open((x), "r")
        content = inputFile.read()
        inputFile.close()
        regex = re.compile(regex_txt, re.IGNORECASE)

I'm guessing that I need a line of code that is something like

if regex = re.compile(regex_txt, re.IGNORECASE) == True

But I can't seem to get anything to work. If someone could point me in the right direction, it would be appreciated.

Accepted answer by aw4lly

if re.match(regex, content) is not None:
    pass  # handle the match here, e.g. copy the file

You could also use re.search, depending on how you want it to match.
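The difference between the two calls can be sketched as follows. This is a minimal illustration, not part of the original answer; the sample `content` string is made up, and the escaped pattern is an assumption about what the asker wants to find.

```python
import re

# Hypothetical file content, made up for this illustration.
content = "Visit us on FaceBook.com today"

pattern = re.compile(r"facebook\.com", re.IGNORECASE)

# match() only succeeds if the pattern matches at the very start of the string.
at_start = pattern.match(content)

# search() scans the whole string and succeeds on a match at any position.
anywhere = pattern.search(content)

if at_start is not None:
    print("match() found:", at_start.group(0))
if anywhere is not None:
    print("search() found:", anywhere.group(0))
```

With this sample text only the search() branch prints, because the text does not begin with the pattern.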

Answer by Silas Ray

First you compile the regex, then you have to use it with match, search, or some other method to actually run it against some input.

import os
import re
import shutil

def test():
    os.chdir("C:/Users/David/Desktop/Test/MyFiles")
    files = os.listdir(".")
    os.mkdir("C:/Users/David/Desktop/Test/MyFiles2")
    regex_txt = "facebook.com"  # pattern text taken from the question
    pattern = re.compile(regex_txt, re.IGNORECASE)
    for x in (files):
        with open((x), 'r') as input_file:
            for line in input_file:
                if pattern.search(line):
                    shutil.copy(x, "C:/Users/David/Desktop/Test/MyFiles2")
                    break
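A variant of the same idea that avoids hard-coded paths might look like the sketch below. The function name `copy_matching_files` and its parameters are hypothetical, introduced only for illustration; using `os.makedirs(..., exist_ok=True)` and `errors="ignore"` are assumptions about what is convenient, not something the answer prescribes.

```python
import os
import re
import shutil

def copy_matching_files(src_dir, dst_dir, regex_txt):
    """Copy every file in src_dir whose text matches regex_txt into dst_dir."""
    pattern = re.compile(regex_txt, re.IGNORECASE)
    os.makedirs(dst_dir, exist_ok=True)  # unlike os.mkdir, no error if it already exists
    copied = []
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        # errors="ignore" is an assumption: undecodable bytes should not abort the scan
        with open(path, "r", errors="ignore") as input_file:
            # any() stops at the first matching line, like the break above
            if any(pattern.search(line) for line in input_file):
                shutil.copy(path, dst_dir)
                copied.append(name)
    return copied
```

It could then be called as `copy_matching_files("C:/Users/David/Desktop/Test/MyFiles", "C:/Users/David/Desktop/Test/MyFiles2", r"facebook\.com")`, and the returned list names the files that were copied.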

Answer by Mike Samuel

The REPL makes it easy to learn APIs. Just run python, create an object and then ask for help:

$ python
>>> import re
>>> help(re.compile(r''))

at the command line shows, among other things:

search(...)

search(string[, pos[, endpos]]) --> match object or None. Scan through string looking for a match, and return a corresponding MatchObject instance. Return None if no position in the string matches.

so you can do

regex = re.compile(regex_txt, re.IGNORECASE)

match = regex.search(content)  # From your file reading code.
if match is not None:
    print(match.group(0))  # use match, e.g. print the matched text

Incidentally,

regex_txt = "facebook.com"

has a . which matches any character, so re.compile("facebook.com").search("facebookkcom") is not None is true because . matches any character. Maybe

regex_txt = r"(?i)facebook\.com"

The \. matches a literal "." character instead of treating . as a special regular expression operator.

The r"..." bit means that the regular expression compiler gets the escape in \. instead of the Python parser interpreting it.

The (?i) makes the regex case-insensitive like re.IGNORECASE but self-contained.
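The effect of the unescaped dot can be checked directly. The sketch below is an illustration only; the names `loose`, `strict`, and `escaped` are hypothetical, and the use of `re.escape` is an addition that the answer itself does not mention.

```python
import re

loose = re.compile("facebook.com")          # "." matches any character
strict = re.compile(r"(?i)facebook\.com")   # "\." matches only a literal dot

print(loose.search("facebookkcom") is not None)    # True
print(strict.search("facebookkcom") is not None)   # False
print(strict.search("FaceBook.COM") is not None)   # True

# re.escape() builds the escaped pattern from a literal string,
# which helps when the text to find is not written by hand.
escaped = re.compile(re.escape("facebook.com"), re.IGNORECASE)
print(escaped.search("www.FACEBOOK.com/page") is not None)  # True
```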

Answer by Jon Clements

Regexes shouldn't really be used in this fashion unless you want something more complicated than what you're trying to do. For instance, you could just normalise your content string and comparison string to lowercase:

if 'facebook.com' in content.lower():
    shutil.copy(x, "C:/Users/David/Desktop/Test/MyFiles2")
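The same check can be wrapped in a small helper, sketched below. The function name `contains_ignore_case` and the sample strings are hypothetical. Note that no regular expression is involved, so the "." in the search text is an ordinary character.

```python
def contains_ignore_case(needle, haystack):
    """Return True if needle occurs in haystack, ignoring case."""
    return needle.lower() in haystack.lower()

print(contains_ignore_case("facebook.com", "Find us on FaceBook.Com!"))  # True
print(contains_ignore_case("facebook.com", "facebookkcom"))              # False
```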

Answer by Bob Stein

if re.search(r'pattern', string):

Simple if-test:

if re.search(r'ing\b', "seeking a great perhaps"):     # any words end with ing?
    print("yes")

Pattern check, extract a substring, case insensitive:

match_object = re.search(r'^OUGHT (.*) BE$', "ought to be", flags=re.IGNORECASE)
if match_object:
    assert "to" == match_object.group(1)     # what's between ought and be?

Notes:

  • Use re.search() not re.match(). Match restricts to the start of strings, a confusing convention if you ask me. If you do want a string-starting match, use caret or \A instead, re.search(r'^...', ...)

  • Use raw string syntax r'pattern' for the first parameter. Otherwise you would need to double up backslashes, as in re.search('ing\\b', ...)

  • In this example, \b is a special sequence meaning word-boundary in regex. Not to be confused with backspace.

  • re.search() returns None if it doesn't find anything, which is always falsy.

  • re.search() returns a Match object if it finds anything, which is always truthy.

  • A group is what matched inside parentheses.

  • Group numbering starts at 1.

  • Specs

  • Tutorial
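Several of the notes above can be verified in a few lines. This is a minimal sketch that reuses the answer's own sample strings; the extra checks on `group(0)` and on the non-raw pattern are additions for illustration.

```python
import re

match_object = re.search(r"^OUGHT (.*) BE$", "ought to be", flags=re.IGNORECASE)
if match_object:
    print(match_object.group(0))  # whole match: "ought to be"
    print(match_object.group(1))  # first parenthesised group: "to"

text = "seeking a great perhaps"

# In a raw string, \b reaches the regex engine as a word boundary.
print(bool(re.search(r"ing\b", text)))  # True

# Without the r prefix, Python turns "\b" into a backspace character,
# so the pattern looks for "ing" followed by a backspace and finds nothing.
print(bool(re.search("ing\b", text)))   # False

# re.match only tries the start of the string; re.search scans all of it.
print(re.match(r"great", text))         # None
print(bool(re.search(r"great", text)))  # True
```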
