Python regex to match any character or none?

Disclaimer: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38982637/

Time: 2020-08-19 21:43:32  Source: igfitidea

regex to match any character or none?

python regex

Asked by user1165419

I have the following two strings:

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'

line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

I'm trying to grab these two parts:

"GET /file/ HTTP/1.1" 302
"" 400

Basically, any characters between the two "" or nothing between the "". So far I've tried this:

regex_example = re.search("\".+?\" [0-9]{3}", line1)
print(regex_example.group())

This works with line1 but gives an error for line2. It seems the '.' matches any character but fails when no character exists.

Is there any way for it to match any character or nothing between the two ""?

Answered by 4castle

Use .*? instead of .+?.

+ means "1 or more"

* means "0 or more"

Regex101 Demo
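As a minimal sketch of the difference between the two quantifiers, the snippet below runs both patterns against the two log lines from the question. The helper name `first_match` and the variable names `plus_pattern` and `star_pattern` are made up for illustration; the helper simply guards against `re.search` returning `None`, which is what makes calling `.group()` fail on line2 in the original attempt.

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

# ".+?" needs at least one character between the quotes.
plus_pattern = re.compile(r'".+?" [0-9]{3}')
# ".*?" also accepts nothing between the quotes.
star_pattern = re.compile(r'".*?" [0-9]{3}')


def first_match(pattern, line):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(line)
    return m.group() if m else None


for line in (line1, line2):
    print(first_match(plus_pattern, line), '|', first_match(star_pattern, line))
```

With these inputs, only the `+` version should come back as `None` for line2, while the `*` version matches both lines.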

If you want a more efficient regex, use a negated character class [^"] instead of a lazy quantifier ?. You should also use the raw string flag r and \d for digits.

r'"[^"]*" \d{3}'
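A short sketch of that pattern in use, assuming the two log lines from the question. The names `lines`, `status_pattern` and `results` are chosen here for illustration only. The last line also shows what the raw string prefix saves you: without `r`, the backslash in `\d` would have to be doubled.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# [^"]* consumes zero or more non-quote characters, so it stops at the
# closing quote without any lazy-quantifier backtracking.
status_pattern = re.compile(r'"[^"]*" \d{3}')

results = []
for line in lines:
    m = status_pattern.search(line)
    if m:  # search() returns None when nothing matches
        results.append(m.group())

print(results)

# The raw string and the manually escaped string are the same pattern text.
print(r'"[^"]*" \d{3}' == '"[^"]*" \\d{3}')
```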

Answered by Jan

You can use:

import re

lines = ['[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore', '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore']

rx = re.compile(r'''
        "[^"]*" # ", followed by anything not a " and a "
        \       # a space
        \d+     # at least one digit
        ''', re.VERBOSE)

matches = [m.group(0)
            for line in lines
            for m in rx.finditer(line)]

print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']



A demo on ideone.com

Answered by emdi

A simpler answer.

    import re
    line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
    line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

    # Capture everything between the closing ] and the literal word "random".
    x = re.search(r'\](.+)random', line1).group(1)
    y = re.search(r'\](.+)random', line2).group(1)

    print(x + "\n" + y)

You will get the following output:

     "GET /file/ HTTP/1.1" 302 
     "" 400

Answered by Samuel Nde

Another option is:

import re
re.sub(r'\[.*\] ', '', your_string)

This should replace any combination of characters in square brackets [] followed by a space with an empty string "" in your_string and return the result.

For example:

for your_string in [line1, line2]:
    print(re.sub(r'\[.*\] ', '', your_string))

outputs:

"GET /file/ HTTP/1.1" 302 random stuff ignore
"" 400 random stuff ignore

Answered by WeShall

Try this. Using 'findall' in place of 'search' might give you better control over how you process your output.

import re

output = []

logs = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore \
        [16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

regex = r'"(.*?)"\s(\d{3})'

value = re.findall(regex, logs)
output.append(value)

print(output)
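Because the pattern has two capture groups, `re.findall` returns a list of tuples rather than whole-match strings. As a rough sketch of how that could be consumed, the snippet below unpacks each tuple into a request and a status code; the names `log_lines`, `request`, `status` and `parsed` are assumptions made for this illustration.

```python
import re

log_lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# Group 1: the text between the quotes (possibly empty).
# Group 2: the three-digit status code.
regex = re.compile(r'"(.*?)"\s(\d{3})')

parsed = []
for line in log_lines:
    # With two groups, findall yields one (group1, group2) tuple per match.
    for request, status in regex.findall(line):
        parsed.append((request, int(status)))

print(parsed)
```

An empty request shows up as an empty string in the first slot of the tuple, so the two cases from the question can be told apart without any extra matching.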