Python regex: match any character or nothing?

Question

Asked by user1165419

I have the following two strings:

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'

line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

I'm trying to grab these two parts:

"GET /file/ HTTP/1.1" 302
"" 400

Basically, any characters between the two double quotes, or nothing between them. So far I've tried this:

regex_example = re.search("\".+?\" [0-9]{3}", line1)
print(regex_example.group())

This will work with line1, but give an error for line2. This is due to the '.' matching any character, but giving an error if no character exists.

Is there any way for it to match any character or nothing in between the two ""?

Answer 1

Answered by 4castle

Use .*? instead of .+?.

+ means "1 or more"

* means "0 or more"

Regex101 Demo
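
The difference between the two quantifiers can be checked directly on the two lines from the question. This is a minimal sketch; the helper name first_match is made up for illustration. It also shows where the error in the question comes from: re.search returns None when nothing matches, and calling .group() on None raises an AttributeError.

```python
import re

line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

plus_pattern = re.compile(r'".+?" [0-9]{3}')  # needs at least one character between the quotes
star_pattern = re.compile(r'".*?" [0-9]{3}')  # also accepts nothing between the quotes

def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if there is no match."""
    m = pattern.search(text)
    return m.group() if m else None

for line in (line1, line2):
    print(first_match(plus_pattern, line), '|', first_match(star_pattern, line))
# "GET /file/ HTTP/1.1" 302 | "GET /file/ HTTP/1.1" 302
# None | "" 400
```

Checking the result of re.search before calling .group() avoids the exception regardless of which quantifier is used.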

If you want a more efficient regex, use a negated character class [^"] instead of a lazy quantifier ?. You should also use the raw string flag r and \d for digits.

r'"[^"]*" \d{3}'
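
Besides efficiency, the negated character class can also change what is matched, because [^"]* can never run past a closing quote while .*? can. The sketch below compares the two on a made-up line with two quoted fields; the sample string is hypothetical and does not come from the question.

```python
import re

# Hypothetical line with two quoted fields, made up for illustration only.
sample = '"-" proxied "GET /file/ HTTP/1.1" 302'

lazy = re.compile(r'".*?" \d{3}')       # lazy quantifier: may still cross a quote to find a match
negated = re.compile(r'"[^"]*" \d{3}')  # negated class: stays inside one pair of quotes

lazy_match = lazy.search(sample).group()
negated_match = negated.search(sample).group()

print(lazy_match)     # "-" proxied "GET /file/ HTTP/1.1" 302
print(negated_match)  # "GET /file/ HTTP/1.1" 302
```

On the two lines from the question, both patterns give the same results, so the difference only matters when a line can contain more than one quoted field.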

Answer 2

Answered by Jan

You can use:

import re

lines = ['[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore', '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore']

rx = re.compile(r'''
        "[^"]*" # ", followed by anything not a " and a "
        \       # a space
        \d+     # at least one digit
        ''', re.VERBOSE)

matches = [m.group(0)
           for line in lines
           for m in rx.finditer(line)]

print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']

See a demo on ideone.com.

Answer 3

Answered by emdi

Answer 4

Answered by Samuel Nde

Another option is:

import re
re.sub(r'\[.*\] ', '', your_string)

This should replace any run of characters in square brackets [] followed by a white space with an empty string "" in your_string and return the result.

For example:

for your_string in [line1, line2]:
    print(re.sub(r'\[.*\] ', '', your_string))

Output:

"GET /file/ HTTP/1.1" 302 random stuff ignore
"" 400 random stuff ignore

Answer 5

Answered by WeShall

Try this. Using 'findall' in place of 'search' might give you better control over how you want to process your output.

import re

output = []

logs = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore \
        [16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

regex = r'"(.*?)"\s(\d{3})'

value = re.findall(regex, logs)
output.append(value)

print(output)
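
To make the shape of the result concrete: when a pattern contains more than one capture group, re.findall returns a list of tuples, one tuple per match. The sketch below reuses the regex from this answer on the same two log entries; the variable names pairs, request, and status are chosen here for illustration.

```python
import re

logs = ('[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore '
        '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore')

regex = r'"(.*?)"\s(\d{3})'

# Two capture groups, so each match becomes a (request, status) tuple.
pairs = re.findall(regex, logs)
print(pairs)
# [('GET /file/ HTTP/1.1', '302'), ('', '400')]

# Unpack each tuple to process the request text and status code separately.
for request, status in pairs:
    print(repr(request), int(status))
```

Note that the quotes themselves are not part of the captured text, and the empty request shows up as an empty string. The code above this sketch appends that list to output, so it prints the same pairs wrapped in one extra list.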
