Python regex to match any character or none?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use or share it, but you must do so under the same license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/38982637/
regex to match any character or none?
Asked by user1165419
I have the following two strings:
line1 = [16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore
line2 = [16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore
I'm trying to grab these two parts:
"GET /file/ HTTP/1.1" 302
"" 400
Basically, any characters between the two double quotes, or nothing at all between them. So far I've tried this:
regex_example = re.search("\".+?\" [0-9]{3}", line1)
print(regex_example.group())
This works for line1 but raises an error for line2. This seems to be because '.' matches any character but fails when there is no character to match.
Is there any way for it to match any character or nothing in between the two ""?
Answered by 4castle
Use .*? instead of .+?.
+ means "1 or more"
* means "0 or more"
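To make the difference concrete, here is a minimal sketch that runs both quantifiers against line2 from the question. The names plus_match and star_match are only illustrative, and the patterns are written as raw strings for readability.

```python
import re

line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'

# .+? needs at least one character between the quotes, so nothing matches here.
# re.search then returns None, and calling .group() on None raises AttributeError.
plus_match = re.search(r'".+?" [0-9]{3}', line2)
print(plus_match)  # None

# .*? also accepts zero characters, so the empty pair of quotes matches.
star_match = re.search(r'".*?" [0-9]{3}', line2)
print(star_match.group())  # "" 400
```

The error in the question most likely comes from calling .group() on the None that re.search returns, not from the regex engine itself.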
If you want a more efficient regex, use a negated character class [^"] instead of a lazy quantifier ?. You should also use the raw string flag r and \d for digits.
r'"[^"]*" \d{3}'
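As a rough sketch of how that pattern could be applied, the snippet below runs it over both lines from the question. The names lines, pattern and results are assumptions made for illustration, and the None check is an added safeguard that is not part of the original answer.

```python
import re

lines = [
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore',
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore',
]

# Raw string: backslashes reach the regex engine untouched.
# [^"]* matches zero or more characters that are not a double quote.
pattern = re.compile(r'"[^"]*" \d{3}')

results = []
for line in lines:
    match = pattern.search(line)
    # search() returns None when nothing matches, so check before calling group().
    results.append(match.group() if match else None)

print(results)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']
```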
Answered by Jan
You can use:
import re
lines = ['[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore', '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore']
rx = re.compile(r'''
    "[^"]*"  # ", followed by anything not a " and a "
    \        # a space
    \d+      # at least one digit
    ''', re.VERBOSE)
matches = [m.group(0)
           for line in lines
           for m in rx.finditer(line)]
print(matches)
# ['"GET /file/ HTTP/1.1" 302', '"" 400']
See a demo on ideone.com.
Answered by emdi
A simpler answer.
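import re
line1 = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore'
line2 = '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'
# Capture everything between the closing "]" and the word "random";
# strip() drops the spaces on either side of the captured text.
x = re.search(r'\](.+)random', line1).group(1).strip()
y = re.search(r'\](.+)random', line2).group(1).strip()
print(x + "\n" + y)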
You will get the following output:
"GET /file/ HTTP/1.1" 302
"" 400
Answered by Samuel Nde
Another option is:
import re
re.sub(r'\[.*\] ', '', your_string)
This should replace any combination of characters in square brackets [] followed by a whitespace character with an empty string "" in your_string and return the result.
For example:
for your_string in [line1, line2]:
    print(re.sub(r'\[.*\] ', '', your_string))
outputs:
"GET /file/ HTTP/1.1" 302 random stuff ignore
"" 400 random stuff ignore
Answered by WeShall
Try this. Using 'findall' in place of 'search' might give you better control over how you process your output.
import re
output = []
logs = '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore \
[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'
regex = r'"(.*?)"\s(\d{3})'
value = re.findall(regex, logs)
output.append(value)
print(output)
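When a pattern contains more than one capturing group, re.findall returns a list of tuples, one tuple per match. The sketch below shows one way that result might be consumed; the names pairs, request and status are hypothetical, and converting the status to int is an assumption about how the value would be used.

```python
import re

logs = (
    '[16/Aug/2016:06:13:25 -0400] "GET /file/ HTTP/1.1" 302 random stuff ignore '
    '[16/Aug/2016:06:13:25 -0400] "" 400 random stuff ignore'
)

# Two capturing groups: the text between the quotes and the 3-digit status code.
pairs = re.findall(r'"(.*?)"\s(\d{3})', logs)
print(pairs)
# [('GET /file/ HTTP/1.1', '302'), ('', '400')]

# Each tuple unpacks directly into named variables.
for request, status in pairs:
    print(repr(request), int(status))
```

Note that the captured groups exclude the surrounding quotes, so an empty request shows up as an empty string.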