Python multiline regular expressions
原文地址: http://stackoverflow.com/questions/18943223/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by AKASH
How do I extract all the characters (including newline characters) until the first occurrence of the given sequence of words? For example, with the following input:
input text:
"shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"
The sequence is "the". I want to extract the text from "shantaram" to the first occurrence of "the", which is in the second line.
The output must be:
shantaram is an amazing novel.
It is one of the
I have been trying all morning. I can write the expression to extract all characters until it encounters a specific character but here if I use an expression like:
re.search("shantaram[\s\S]*the", string)
It doesn't match across newlines.
Answered by Chris Seymour
You want to use the DOTALL option to match across newlines. From doc.python.org:
re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.
Demo:
In [1]: import re
In [2]: s="""shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""
In [3]: print(re.findall('^.*?the', s, re.DOTALL)[0])
shantaram is an amazing novel.
It is one of the
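As a rough illustration of what the flag changes, the sketch below runs the same non-greedy pattern with and without re.DOTALL on the question's input. The variable names (text, without_flag, with_flag) are only illustrative and are not part of the original answer.

```python
import re

text = """shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""

# Without re.DOTALL, '.' cannot cross the newline, and the first line
# contains no "the", so the search finds nothing.
without_flag = re.search(r"shantaram.*?the", text)

# With re.DOTALL, '.' also matches '\n', so the match can span lines.
with_flag = re.search(r"shantaram.*?the", text, re.DOTALL)

print(without_flag)
print(with_flag.group(0))
```

The first search returns None, while the second returns the text from "shantaram" through the first "the" on the second line.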
Answered by rlms
A solution not using regex:
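from itertools import takewhile
def upto(a_string, stop):
    # Split on spaces (newlines stay inside tokens); keep tokens before the stop word.
    words = a_string.split(" ")
    return " ".join(takewhile(lambda x: x not in (stop, "\n" + stop), words))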
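If the stop sequence should be included in the result, as in the question's expected output, another regex-free option is plain string searching. This is a minimal sketch, not part of the original answer. The helper name upto_inclusive is hypothetical, and str.find matches substrings, so "the" inside a longer word such as "other" would also count.

```python
def upto_inclusive(text, stop):
    """Return text up to and including the first occurrence of stop.

    Hypothetical helper for illustration; returns None when stop is absent.
    """
    index = text.find(stop)
    if index == -1:
        return None  # stop sequence not present
    return text[: index + len(stop)]


text = """shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""

print(upto_inclusive(text, "the"))
```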
Answered by lancif
Use this regex,
re.search("shantaram[\s\S]*?the", string)
instead of
re.search("shantaram[\s\S]*the", string)
The only difference is the '?'. Adding '?' after a quantifier (e.g. *?, +?) makes it non-greedy, so it matches as little as possible instead of producing the longest possible match.
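To make the difference concrete, here is a small sketch that runs both patterns on the question's input. Note that [\s\S] already matches newlines without any flag, so the two patterns differ only in where the match ends. The names greedy and lazy are illustrative.

```python
import re

text = """shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""

# Greedy: [\s\S]* grabs as much as possible, then backtracks to the LAST "the".
greedy = re.search(r"shantaram[\s\S]*the", text).group(0)

# Non-greedy: [\s\S]*? expands only as far as needed, stopping at the FIRST "the".
lazy = re.search(r"shantaram[\s\S]*?the", text).group(0)

print(repr(greedy))
print(repr(lazy))
```

The greedy match runs through the "the" that starts the third line, while the non-greedy match ends at "It is one of the".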

