Python multiline regular expression

Question

Asked by AKASH

How do I extract all the characters (including newline characters) up to the first occurrence of a given sequence of words? For example, with the following input:

Input text:

"shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"

And the sequence is "the". I want to extract the text from "shantaram" to the first occurrence of "the", which is in the second line.

The output must be:

shantaram is an amazing novel.
It is one of the

I have been trying all morning. I can write an expression that extracts all characters until it encounters a specific character, but here, if I use an expression like:

re.search("shantaram[\s\S]*the", string)

It doesn't match across the newline.

Answer 1

Answered by Chris Seymour

You want to use the DOTALL option to match across newlines. From doc.python.org:

re.DOTALL
Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

Demo:

In [1]: import re

In [2]: s="""shantaram is an amazing novel.
It is one of the best novels i have read.
the novel is written by gregory david roberts.
He is an australian"""

In [3]: print(re.findall(r'^.*?the', s, re.DOTALL)[0])
shantaram is an amazing novel.
It is one of the
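
To make the effect of the flag concrete, here is a minimal sketch that runs the same lazy pattern with and without re.DOTALL on the question's text. The variable names (text, no_flag, with_flag) are only for illustration and are not from the answer.

```python
import re

text = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)

# Without DOTALL, '.' does not match '\n', so the match cannot leave line 1,
# and line 1 contains no "the".
no_flag = re.search(r"shantaram.*?the", text)

# With DOTALL, '.' also matches '\n', so the lazy match can reach line 2.
with_flag = re.search(r"shantaram.*?the", text, re.DOTALL)

print(no_flag)                    # None
print(with_flag.group(0))
```

The second print shows the two lines the question asks for, ending at the first "the".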

Answer 2

Answered by rlms

A solution not using regex:

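from itertools import takewhile
def upto(a_string, stop):
    # Split on spaces (newlines stay attached to words) and keep words until one
    # equals stop, or a newline followed by stop; stop itself is not included.
    words = a_string.split(" ")
    return " ".join(takewhile(lambda x: x != stop and x != "\n{}".format(stop), words))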
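
For comparison, another regex-free approach could use str.find and slicing. This is only a sketch under stated assumptions: the helper name upto_inclusive is hypothetical, it matches stop as a plain substring (so "the" would also match inside a word like "other"), and it returns the whole text when stop is absent.

```python
def upto_inclusive(text, stop):
    """Return text up to and including the first occurrence of stop."""
    end = text.find(stop)  # index of the first occurrence, or -1 if absent
    if end == -1:
        return text  # assumption: return everything when stop never occurs
    return text[:end + len(stop)]


sample = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)
result = upto_inclusive(sample, "the")
print(result)
```

Unlike the word-based version, this keeps the stop sequence in the result, which is what the question's expected output shows.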

Answer 3

Answered by lancif

Use this regex,

re.search("shantaram[\s\S]*?the", string)

instead of

re.search("shantaram[\s\S]*the", string)

The only difference is the '?'. Adding '?' after a quantifier (e.g. *?, +?) makes it non-greedy, so the match stops at the first possible "the" instead of extending to the last one.
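
A short sketch of the difference on the question's text, with the two patterns written as raw strings; the variable names (text, greedy, lazy) are only for illustration.

```python
import re

text = (
    "shantaram is an amazing novel.\n"
    "It is one of the best novels i have read.\n"
    "the novel is written by gregory david roberts.\n"
    "He is an australian"
)

# Greedy: [\s\S]* grabs as much as possible, then backtracks to the LAST "the".
greedy = re.search(r"shantaram[\s\S]*the", text).group(0)

# Non-greedy: [\s\S]*? grows one character at a time and stops at the FIRST "the".
lazy = re.search(r"shantaram[\s\S]*?the", text).group(0)

print(repr(greedy))  # ends with the "the" that starts the third line
print(repr(lazy))    # ends with the "the" in the second line
```

Both patterns cross newlines, because [\s\S] matches any character; only the stopping point differs.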
