Python regex, multiline matching mode... why doesn't this work?

Question

Asked by Rick

I know that for parsing I should ideally remove all spaces and line breaks, but I was just doing this as a quick fix for something I was trying, and I can't figure out why it's not working. I have wrapped different areas of text in my document with wrappers like "####1" and am trying to parse based on them, but it just doesn't work no matter what I try. I think I am using multiline correctly. Any advice is appreciated.

This returns no results at all:

string='
####1
ttteest
####1
ttttteeeestt

####2   

ttest
####2'

import re
pattern = '.*?####(.*?)####'
returnmatch = re.compile(pattern, re.MULTILINE).findall(string)
return returnmatch

Answer 1

Accepted answer by leoluk

Try re.findall(r"####(.*?)\s(.*?)\s####", string, re.DOTALL) (works with re.compile too, of course).
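
A minimal sketch of the compiled form, assuming a short made-up sample string (the names sample, pattern_text and section_re are hypothetical). With re.compile, the flag goes to re.compile itself: the second positional argument of a compiled pattern's findall is a start position, not a flags value, so passing re.DOTALL there would not do what you expect.

```python
import re

# Hypothetical sample text using the same "####N" wrappers as the question.
sample = "####1\nfirst\n####1\n####2\nsecond\n####2"

pattern_text = r"####(.*?)\s(.*?)\s####"

# Module-level call: the flag is passed to re.findall.
direct = re.findall(pattern_text, sample, re.DOTALL)

# Compiled form: the flag is passed to re.compile, not to .findall(),
# whose second positional argument is a start position.
section_re = re.compile(pattern_text, re.DOTALL)
compiled = section_re.findall(sample)

print(direct == compiled)  # True
```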

This regexp will return tuples containing the number of the section and the section content.

For your example, this will return [('1', 'ttteest'), ('2', ' \n\nttest')].
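
A sketch to check that result, rebuilding the question's text as a valid string. The exact whitespace is an assumption: here the opening "####2" line ends with two spaces, which is what yields the leading space in the second tuple. The dict step and the name sections are illustrative additions, not part of the answer.

```python
import re

# The question's text, built line by line so the whitespace is explicit.
# Assumption: the opening "####2" line ends with two spaces; \s consumes
# one of them and the other ends up at the start of the captured content.
text = (
    "\n"
    "####1\n"
    "ttteest\n"
    "####1\n"
    "ttttteeeestt\n"
    "\n"
    "####2  \n"
    "\n"
    "ttest\n"
    "####2"
)

matches = re.findall(r"####(.*?)\s(.*?)\s####", text, re.DOTALL)
print(matches)  # [('1', 'ttteest'), ('2', ' \n\nttest')]

# Each tuple is (section number, section content); strip the
# surrounding whitespace if it is not wanted.
sections = {number: content.strip() for number, content in matches}
print(sections)  # {'1': 'ttteest', '2': 'ttest'}
```

Note that the text outside the wrappers (ttttteeeestt) is not captured: after each match, the search resumes past the closing #### and the next match can only start at the next #### marker.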

(BTW: your example won't run as written; for multiline strings, use ''' or """.)
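
A sketch of why the original attempt finds nothing, using a triple-quoted version of the question's string (the trailing spaces after ####2 are dropped here for simplicity, which is an assumption). Without DOTALL, . stops at newlines, and no single line contains two #### markers, so the pattern can never match.

```python
import re

# The question's text as a triple-quoted string.
text = """
####1
ttteest
####1
ttttteeeestt

####2

ttest
####2"""

pattern = r".*?####(.*?)####"

# MULTILINE only changes ^ and $, which this pattern does not use,
# so . still cannot cross a newline and nothing matches.
print(re.findall(pattern, text, re.MULTILINE))  # []

# With DOTALL, . crosses newlines, but the section number and the
# content come back glued together in one group.
print(re.findall(pattern, text, re.DOTALL))
# ['1\nttteest\n', '2\n\nttest\n']
```

The second result also shows why the accepted answer adds a separate (.*?)\s group for the number: the original single group captures the number and the content together.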

Answer 2

Answer by Colin Hebert

MULTILINE doesn't mean that . will match a line break; it means that ^ and $ match at the start and end of each line, not only at the start and end of the whole string.

re.M / re.MULTILINE
When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character '$' matches at the end of the string and at the end of each line (immediately preceding each newline). By default, '^' matches only at the beginning of the string, and '$' only at the end of the string and immediately before the newline (if any) at the end of the string.
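
A small sketch of what MULTILINE actually changes, using made-up text with wrapper lines like the question's. The pattern ^####(\d+)$ only matches whole marker lines when ^ and $ are allowed to match at line boundaries.

```python
import re

# Hypothetical sample text; note that it starts with a newline.
text = "\n####1\nttteest\n####1\n####2\nttest\n####2"

# Default: ^ only matches at position 0, and text starts with "\n".
print(re.findall(r"^####(\d+)$", text))  # []

# MULTILINE: ^ and $ also match right after / right before each newline.
print(re.findall(r"^####(\d+)$", text, re.MULTILINE))  # ['1', '1', '2', '2']
```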

re.S or re.DOTALL makes . match newlines as well.
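
A minimal sketch contrasting the two flags on a made-up two-marker string. DOTALL lets . cross newlines; MULTILINE only affects ^ and $; the two are independent and can be combined with |.

```python
import re

# Hypothetical sample text with one wrapped section.
text = "####1\nttteest\n####1"

# Without DOTALL, "." refuses to match "\n", so the search fails.
print(re.search(r"####1.*####1", text))  # None

# With re.S (alias of re.DOTALL), "." also matches "\n".
match = re.search(r"####1.*####1", text, re.S)
print(repr(match.group(0)))  # '####1\nttteest\n####1'

# The flags are independent and can be combined with |.
print(re.findall(r"^####(\d)$", text, re.M | re.S))  # ['1', '1']
```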

Source

http://docs.python.org/